Chance may affect the results of a study if too few outcomes have been observed to yield reliable estimates of treatment effects. Small studies in which few outcome events occur are usually not informative and the results are sometimes seriously misleading.
To assess the role that chance may have played in the results of fair tests, researchers use ‘tests of statistical significance’. When statisticians and others refer to ‘significant differences’ between treatments, they are usually referring to statistical significance. Statistically significant differences between treatments are not necessarily of any practical importance. But tests of statistical significance are important nevertheless because they help us to avoid mistaken conclusions that real differences in treatments exist when they don’t – sometimes referred to as Type I errors. It is also important to take account of a sufficiently large number of outcomes of treatment to avoid a far more common danger – concluding that there are no differences between treatments when in fact there are. These mistakes are sometimes referred to as Type II errors.
Awareness of the importance of taking account of the play of chance began during the 19th century. Thomas Graham Balfour, for example, was cautious in interpreting the results of his test of claims that belladonna could prevent the orphans under his care developing scarlet fever (Balfour 1854). Two of the 76 boys allocated to receive belladonna developed scarlet fever, compared with 2 of the 75 boys who did not receive the drug. Balfour noted that “the numbers are too small to justify deductions as to the prophylactic power of belladonna”. If more of the boys had developed scarlet fever, Balfour might have been able to reach a more confident conclusion about the possible effects of belladonna. Instead, he simply noted that 4 cases of scarlet fever among 151 boys were too few to support a confident conclusion.
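Balfour's caution can be illustrated with a modern test of statistical significance. The sketch below applies a two-sided Fisher exact test to his counts. This test was not available to Balfour and is used here only as an illustration; the function name and the layout of the 2x2 table are assumptions made for this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two comparison groups; columns are 'developed the
    outcome' and 'did not develop the outcome'.
    """
    row1, row2 = a + b, c + d      # group sizes
    col1 = a + c                   # total number of outcome events
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first group,
        # with all margins of the table held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is no more probable than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Balfour's counts: 2 of 76 boys given belladonna and 2 of 75 boys not
# given it developed scarlet fever.
p_value = fisher_exact_two_sided(2, 74, 2, 73)
print(f"two-sided p-value: {p_value:.3f}")
```

With only 4 cases split evenly between the groups, the observed table is the most probable one under the assumption of no difference, so the test gives no hint of an effect. As the essay stresses, that is not evidence that belladonna has no effect: the numbers are simply too small to tell.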
One approach that reduces the likelihood that we will be misled by chance effects involves estimating a range of treatment differences within which the real differences are likely to lie (Gavarret 1840; Huth 2006). These range estimates are known as confidence intervals. Repeating a treatment comparison is likely to yield varying estimates of the differential effects of treatments on outcomes, particularly if the estimates are based on small numbers of outcomes. Confidence intervals take account of this variation, and so they are more informative than mere tests of statistical significance, and thus more helpful in reducing the likelihood that we will be misled by the play of chance.
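As a rough illustration of such a range estimate, the sketch below computes an approximate 95% confidence interval for the difference in risk between two groups, using Balfour's scarlet fever counts as input. It assumes the simple normal-approximation (Wald) interval, which is known to be unreliable when events are as rare as they are here; the function name and parameters are chosen for this example only.

```python
from math import sqrt
from statistics import NormalDist

def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Approximate (Wald) confidence interval for a difference in risks.

    Returns (difference, lower limit, upper limit), where the difference
    is the risk in group A minus the risk in group B.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    diff = risk_a - risk_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(risk_a * (1 - risk_a) / n_a + risk_b * (1 - risk_b) / n_b)
    # Normal quantile for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se

# Balfour's counts: 2 of 76 boys given belladonna, 2 of 75 boys not given it.
diff, low, high = risk_difference_ci(2, 76, 2, 75)
print(f"risk difference: {diff:+.4f}")
print(f"95% CI: {low:+.4f} to {high:+.4f}")
```

Under these assumptions the interval runs from roughly 5 percentage points in favour of belladonna to roughly 5 percentage points against it. That width makes the uncertainty visible in a way that a bare verdict of "not significant" does not.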
Statistical tests and confidence intervals – whether for analysis of individual studies, or in meta-analysis of a number of separate but similar studies – help us to take account of the play of chance. They help us to avoid concluding that treatment effects and differences exist when they don’t, or that they don’t exist when they do.
The text in these essays may be copied and used for non-commercial purposes on condition that explicit acknowledgement is made to The James Lind Library (www.jameslindlibrary.org).
Balfour TG (1854). Quoted in West C. Lectures on the Diseases of Infancy and Childhood. London, Longman, Brown, Green and Longmans, p 600.
Gavarret LDJ (1840). Principes généraux de statistique médicale: ou développement des règles qui doivent présider à son emploi. Paris: Bechet jeune & Labé.
Huth EJ (2006). Jules Gavarret’s Principes Généraux de Statistique Médicale: a pioneering text on the statistical analysis of the results of treatments.