There is a view among some medical historians that the randomised clinical trial originated from statistical thinking, and that the modern era of controlled trials was essentially ushered in with the iconic randomised trial of streptomycin for pulmonary tuberculosis reported by the British Medical Research Council in 1948. For example, historians have written that:
The professional emergence of statistics as a codified body of knowledge and the concomitant rise of individuals trained in its methods provided the necessary conditions for the Laplacian vision of the probabilistically based clinical trial to come into being (Rosser Matthews 1995, p 127).
The randomized clinical trial is “an extension of the statistician RA Fisher’s ideas about experimental design”. “The statisticians’ randomized controlled trial came to represent the symbol and substance of the statistical method in medicine” (Marks 1997, p 138).
The history of randomized clinical trials may be traced back to the biometricians’ work and it seems to be a good example of ‘applied statistics’. On the one hand there was a direct lineage from Pearson to Bradford Hill via Fisher and Major Greenwood…. On the other hand, it is not too difficult to argue for conceptual legacy, since the basic concepts grounding the choice of randomisation can be traced back to RA Fisher’s work. (Gaudillière 2001, p 283).
[Karl] Pearson’s statistical methods provided the framework for Austin Bradford Hill’s work on the randomised clinical trial (p viii-ix) and constituted a seminal statistical idea (Magnello 2002, p 107).
The conceptualisation of clinical trials as “a seminal statistical idea” which “can be traced back to RA Fisher’s work” has not been demonstrated by these writers or by others. The early history of clinical trials has little to do with statistical theory and much more to do with the more fundamental and less technical concept of a fair – that is, unbiased – test (Chalmers 1997; 1999; 2001; Edwards 2004; Chalmers 2005; Edwards 2006; Chalmers 2009).
The need to ‘compare like with like’ in fair tests of treatments has been recognised by some people for a long time. In a letter to Boccaccio written in 1364, Petrarch wrote:
I once heard a physician of great renown among us express himself in the following terms……. I solemnly affirm and believe, if a hundred or a thousand men of the same age, same temperament and habits, together with the same surroundings, were attacked at the same time by the same disease, that if one half followed the prescriptions of the doctors of the variety of those practising at the present day, and that the other half took no medicine but relied on Nature’s instincts, I have no doubt as to which half would escape. (Petrarch 1364)
When quantitative methods began to be used at the beginning of the 18th century to assess the effects of variolation, authors of the comparisons were sometimes reminded of the need to ensure that like was being compared with like. Thus Massey, challenging the interpretation of comparisons of mortality following variolation and after natural smallpox, wrote:
…to form a just Comparison, and calculate right in this Case, the Circumstances of the Patients, must and ought to be as near as may be on a Par (Massey 1723).
Several reports of prospective experiments were published during the eighteenth century. In the most celebrated of these James Lind notes that, apart from the treatments, the twelve patients he studied were otherwise similar: “They all in general had putrid gums, the spots and lassitude, with weakness of their knees. They lay together in one place, being a proper apartment for the sick in the fore-hold; and had one diet common to all” (Lind 1753). Lind does not tell us how he allocated his twelve patients to each of the six treatments he compared, but had he cast lots or used alternation or rotation it would not have been inconsistent with the use of these devices to make fair decisions in other contexts (Silverman and Chalmers 2002).
At the beginning of the 19th century, Alexander Hamilton reported having used alternation to generate parallel comparison groups in a clinical trial of bloodletting done by him and two surgeon colleagues (Lesassier Hamilton 1816). He described how sick soldiers had been “admitted, alternately (my emphasis), in such a manner that each of us had one third of the whole” and that “the sick were indiscriminately received” and “attended as nearly as possible with the same care and accommodated with the same comforts” (Lesassier Hamilton 1816). Although his report leaves several uncertainties (Milne and Chalmers 2014), it seems reasonable to speculate that he described the use of alternation to show that an effort had been made to generate comparable treatment groups.
By the middle of the 19th century, the rationale for alternation was sometimes being made explicit. In 1854, Thomas Graham Balfour described his assessment of whether belladonna could prevent scarlet fever. He divided 151 boys into two comparison groups, “taking them alternately from the list, to avoid the imputation of selection” (my emphasis) (Balfour 1854). It is clear from these words that Balfour used alternation to control selection bias. This is not a statistical concept, and although Balfour was a distinguished statistician as well as a doctor, he cannot be regarded as a theoretical statistician in the ‘Pearsonian/Fisherian’ sense (Chalmers and Toth 2009).
There are further isolated examples of alternation being used to generate treatment comparison groups during the second half of the 19th century, and such reports became increasingly common during the first half of the 20th century. Indeed, alternation as a feature of research design became referred to formally in English not only simply as ‘alternation’ (Bullowa 1928), but also as ‘the alternate method’, ‘rational alternation’ (Choksy 1908), and ‘the alternate case method’ (Choksy 1908; Cecil and Plummer 1930). In French it was referred to as ‘la méthode alternante’ (Cousin 1905; Netter 1906); and in German as ‘Simultanmethode’ (Wagner-Jauregg 1931). It is worth noting that designation of this methodological principle occurred before the theoretical statistical qualities of random allocation had been promoted in Ronald Fisher’s The Design of Experiments (Fisher 1935). Indeed, even though the word ‘random’ sometimes appeared in reports of controlled trials before the late 1940s, it was often actually alternation that was being used for allocation (Armitage 2002).
Unsurprisingly, therefore, the use of alternation was reflected in articles and a book published by the Lancet in 1937, written by the father of medical statistics in Britain, Austin Bradford Hill:
By the allocation of the patients to the two groups we want to ensure that these two groups are alike except in treatment… this might be done, with reasonably large numbers, by a random division of the patients; the first being given treatment A, the second being orthodoxly treated and serving as a control, the third being given treatment A, the fourth serving as a control, and so on, no departure from this rule being allowed [my emphasis]. (Hill 1937)
Of the two essential components of unbiased allocation – genesis of an unbiased sequence, and unbiased implementation of the sequence – the former remains a trivially easy task, while the latter will continue to pose challenges (Chalmers 2009). Hill was aware of this. In an internal report for the MRC dated 22 Dec 1933, Hill expressed concern about the allocation of patients to comparison groups in an MRC study of serum treatment for pneumonia in which alternation should have been used (MRC Therapeutic Trials Committee 1934). Imbalance in the sizes of the comparison groups made clear that alternation had not been strictly observed, prompting Hill to stress in his memorandum that greater care should be taken “that the division of cases really did ensure a random selection.” In other words, to control allocation bias successfully, Hill realised that it is crucially important to conceal the allocation schedule from those involved in entering participants, thus preventing foreknowledge of allocations.
This principle was reflected in the first properly controlled multicentre trial conducted under the aegis of the British Medical Research Council. This was designed by Philip D’Arcy Hart to assess the effect of patulin on common cold symptoms (MRC 1944; Clarke 2004; Chalmers and Clarke 2004). When I interviewed him sixty years later, he told me:
Everyone had thought we would use alternation, and we thought we were very clever in setting up a scheme with two patulin groups and two placebo groups using letters to designate each of the four groups, then using rotation to allocate people to the different groups. We thought we were doing something completely new. We wanted to muddle people up. In fact we succeeded in muddling ourselves up. We didn’t always remember what the letters stood for. None of us was a statistician, but we felt that the patulin trial was the first decently controlled trial the MRC had done. (IC interview with Philip D’Arcy Hart, 2 May 2003).
D’Arcy Hart was one of the team – with Marc Daniels and Austin Bradford Hill – that designed the MRC streptomycin trial. The report of the study is a model of clarity. A crucially important element is the statement that “the details of the (allocation) series were unknown to any of the investigators or to the coordinator and were contained in a set of sealed envelopes, each bearing on the outside only the name of the hospital and a number” (MRC 1948). The MRC streptomycin trial deserves its place in the history of clinical trials because of this and other exceptionally clear statements that adequate precautions had been taken to minimise the possibility of allocation bias, thus assuring readers that ‘like would be compared with like’ (Doll 1984; 1991).
In spite of a few examples of random allocation during the 1920s and 1930s, alternation remained the principal method for unbiased prospective allocation to treatment comparison groups (see, for example, Podolsky 2006; 2008) until well after the end of the Second World War, even in studies done by investigators such as Richard Doll, who were very familiar with Fisher’s writings (Doll 2002). The ‘clinical’ and ‘statistical’ reasons for random allocation came together only during the second half of the 20th century. But even today, as has been noted by the distinguished statistician David Cox, the primary reason for using random allocation is not statistical, but to help prevent foreknowledge of treatment assignments, and thus the conscious or unconscious temptation to allow biased allocation to occur (Cox 2009).
Note and Acknowledgments
A more detailed account of this issue is available in Chalmers (2005). I am grateful to Doug Altman, Peter Armitage, Luc Berlivet, David Cox, Philip D’Arcy Hart, Richard Doll, David Hill, Michael Kramer, Stephen Lock, Irvine Loudon, Harry Marks, Iain Milne, Keith O’Rourke, William Silverman, Stephen Stigler, Ben Toth, Ulrich Tröhler, and Jan Vandenbroucke for commenting on earlier drafts of that paper.
This James Lind Library commentary has been republished in the Journal of the Royal Society of Medicine 2011;104:383-386.
Armitage P (2002). Randomisation and alternation: a note on Diehl et al. (https://www.jameslindlibrary.org/articles/randomisation-and-alternation-a-note-on-diehl-et-al/).
Balfour TG (1854). Quoted in West C. Lectures on the diseases of infancy and childhood. London, Longman, Brown, Green and Longmans, p 600.
Bullowa JGM (1928). The use of antipneumococcic refined serum in lobar pneumonia: data necessary for a comparison between cases treated with serum and cases not so treated, and the importance of a significant control series of cases. JAMA 90:1354-1358.
Cecil RL, Plummer N (1930). Pneumococcus Type I pneumonia – a study of eleven hundred and sixty-one cases, with especial reference to specific therapy. JAMA 95:1547-1553.
Chalmers I (1997). Assembling comparison groups to assess the effects of health care. Journal of the Royal Society of Medicine 90:379-386.
Chalmers I (1999). Why transition from alternation to randomisation in clinical trials was made. BMJ 319:1372.
Chalmers I (2001). Comparing like with like: some historical milestones in the evolution of methods to create unbiased comparison groups in therapeutic experiments. International Journal of Epidemiology 30:1156-1164.
Chalmers I (2005). Statistical theory was not the reason that randomisation was used in the British Medical Research Council’s clinical trial of streptomycin for pulmonary tuberculosis. In: Jorland G, Opinel A, Weisz G, eds. Body counts: medical quantification in historical and sociological perspectives. Montreal: McGill-Queens University Press, p 309-334.
Chalmers I (2009). Explaining the unbiased creation of treatment comparison groups. Lancet 374:1670-1671.
Chalmers I, Clarke M (2004). The 1944 Patulin Trial: the first properly controlled multicentre trial conducted under the aegis of the British Medical Research Council. International Journal of Epidemiology 33:253-260.
Chalmers I, Toth B (2009). 19th century controlled trials to test whether belladonna prevents scarlet fever (https://www.jameslindlibrary.org/articles/19th-century-controlled-trials-to-test-whether-belladonna-prevents-scarlet-fever/).
Choksy KBNH (1908). On recent progress in serum-therapy of plague. BMJ 1:1282-1284.
Clarke M (2004). The 1944 patulin trial of the British Medical Research Council: an example of how concerted common purpose can get reliable answers to important questions very quickly. The James Lind Library (https://www.jameslindlibrary.org/articles/the-1944-patulin-trial-of-the-british-medical-research-council-an-example-of-how-concerted-common-purpose-can-get-reliable-answers-to-important-questions-very-quickly/).
Cousin M (1905). Des éruptions consecutives aux injections de sérum antidiphthérique et de leur traitement prophylactique par l’ingestion de chlorure de calcium. Thèse pour le Doctorat en médecine. Paris: Jules Rousset, 36-44.
Cox DR (2009). Randomization for concealment (https://www.jameslindlibrary.org/articles/randomization-for-concealment/).
Doll R (1984). The controlled trial. Postgraduate Medical Journal 60:719-724.
Doll R (1991). Development of therapeutic trials in preventive and therapeutic medicine. Journal of Biosocial Science 23:365-378.
Doll R (2002). The role of data monitoring committees. In: L Duley, B Farrell, eds., Clinical Trials. London: BMJ Books, 97-104.
Edwards MV (2004). Control and the therapeutic trial, 1918-1948. MD thesis, University of London.
Edwards MV (2006). Control and the therapeutic trial. Amsterdam: Rodopi.
Fisher RA (1935). The design of experiments. Edinburgh: Oliver and Boyd.
Gaudillière J-P (2001). Beyond one-case statistics: mathematics, medicine, and the management of health and disease in the postwar era. In: U Bottazzini, AD Dalmedico, eds. Changing images in mathematics: from the French Revolution to the New Millennium. London: Routledge, p 283.
Hill AB (1937). Principles of medical statistics. London: Lancet.
Lesassier Hamilton A (1816). Dissertatio medica inauguralis de synocho castrensi (Inaugural medical dissertation on camp fever). Edinburgh: J Ballantyne.
Lind J (1753). A treatise of the scurvy. In three parts. Containing an inquiry into the nature, causes and cure, of that disease. Together with a critical and chronological view of what has been published on the subject. Edinburgh: Printed by Sands, Murray and Cochran for A Kincaid and A Donaldson.
Magnello E (2002). The introduction of mathematical statistics into medical research: the roles of Karl Pearson, Major Greenwood and Austin Bradford Hill. In: E Magnello and A Hardy (eds). The road to medical statistics. Amsterdam: Rodopi.
Marks HM (1997). The progress of experiment. Cambridge: Cambridge University Press.
Massey I (1723). A short and plain account of inoculation. With some remarks on the main argument made use of to recommend that practice, by Mr. Maitland and others. To which is added, a letter to the learned James Jurin, M.D.R.S. Secr. Col. Reg. Med. Lond. Soc. In answer to his letter to the learned Dr. Cotesworth, and his comparison between the mortality of natural and inoculated small pox. The second edition. London: W. Meadows.
Medical Research Council (1944). Clinical trial of patulin in the common cold. Lancet 2:373-5.
Medical Research Council (1948). Streptomycin treatment of pulmonary tuberculosis: a Medical Research Council investigation. BMJ 2:769-782.
Medical Research Council Therapeutic Trials Committee (1934). The serum treatment of lobar pneumonia. BMJ 1:241-245.
Milne I, Chalmers I (2014). Alexander Lesassier Hamilton’s 1816 report of a controlled trial of bloodletting (https://www.jameslindlibrary.org/articles/alexander-lesassier-hamiltons-1816-report-of-a-controlled-trial-of-bloodletting/).
Netter A (1906). Efficacité de l’ingestion de chlorure de calcium comme moyen préventif des éruptions consecutives aux injections de sérum. Séances et Mémoires de la Société de Biologie 58:279-280.
Petrarca F (1364). Rerum Senilium Libri. Liber XIV, Epistola 1. Letter to Boccaccio (V.3).
Podolsky SH (2006). Pneumonia before antibiotics: therapeutic evolution and evaluation in Twentieth-Century America. Baltimore: Johns Hopkins University Press.
Podolsky SH (2008). Jesse Bullowa, specific treatment for pneumonia, and the development of the controlled clinical trial (https://www.jameslindlibrary.org/articles/jesse-bullowa-specific-treatment-for-pneumonia-and-the-development-of-the-controlled-clinical-trial/).
Rosser Matthews J (1995). Quantification and the quest for medical certainty. Princeton, New Jersey: Princeton University Press.
Silverman WA, Chalmers I (2002). Casting and drawing lots: a time-honoured way of dealing with uncertainty and for ensuring fairness (https://www.jameslindlibrary.org/articles/casting-and-drawing-lots-a-time-honoured-way-of-dealing-with-uncertainty-and-for-ensuring-fairness/).
Wagner-Jauregg J (1931). Ueber die Infektionsbehandlung der progressiven Paralyse [On infection treatment of progressive paralysis]. Münchener Medizinische Wochenschrift 78:4-7.