Kaptchuk TJ (2011). A brief history of the evolution of methods to control observer biases in tests of treatments.

© Ted J Kaptchuk, Program in Placebo Studies, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02115, USA. Email: tkaptchu@bidmc.harvard.edu

Cite as: Kaptchuk TJ (2011). A brief history of the evolution of methods to control observer biases in tests of treatments. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/a-brief-history-of-the-evolution-of-methods-to-control-of-observer-biases-in-tests-of-treatments/)


All of us are aware that we can sometimes mislead ourselves into thinking that we have detected, seen or experienced something that actually isn’t there. The biases that lead to these misperceptions are termed observer biases. They cause a particular problem when people believe that they already ‘know’ the effect of a treatment, or some other kind of intervention. Observer biases can be reduced and sometimes abolished by masked assessment (often referred to as blind assessment). The devices for masking assessments of treatment – when possible, using a placebo (dummy) treatment as part of the comparison – allow comparisons to be made which are independent of any observer bias or preconceived ‘knowledge’.

Masked assessment of treatment outcome

Masked assessment and placebo controls seem to have originated in late 16th-century exorcism rites (Kaptchuk et al. 2009). Blind assessment and dummy controls were adopted in medicine during the early tug-of-war between orthodox medicine and ‘irregular’ healers (Kaptchuk 1998). Conventional physicians and scientists distrusted the claims of unconventional healers, which were often based on principles incompatible with orthodox beliefs. To demonstrate that these claims reflected illusions of the mind or imagined effects, conventional scientists introduced masked assessment. Understandably, unconventional healers quickly adopted the same methodology to try to demonstrate that the effects of their interventions were indeed independent of belief (Kaptchuk 1998).

One early example of masked assessment was carried out by a commission of inquiry appointed by Louis XVI in 1784 to investigate the medical claims made by Anton Mesmer (1781) about the effects of ‘animal magnetism’. The commission, headed by Benjamin Franklin, included such distinguished members as Antoine Lavoisier, Jean-Sylvain Bailly and Joseph-Ignace Guillotin. Its goal was to assess whether the purported effects of this new healing method were due to any ‘real’ force, or to ‘illusions of the mind’. Among the many tests performed, blindfolded people were told that they were, or were not, receiving magnetism when, at times, the reverse was actually happening. The people being studied felt the effects of mesmerism only when they were ‘told’ they were being treated, and felt no effects when they were not told, whether or not they were actually receiving the treatment. They were also given what we would now call placebo or dummy treatments of ‘mesmerized water’ and ‘mesmerized trees’ (Commission Royale 1784).

A few years later, explicitly inspired by the French investigators, John Haygarth (Haygarth 1800) conducted a single-blind experiment using a placebo (sham) device. He showed that a set of fake tractors made of wood achieved effects on the symptoms of rheumatism similar to those that had been attributed to ‘magnetic healing’ with metal tractors, so-called ‘Perkinism’.

Masked assessment also became a research tool in the debates on homeopathy, the nineteenth century’s other major form of unconventional healing. Homeopathy claimed that a disease presenting a given configuration of symptoms could be treated with small amounts of a substance that caused a similar symptom complex in a healthy person. Homeopaths often used blind assessment and placebo controls in their ‘provings’, which tested the effects of their remedies on healthy volunteers (Löhner 1835; Stolberg 2006; Kaptchuk 1998). Orthodox doctors greeted these claims with scorn, and masked assessment was quickly adopted to adjudicate the dispute. One of the earliest attempts to give patients a placebo while telling them that it was a homeopathic remedy took place in Russia (Ministry of Internal Affairs 1832; Dean 2003). The design of such experiments became increasingly sophisticated (e.g. Forbes 1846). One of the most sophisticated was conducted under the auspices of the Milwaukee Academy of Medicine in 1879-1880. In this trial, which in modern terms could be described as ‘double-blind’, both patients and experimenters were masked as to whether the treatment was a genuine homeopathic remedy or a sugar pill (Storke et al. 1880; Kaptchuk 2004).

Conscious use of placebos for treatment

From at least as early as the 18th century, mainstream physicians used placebos to treat their patients (Cullen 1772; Hunter 1788; Gordon 1788; Trotter 1792). William Cullen (1772), for example, mentions giving mint water as ‘placebos’ in his lecture notes (Kerr et al. 2007). In the United States, Austin Flint gave thirteen patients with rheumatism a placebo and concluded that orthodox drug treatment was usurping the credit due to ‘nature’ (Flint 1863). At Guy’s Hospital in London, William Withey Gull came to similar conclusions after treating 21 rheumatic fever patients ‘for the most part with mint water’ (Sutton 1865). It was not until much later, however, that a more skeptical attitude in mainstream medicine led to recognition of the need to adopt masked assessment and placebos to assess the validity of its own more ‘scientific’ (and therefore more ‘reasonable’) claims.

Adoption of masked assessment in mainstream medicine

Masked assessment seems to have first entered the conventional medical world at the end of the nineteenth century, during the French hypnotism-suggestion debates. In a continuation of the earlier controversy about ‘animal magnetism’, newly ‘psychologized’ forms of mesmerism – such as hypnotism, psychical research and suggestion – were frequently subjected to well-publicized tests with masked assessment (Moll 1891; Rivers 1908; Dingwall 1967-68; Hacking 1988; Gauld 1992). Psychologists and others were also using blind assessment to study sensory discrimination (Peirce and Jastrow 1884). During this period, the renowned physician-physiologist Brown-Séquard claimed dramatic therapeutic effects from an animal testicular extract, and his claims provoked other physiologists and pharmacologists to test them in masked experiments (Variot 1889; Eloy 1893; Pregl 1896). And in an investigation of the impact of polished and unpolished rice on the prevalence of beri-beri, Adolphe Vorderman, a prison medical officer in the Dutch East Indies, clearly understood the importance of masked assessment (Vorderman 1897; Vandenbroucke 2003).

Inspired principally by pharmacologists, German clinical scientists gradually adopted masked assessment, one of the most dramatic examples being the trial performed by Adolf Bingel (Bingel 1918; Tröhler 2010). Between 1911 and 1914, he assigned 937 patients alternately to either diphtheria antitoxin serum or simple horse serum. All patients and participating physicians (except Bingel) were unaware of which serum each patient had been allocated. A strong tradition of blind assessment developed in Germany, and this was codified by the clinical pharmacologist Paul Martini (Martini 1932; Shelley and Baur 1999; Stoll 2004) before the Nazi period devastated German academic medicine.

Blind assessment in the modern anglophone world, particularly in the United States, appears to have begun when pharmacologists were influenced by the German tradition, as well as by an American ‘quackbuster’ movement that used masked assessment (Kaptchuk 1998). By the 1930s, anglophone researchers had taken the lead in using placebo controls in clinical experiments. Among these researchers, the strenuous advocacy of masked assessment by Harry Gold of Cornell University Medical School appears to have been particularly influential (e.g. Conference on Therapy 1946; Conference on Therapy 1954).

Masked assessment to reduce observer bias nevertheless remained on the margins of research in health care until random assignment to treatment groups gradually became accepted, during the second half of the 20th century, as the most effective way of avoiding allocation bias in comparative trials. Placebo controls furthered this agenda. Together with randomization, masked assessment, when possible using placebos, has now become one of the crucial methodological components for minimizing biases in assessing the effects of interventions in health care.

A more detailed history of masked assessment and placebo controls has been published in the Bulletin of the History of Medicine (Kaptchuk 1998; 72(3):389-433; DOI: 10.1353/bhm.1998.0159).

Funding: The research for this paper was partly funded by NIH-NCCAM grant #K24 AT004095.


Bingel A (1918). Über Behandlung der Diphtherie mit gewöhnlichem Pferdeserum. Deutsches Archiv für klinische Medizin 125:284-332.

Commission Royale. Bailly JS (1784). Rapport des commissaires chargés par le Roi, de l’examen du magnétisme animal. Imprimé par ordre du Roi. Paris: Imprimerie Royale.

Conference on Therapy (1946). The use of placeboes in therapy. New York State Journal of Medicine 46:1718-1727.

Conference on Therapy (1954). How to evaluate a new drug. American Journal of Medicine 17:722-727.

Cullen W (1772). Clinical lectures. Edinburgh, Feb-April, 218-219.

Dean ME (2003). ‘An innocent deception’: placebo controls in the St Petersburg homeopathy trial, 1829-30. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/an-innocent-deception-placebo-controls-in-the-st-petersburg-homeopathy-trial-1829-30/).

Dingwall EJ (1967-68). Abnormal hypnotic phenomena: a survey of nineteenth-century cases, 4 vols. London: Churchill.

Eloy C (1893). La méthode de Brown-Séquard. Paris: JB Baillière, p 47.

Flint A (1863). A contribution toward the natural history of articular rheumatism; consisting of a report of thirteen cases treated solely with palliative measures. American Journal of the Medical Sciences 46:17-36 (see p 21).

Forbes J (1846). Homeopathy, allopathy and ‘young physic’. British and Foreign Medical Review 21:225-265.

Gauld A (1992). A history of hypnotism. Cambridge: Cambridge University Press.

Gordon A (1788). The practice of physick. Unpublished manuscript.

Hacking I (1988). Telepathy: origins of randomization in experimental design. Isis 79:427-451.

Haygarth J (1800). Of the imagination, as a cause and as a cure of disorders of the body: exemplified by fictitious tractors, and epidemical convulsions. Bath: R Cruttwell.

Hunter J (1788). A treatise on the venereal disease. London: G Nicol and J Johnson, p 69-70.

Kaptchuk TJ (1998). Intentional ignorance: a history of blind assessment and placebo controls in medicine. Bulletin of the History of Medicine 72:389-433.

Kaptchuk TJ (2004). Early use of blind assessment in a homeopathic scientific experiment. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/early-use-of-blind-assessment-in-a-homeopathic-scientific-experiment/).

Kaptchuk TJ, Kerr CE, Zanger A (2009). Placebo controls, exorcisms and the devil. Lancet 374:1234-1235.

Kerr CE, Milne I, Kaptchuk TJ (2007). William Cullen and a missing mind-body link in the early history of placebos. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/william-cullen-and-a-missing-mind-body-link-in-the-early-history-of-placebos/).

Löhner G, on behalf of a Society of truth-loving men (1835). Die Homöopathischen Kochsalzversuche zu Nürnberg [The homeopathic salt trials in Nuremberg]. Nürnberg in März.

Martini P (1932). Methodenlehre der Therapeutischen Untersuchung. Berlin: Springer.

Ministry of Internal Affairs (1832). [Conclusion of the Medical Council regarding homeopathic treatment]. Zhurnal Ministerstva Vnutrennih Del 3:49-63.

Moll A (1891). Hypnotism. London: Walter Scott, p 224.

Peirce CS, Jastrow J (1884). On small differences of sensation. Memoirs of the National Academy of Sciences 3:75-83.

Pregl F (1896). Zwei weitere ergographische Versuchsreihen über die Wirkung orchitischen Extraktes. Archiv für die gesamte Physiologie 62:379-399 (see p 387).

Rivers WHR (1908). The influence of alcohol and other drugs on fatigue. London: Edward Arnold.

Shelley JH, Baur MP (1999). Paul Martini: the first clinical pharmacologist? Lancet 353:1870-1873.

Stolberg M (2006). Inventing the randomized double-blind trial: The Nuremberg salt test of 1835. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/inventing-the-randomized-double-blind-trial-the-nurnberg-salt-test-of-1835/).

Stoll S (2004). Paul Martini’s Methodology of therapeutic investigation. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/paul-martinis-methodology-of-therapeutic-investigation/).

Storke EF, Martin R, Rosenkrans EM, Ford J, Schloemilch A, McDermott GC, Carlson OW (1880). Final report of the Milwaukee test of the thirtieth dilution. Homoeopathic Times 7:12/280-1.

Sutton HG (1865). Cases of rheumatic fever, treated for the most part by mint water. Collected from the clinical books of Dr Gull, with some remarks on the natural history of that disease. Guy’s Hospital Reports 11:392-428 (see p 392).

Tröhler U (2010). Adolf Bingel’s blinded, controlled comparison of different anti-diphtheritic sera in 1918. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/adolf-bingels-blinded-controlled-comparison-of-different-anti-diphtheritic-sera-in-1918/).

Trotter T (1792). Observations on the scurvy. London: Longman, p 137-138, 184.

Vandenbroucke JP (2003). Adolphe Vorderman’s 1897 study of beriberi among prison inmates in the Dutch East Indies: an exemplar of scrupulous efforts to avoid bias. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/adolphe-vordermans-1897-study-on-beriberi-an-example-of-scrupulous-efforts-to-avoid-bias/).

Variot MG (1889). Trois expériences sur l’action physiologique du suc testiculaire injecté sous la peau, suivant la méthode de M. Brown-Séquard. Comptes Rendus de la Société de Biologie 41:451-454.

Vorderman AG (1897). Onderzoek naar het verband tusschen den aard der rijstvoeding in de gevangenissen op Java en Madoera en het voorkomen van beri-beri onder de geïnterneerden. Batavia: Jav. Boekh. & Drukkerij.