Jesse Bullowa’s 1928 report on the use of antipneumococcic serum in lobar pneumonia (Bullowa 1928a) was subtitled “Data necessary for a comparison between cases treated with serum and cases not so treated, and the importance of a significant control series of cases.” It stands out as one of the most sophisticated invocations of the controlled clinical trial in the 1920s, and as a self-conscious attempt to justify and promote the methodology. Indeed, Bullowa’s contribution to a symposium on the treatment of pneumonia at the New York Academy of Medicine in December 1927 was entitled simply The control (Bullowa 1928b). In focusing explicit attention on methodological issues, Bullowa made clear that he wished to determine “whether there was conclusive proof that the serum is of value” (Bullowa 1928a, p 1357).
Bullowa, nearly fifty years old at the time of his presentation at the New York Academy of Medicine, was a clinical professor at the New York University College of Medicine (Alexander 1944) and co-director, with Milton Rosenblüth, of Harlem Hospital’s pneumonia service, then “the largest therapy unit of any hospital in the city” (Maynard 1978, p 41). During the previous decade, Rufus Cole and his colleagues at the Hospital of the Rockefeller Institute had introduced serotype-specific antiserum for the treatment of pneumococcal pneumonia (as of 1928, only four such pneumococcal serotypes were known). When the Metropolitan Life Insurance Company lost twenty-four million dollars in excess death benefits during the influenza epidemic of 1918, it established an Influenza Commission, which would soon focus its attention upon pneumonia and its treatment with antipneumococcal antiserum (Podolsky 2006). Intending to fund alternate control studies at multiple institutions (Oliver and Stoller 1925), the company would ultimately fund published studies arising from Bellevue and Harlem Hospitals in New York, and from Boston City Hospital (Cecil and Sutliff 1928; Park et al. 1928; Finland 1930). However, none of the pneumonia investigators – who included such eventual luminaries as Russell Cecil (at Bellevue Hospital in New York) and Maxwell Finland (at Boston City Hospital) – would so explicitly advocate the methodology of the controlled clinical trial as would Jesse Bullowa.
The challenge of variable prognosis
Bullowa had not used or advocated the use of a controlled series in his previous work on influenza and the role of the tonsils in systemic illness (Bullowa 1919a; 1919b). The novel specific treatment for pneumonia appears to have provided a turning point for Bullowa to focus upon the controlled clinical trial as a means of examining and justifying the utility of potential remedies. He started out by grounding his controlled study in the particularities of studying pneumonia. A theoretical rationale for such a study was based on the very variability of pneumococcal pneumonia’s severity:
The evaluation of the effect of any therapeutic procedure in pneumonia is attended with certain inherent difficulties. Probably seven of every ten patients recover regardless of treatment, and therefore if one chances on a succession of favorable cases one is apt to attribute the benefit to the special treatment then in use. … [Conversely,] a short series of fatalities, unless carefully controlled and analyzed, may lead to a condemnation of what is really a very useful procedure (Bullowa 1928a, p 1354).
More practically, the study was grounded in the need, if antiserum’s efficacy was confirmed, to convince clinicians – who had generally “not as yet adopted the serum treatment for pneumonia” – of serum’s utility (Bullowa 1928a, p 1354). To this end, an entirely separate pneumonia service had been established at Harlem Hospital in New York, with its own resident physician, four bacteriologists, and two chemists who could all give “their full time to the investigations” conducted upon the large numbers of patients admitted with the dreaded disease (Park et al. 1928, p 1503).
Comparing like with like
Such a service was indeed impressive, and 365 cases would be investigated by December 1927, with 793 investigated by June 1928 (Park et al. 1928, p 1504-1506). Yet Bullowa’s true innovation was embedded in the conduct and explanation of the investigation itself. The methodological first step of the controlled study was to ensure that “there must be no selection of patients.” Bullowa relied upon a strict “alternation” of patients to serum treatment versus standard treatment groups, in which “only the order of arrival in the ward determines whether a patient is to receive serum” (Bullowa 1928a, p 1354).
Bullowa, moreover, took special care to ensure and emphasize the fidelity of the implementation of such alternation. When planning the study, Bullowa established explicit criteria for the “rejection of patients” (Bullowa 1928a, p 1354), what would be considered ‘exclusion criteria’, in modern parlance. Only pneumococcal pneumonia patients were to be included; and if a “tuberculous pneumonia” case was accidentally included, the case would be rejected, and the “patient’s place [was] not filled, but the alternation continued as before in order to avoid selection” (Bullowa 1928a, p 1354-1355). Yet for the pneumococcal pneumonia patients, reported Bullowa, even “if a patient dies of a surgical accident or complication, even though he has recovered from the pulmonary involvement, it is charged against the series” (Bullowa 1928a, p 1354), in anticipation of what today would be considered an ‘intention to treat analysis.’ At the conclusion of the trial, Bullowa was able to point out that the integrity of such measures was confirmed by the equivalence in pneumococcal serotypes and in patients arriving early versus late in the serum-treated versus serum-untreated groups. Yet, for all of his stated commitment to an intention-to-treat analysis, Bullowa excluded four “deaths within twenty-four hours of admission” (three Type I serum-treated cases, and only one control) from his final analysis of the 1927 data (Bullowa 1928a, p 1356, 1357).
Nevertheless, Bullowa’s chief innovation was in the care he took in comparing the outcomes in the comparison groups. As a foundation, he constructed an elaborate “severity rating” for patients, a hundred-point scale “for subsequent comparison of cases of like degree,” based upon pulmonary, neurological, circulatory, and gastrointestinal status, as well as upon such “complications and special factors” as pregnancy and bacteremia (Bullowa 1928a, p 1355). Through such a rating system, Bullowa could sub-stratify patients not only by pneumococcal serotype, but into three proposed classes of “poor,” “fair,” and “good” baseline characteristics.
Taking account of the play of chance
Finally, Bullowa could turn to the statistical comparison of outcomes between the serum-treated and control groups. Tellingly, he felt the need to begin this section with “a brief digression” regarding the tests he was to apply “to see whether proof is adequate and how we can determine the number of cases necessary for a fair evaluation,” dependent upon “what difference in results is statistically significant” (Bullowa 1928a, p 1357). Reflecting Raymond Pearl’s own explication of the subject in his Introduction to Medical Biometry and Statistics (Pearl 1923, p 209-219), Bullowa went on to illustrate concisely the contemporary biometric notion that each group’s outcomes could be visualized as a bell-shaped curve, the width of which would be representative of the “standard error” of the distribution of outcomes for that group. The more samples obtained for each group, the narrower the corresponding curve; and the less the curves for discrete groups overlapped, the more likely the differences between them were “significant.” Numerically, this could be represented as a ratio: the difference in outcomes between the two groups, divided by the square root of the sum of the squares of the two groups’ standard errors (in which the larger the sample size for a particular group, the smaller its standard error). If such a ratio exceeded two, then the differences were to be considered “statistically significant” (Bullowa 1928a, p 1357). [Bullowa’s calculation was a bit vague in the 1928 paper, and was presented simply as the “ratio of the difference to its standard error” (Bullowa 1928a, p 1357). He would define the calculation more precisely in his 1937 textbook on pneumonia (Bullowa 1937, p 288-292), in which he would equate a ratio of 2.0 with what in modern parlance would be considered a “p value” of <0.045.]
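The ratio described above can be sketched in a few lines of code. This is only an illustrative reconstruction: it assumes the familiar binomial standard error of a proportion, which the 1928 paper does not spell out, and the function names and case counts are hypothetical rather than drawn from the Harlem Hospital data.

```python
import math


def standard_error(deaths, cases):
    """Binomial standard error of a fatality proportion (an assumed formula)."""
    p = deaths / cases
    return math.sqrt(p * (1 - p) / cases)


def difference_ratio(deaths_serum, cases_serum, deaths_control, cases_control):
    """Difference in fatality rates divided by the standard error of that difference."""
    diff = deaths_control / cases_control - deaths_serum / cases_serum
    # Standard error of the difference: square root of the sum of the
    # squares of the two groups' standard errors.
    se_diff = math.sqrt(
        standard_error(deaths_serum, cases_serum) ** 2
        + standard_error(deaths_control, cases_control) ** 2
    )
    return diff / se_diff


# Hypothetical figures, NOT Bullowa's: 20 deaths among 100 serum-treated
# cases versus 35 deaths among 100 controls.
ratio = difference_ratio(20, 100, 35, 100)
print(round(ratio, 2))
```

With these invented figures the ratio comes out a little above 2.4, so the difference would have cleared the threshold of two described in the text.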
As Bullowa emphasized, such methodology held several implications. Logistically, as was the case by October of 1927 for Type I serum (where the ratio was 1.9), such outcome differences could be rendered “significant” not only through obtaining more beneficial serum results, but simply through obtaining the same results over a larger group of patients, as would indeed be the case by June 1928 (Bullowa 1928a, p 1357-1358; Park et al. 1928, p 1505-1506). Hence, as Bullowa concluded, and in anticipation of later considerations of statistical power: “The size of the series requisite is determined by a consideration of the standard error” (Bullowa 1928a, p 1358) – or as he stated in his address at the New York Academy of Medicine, but did not record in his journal article: “Naturally, as physicians, we considered how long we were justified in continuing to deprive patients in the control series of what may be a valuable and available therapeutic aid” (Bullowa 1928b, p 341).
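The dependence of “significance” on the size of the series can be illustrated with a short sketch. It again assumes the binomial standard error of a proportion, and the fatality rates (20 per cent with serum, 30 per cent without) and group sizes are hypothetical, chosen only to show identical results crossing the threshold of two as the series grows.

```python
import math


def ratio_for_equal_groups(rate_serum, rate_control, cases_per_group):
    """Difference in fatality rates over its standard error, for two equal-sized groups."""
    variance_serum = rate_serum * (1 - rate_serum) / cases_per_group
    variance_control = rate_control * (1 - rate_control) / cases_per_group
    return (rate_control - rate_serum) / math.sqrt(variance_serum + variance_control)


# Same hypothetical fatality rates throughout; only the series size changes.
ratio_small = ratio_for_equal_groups(0.20, 0.30, 50)
ratio_large = ratio_for_equal_groups(0.20, 0.30, 200)

print(round(ratio_small, 2), round(ratio_large, 2))
```

Because each standard error shrinks with the square root of the number of cases, quadrupling the series doubles the ratio: in this invented example it falls below two with 50 cases per group and exceeds two with 200.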
Ultimately, and critically for Bullowa, such methodological considerations were to be tied to the moral conduct of therapeutics itself. As he declared: “When for type I pneumonia the ratio of the difference in percentage fatality between the serum-treated and non-serum-treated cases, to its standard error, becomes more than 2 or 3, it will be our duty to administer serum in all type I cases and to urge its administration on others” (Bullowa 1928a, p 1358). And by the time Bullowa next addressed the New York Academy of Medicine on antipneumococcal antiserum, in November of 1928, he could report: “In fact, we have ceased, at Harlem Hospital, to alternate the use of serum in cases in which we find an invasion of the blood stream with Type I organisms, for it is felt that, for these cases, statistical proof has been given, and it is unjust to withhold an available life-saving procedure; the ratio of the difference to the standard error, in this type, is 2.4 to 1” (Bullowa 1929, p 339).
Context and irony
William H. Park, director of New York City’s Bureau of Laboratories, served as a member of the Metropolitan Life Influenza Commission; and his own biographer states not only that he and Bullowa had first investigated the use of serum together at Harlem Hospital in the spring of 1926, but that “the entire project in New York City was under the general guidance and supervision of Park” (Oliver 1941, p 444-445). Nevertheless, despite Park’s own prior interest in the use of alternate controls in adjudicating the serum treatment of both diphtheria and scarlet fever (Park 1906, p 123; Park 1925, p 1183; Park 1931, p 1443), the degree to which Park ultimately influenced Bullowa in this respect remains unclear. Park seemed somewhat uneasy regarding the ethics of such controls – i.e., withholding serum treatment from the control groups – and looked for the earliest opportunity to render such controls unnecessary (Park 1906, p 123; Park 1925, p 1183). Bullowa, however, seems to have been the one to tie such trial endpoints themselves to statistical notions of significance.
In this latter respect, Bullowa would also note that “Dr. Louis I. Dublin and his staff of the Statistical Bureau, Metropolitan Life Insurance Company, tabulated the data” generated by the investigation (Bullowa 1928b, p 339), and later he would add “and guided us in their interpretation” (Bullowa 1929, p 336). And unfortunately, while an ethos favoring controlled studies seems to have pervaded the Metropolitan Life-inspired studies generally, the extent to which Dublin and his staff influenced Bullowa and/or his paper remains unclear [1]. Each of the Metropolitan Life-inspired studies of antipneumococcal antiserum embodied such an ethos of ‘control’. Russell Cecil and Wheelan Sutliff, for example, would compare characteristics of their serum-treated and control groups, generated by alternation, to ensure that like was being compared with like (Cecil and Sutliff 1928, p 2039). Again, however, Bullowa was exceptional in tying not only the methodology, but also the moral outcome of the studies, to contemporary notions of statistical significance. And among the lineage of alternate control studies in pneumonia, neither the subsequent study of digitalis at Bellevue Hospital (built upon the antiserum “machinery that could readily be adjusted to include an investigation of the effects of the use or nonuse of digitalis as routine therapy in this disease as well”; Wyckoff et al. 1930, p 1243), nor the Medical Research Council study of antiserum in Great Britain (MRC 1934) would rely upon such mathematical ‘proof’ of significance. Paradigmatically, by 1937, the same year Austin Bradford Hill would publish the first edition of his Principles of Medical Statistics (Hill 1937), Bullowa would begin the “Serum Therapy” chapter of his Management of the Pneumonias with a thirteen-page extension of his statistical rationale (Bullowa 1937, p 283-298).
By the late 1930s, the advent of the sulfa drugs (considered “chemotherapy” at the time) threatened to displace the use of the more expensive and labor-intensive antipneumococcal serotherapy (Podolsky 2006). Sulfapyridine, the first truly anti-pneumococcal sulfa drug, would not be available to investigators until the 1938-1939 winter pneumonia season, though this did not stop clinicians from attempting to treat patients with the first of the sulfa drugs, sulfanilamide. Tellingly, Bullowa wrote of sulfanilamide in his book: “If it shall be shown in an alternated series of patients of the same type and age that bacteremic incidence is reduced or that bacteremic patients are saved the value of the drug for human pneumonias will be demonstrated. This evidence has not been collected as yet” (Bullowa 1937, p 201).
And with the introduction of sulfapyridine, a cadre of researchers, including Bullowa, sought to evaluate the possibility of combination serochemotherapy, intending to compare this with monotherapy with sulfapyridine alone. Given that Bullowa had worked so long and hard to equate the physician’s moral ‘duty’ to prescribe antiserum with its justification through the controlled clinical trial, it is ironic that in defense of antiserum, he more than others in his generation drew attention to the limits of the applicability of the results of controlled trials.
By early 1939, Bullowa had “rotated in treatment” 324 adults among serotherapy, sulfa monotherapy, and combination therapy, and found that, on average, sulfa monotherapy was indeed superior to combination therapy. Yet in re-interpreting his data after post-hoc sub-stratification, he claimed that cases treated within the first four days of illness seemed better after combination therapy, this difference nearly achieving statistical significance (Bullowa et al. 1940, p 374). Thus, despite his admission that “final conclusions regarding the effect of drug and antibody in the treatment of the pneumonias must await additional observations,” Bullowa felt free to posit, based upon immunological expectations, “that serum plus sulfapyridine is a more effective therapeutic agent than either acting alone in the early cases when autogenous antibody cannot be expected to be present” (Bullowa et al. 1940, p 375, 376).
By mid-1941, however, Norman Plummer’s group at Bellevue Hospital (Plummer et al. 1941) – following Russell Cecil, and alternating 607 patients – had found not only that the sulfa monotherapy group fared better than the combination therapy group on average, but that such findings held true regardless of whether the patients were treated early or late in the disease. Indeed, Plummer continued his post-hoc analysis by showing that stratification for age of the patient and bacteremia failed to identify any subgroup favoring combination therapy, thus seemingly putting an end to the ‘combination therapy’ question.
But Bullowa refused to give up on serum, arguing for ever-finer sub-stratification. Taken to its logical extreme – that “under certain conditions, one remedy may act more favorably than the other, so that there may be an appropriate remedy for each patient, as well as a better remedy for the majority” (Bullowa 1940, p 568) – Bullowa’s defense of serum threatened the very foundation of the controlled study generally, based upon the limitations of its applicability in individual patients. Yet more narrowly, it appears he simply looked forward to a more comprehensive machinery capable of more extensive and appropriate stratification, in which “all the factors which are known to influence prognosis, such as age, duration of illness, bacteremia, detection of capsular polysaccharide and concurrent disease, must be taken into consideration” (Shackman and Bullowa 1943, p 345; see also Bullowa, in Stahle 1942, p 446-447; Bullowa, DeGara, and Bukantz 1942, p 2). Either way, by this time, Bullowa had outrun the machinery capable of rendering such an investigation. Indeed, he died two months after publishing these lines, and antipneumococcal antiserum would itself pass away from the therapeutic armamentarium within the two succeeding years. But Bullowa had left a profound legacy, in demonstrating to his contemporaries the potential utility – as well as the limitations – of the controlled clinical trial as a tool for adjudicating therapeutic efficacy in the emerging era of modern specific treatments.
This James Lind Library commentary has been republished in the Journal of the Royal Society of Medicine 2009;102:203-207.
Alexander J (1944). Obituary: Jesse G.M. Bullowa. Science 99:462-463.
Bullowa JGM (1919a). Local evidence of tonsil involvement in the causation of distant or systemic disease. Med Clin North Am 2:1101-1114.
Bullowa JGM (1919b). Influenza of the head and chest. Med Clin North Am 2:1115-1130.
Bullowa JGM (1928a). Use of antipneumococcic serum in lobar pneumonia: data necessary for a comparison between cases treated with serum and cases not so treated, and the importance of a significant control series of cases. JAMA 90:1354-1358.
Bullowa JGM (1928b). The control (Abstract). Contribution to a symposium on the use of antipneumococcic refined serum in lobar pneumonia, 15 December 1927. Bulletin of the New York Academy of Medicine 4:339-343.
Bullowa JGM (1929). The serum treatment and its evaluation in lobar pneumonia. Bulletin of the New York Academy of Medicine 5:328-362.
Bullowa JGM (1937). The management of the pneumonias. New York: Oxford University Press.
Bullowa JGM (1940). Rationale of specific therapy for pneumococcic pneumonias. Psychiatric Quarterly 14:568-582.
Bullowa JGM, DeGara PF, Bukantz SC (1942). Type-specific antibodies in the blood of patients with pneumococcic pneumonia: detection, incidence, prognostic significance and relation to therapies. Arch Int Med 69:1-14.
Bullowa JGM, Osgood EE, Bukantz SC, Brownlee IE (1940). The effect of sulfapyridine alone and with serum on pneumococcic pneumonia and on pneumococcus-infected marrow cultures. Am J Med Sci 199:364-380.
Cecil RL, Sutliff WD (1928). The treatment of lobar pneumonia with concentrated antipneumococcus serum. JAMA 91:2035-2042.
Finland M (1930). The serum treatment of lobar pneumonia. New England Journal of Medicine 202:1233-1277.
Hill AB (1937). Principles of medical statistics. London: The Lancet, Limited.
Maynard A (1978). Surgeons to the poor: The Harlem Hospital Story. New York: Appleton-Century-Crofts.
Medical Research Council (1934). The serum treatment of lobar pneumonia: a report of the Therapeutic Trials Committee of the Medical Research Council. BMJ 1:241-245.
Oliver WW (1941). The man who lived for tomorrow: A biography of William Hallock Park. New York: E.P. Dutton and Co.
Oliver WW, Stoller EA (1925). Notes on the therapeutic value of pneumococcus antibody solution subcutaneously administered in lobar pneumonia. Arch Int Med 35:266-286.
Park WH (1906). A critical study of the results of serum therapy in the diseases of man. Harvey Lectures: 101-142.
Park WH (1925). Scarlet fever: etiology, prevention by immunization, and antitoxic treatment. JAMA 85:1180-1186.
Park WH (1931). The history of diphtheria in New York City. Am J Dis Child 42:1431-1445.
Park WH, Bullowa JGM, Rosenblüth MB (1928). The treatment of lobar pneumonia with refined specific antibacterial serum. JAMA 91:1503-1508.
Pearl R (1923). Introduction to medical biometry and statistics. Philadelphia: W.B. Saunders.
Plummer N, Liebmann J, Solomon S, Kalkstein M, Ensworth HK (1941). Chemotherapy versus combined chemotherapy and serum in the treatment of pneumonia: A study of 607 alternated cases. JAMA 116:2366-2371.
Podolsky SH (2006). Pneumonia before antibiotics: therapeutic evolution and evaluation in twentieth-century America. Baltimore: Johns Hopkins University Press.
Shackman NH, Bullowa JGM (1943). Sulfadiazine administered alone and with antipneumococcus serum in the treatment of pneumococcic pneumonia. Arch Int Med 72:329-345.
Stahle DC (1942). A clinical analysis of fifteen thousand cases of pneumonia: an evaluation of the effectiveness of various therapeutic agents. JAMA 118:440-447.
Wyckoff J, Dubois EF, Woodruff O (1930). The therapeutic value of digitalis in pneumonia. JAMA 95:1245-1249.
1. I have been unable to find information about this partnership in the Louis Dublin papers (held by the National Library of Medicine). Moreover, the records of the Met Life Influenza Commission contained in the Lee Frankel papers (held by the American Jewish Historical Society) end in 1922, though the Commission would continue throughout the following decade. No such records exist in the Milton J. Rosenau papers (held at the University of North Carolina), either. The degree to which Bullowa himself learned “medical biometry and statistics” from Pearl’s text, from another text, or from Dublin also remains unclear.