About the author:
Bill Silverman graduated from the University of California Medical School in 1942. He trained as an intern and resident in pediatrics at UC Hospital in San Francisco during 1942-44, and was resident at The Babies Hospital in New York City during 1944-45. He met Ruth Hirsch, then a nursing student at Columbia-Presbyterian Hospital, in 1944; they were married in 1945, and have three children: Dan, Jen and David. Bill rose to the academic rank of Professor of Pediatrics, Columbia University; he was elected to the Society for Pediatric Research and the American Pediatric Society, and was made an Honorary Founder Fellow of Britain’s Royal College of Paediatrics and Child Health. He is the author of Dunham’s premature infants. Third edition. Hoeber, 1961; Retrolental fibroplasia: A modern parable. Grune & Stratton, 1980 (available online at www.neonatology.org/classics/parable/default.html); Human Experimentation: A guided step into the unknown. Oxford University Press, 1985; Where’s the evidence? Oxford University Press, 1998; and he has written the column “From our correspondent” in Paediatric and Perinatal Epidemiology since that journal’s inception in 1987. On 16 December 2004, aged 87, Bill Silverman died comfortably at home with his family, having declined chronic dialysis for renal failure, because, as he told his doctors, “it would be an unethical waste of resources to prolong my death in this way”.
In 1945, when I began the practice of general pediatrics in New York City, I was appointed to the teaching staff at Columbia University as an instructor at The Babies Hospital. Fortunately for me, as it turned out, Richard Day had just returned to the hospital (he spent the Second World War at the US Army’s Environmental Research Laboratory working on an improved hand-glove for use in arctic climates). I quickly became one of his most devoted followers. He was the first teacher I ever met who replied to most of my questions with an unapologetic, “I don’t know!”
When Dick came across Bradford Hill’s book (Principles of medical statistics) around 1947, he immediately recognized the importance of controlling biases and of statistical arguments in clinical research. He decided to spread the word at our hospital in a series of talks to the staff. But the word ‘statistics’ was off-putting and attendance was very poor – I was the only one in the audience at his last lecture! Like Dick, I was completely sold on the numerical approach; soon we were making nuisances of ourselves by criticizing the subjective ‘in-my-experience’ reasoning of our co-workers. For example, when we first tried to evaluate x-ray diagnoses of hyaline membrane disease in premature infants, we approached John Caffey, the pioneering pediatric radiologist at our hospital (behind his back we called him ‘Cactus Jack’). We asked him to make ‘blind’ readings of x-rays (to read unmarked films, without clinical information, and again after a one-month lapse). He refused to cooperate and was unmoved by our pleas for the need to determine rates of false positive and false negative diagnoses of hyaline membrane disease: “I wouldn’t believe your statistical arguments,” he told us, “even if you proved them to me.” (It took me years to realize that Jack was certainly right about the nature of ultimate proof in medicine!)
In 1949, a large federally funded premature infant station was opened in The Babies Hospital; it was part of a newly established city-wide program for the specialized care of the smallest newborn infants. They were transported from small maternity hospitals in a specially outfitted ambulance operated by the City of New York to stations in the large university hospitals. Routine ophthalmoscopic examinations of each infant at weekly intervals had just been initiated in our new unit to assess the prevalence of retinopathy of prematurity (originally called retrolental fibroplasia, but I will use ROP, the present-day term, here). This mysterious disease was first described in Boston in 1942, but early stages had never been diagnosed in New York up to this time. A few weeks after our new unit opened, Fred Blodi, the young ophthalmologist-in-training who carried out the eye examinations, found the first example of the initial retinal vascular signs of the strange blinding disorder. (The baby, my patient, was the child of the professor of biochemistry. Prior to this pregnancy his wife had had six miscarriages. The current 1200 gram premature infant was the first of her offspring to live.)
We panicked at the thought that this ‘precious child’, who was now thriving, would go blind. Something must be done immediately, we felt, to prevent a tragedy. Fred Blodi suggested we try using the then-new anti-inflammatory agent, adrenocorticotropic hormone (ACTH). In desperation, we grasped at this ‘straw’ and administered the powerful drug. As it had never been used before in newborn infants, we made a crude guess about dosage based on reports of use in experimental animals. The vascular proliferation seemed to subside after treatment began, but the systemic effects of the drug (ravenous hunger, growth arrest, Cushing-syndrome-like manifestations) were frightening. We reduced the dose but the eye changes worsened, so the dose was increased and, again, the changes improved. After about a week on treatment, we cautiously reduced the ACTH. This time there was no flare up and we stopped treatment without incident. The infant gained weight and her eyes were almost normal when she was sent home.
Following this dramatic experience, 31 infants with early changes of ROP were treated with ACTH: 25 left the hospital with normal eyes, 2 became blind, 2 lost all vision in one eye, and 2 had useful vision with only minor retinal scars. Our results were impressive when compared with 7 untreated infants with early retinal disease who had been looked after at Lincoln Hospital: 6 of these had become totally blind. Although ACTH seemed to be the cure for ROP, we were brought up short by the observations in 2 infants: the one infant at Lincoln who had escaped blindness without treatment, and another baby, followed by Blodi at our hospital, in whom early abnormalities had subsided without treatment.
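To see why these figures looked so persuasive, the counts above can be put into a 2×2 table and tested against chance. The sketch below is an illustration only: it is not an analysis reported in the source, the function name is hypothetical, and it assumes the comparison is "totally blind" versus "not totally blind" (2 of 31 treated, 6 of 7 untreated).

```python
from math import comb

def fisher_one_sided(blind_a, n_a, blind_b, n_b):
    """One-sided Fisher exact test with all margins fixed.

    Returns the probability, if treatment made no difference, of seeing
    at least `blind_b` blind infants in group b.
    """
    total = n_a + n_b
    blind_total = blind_a + blind_b
    most_possible = min(n_b, blind_total)
    # Sum the hypergeometric probabilities for blind_b, blind_b + 1, ...
    tail = sum(
        comb(blind_total, k) * comb(total - blind_total, n_b - k)
        for k in range(blind_b, most_possible + 1)
    )
    return tail / comb(total, n_b)

# Group a: ACTH-treated series; group b: untreated infants at Lincoln Hospital.
p_value = fisher_one_sided(blind_a=2, n_a=31, blind_b=6, n_b=7)
print(f"blind: {2/31:.0%} treated vs {6/7:.0%} untreated; one-sided p = {p_value:.1e}")
```

A very small p-value of this kind says only that chance is an unlikely explanation. It cannot rule out differences between the two hospitals or their patients, or spontaneous regression, which is the gap a concurrent randomized comparison is meant to close.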
Our exciting post hoc results did not satisfy the rules of evidence we had been touting loudly at our hospital for more than two years! There was no way to avoid the issue: inspired by the then recently reported MRC randomized trial of streptomycin for pulmonary tuberculosis in the UK, we concluded that a randomized controlled trial involving the two New York hospitals was the only way to test the efficacy question quickly and rigorously. After much soul searching, Dick Day and I approached our chairman, Rustin McIntosh, and explained our agonizing problem. Although we felt sure ACTH prevented blindness, we were worried about broadcasting this provocative claim based on the weak evidence in a consecutive series of our patients without a concurrent comparison group (and we could not ignore Blodi’s experience with infants who had improved spontaneously). We asked for permission to conduct what was for us a completely unprecedented clinical exercise: a randomized controlled trial involving newborn infants had never been carried out in our hospital, or for that matter, anywhere else. After a few minutes of silence at the end of our impassioned presentation, Rusty said, “You must do it!” I will always remember his courageous decision. This was years before research ethics committees reviewed such proposals; the entire responsibility for this daring action rested on his shoulders.
It took only a few months to accumulate the required experience in the two hospitals (Reese et al. 1952). Allocation to ACTH or no ACTH was decided by drawing marbles from a jar containing an equal number of white and blue marbles. One morning, when a new infant became eligible for enrollment, I noticed that our head nurse shook the jar vigorously, turned her head away, and pulled a marble out (just as she had been instructed); but because she did not like the ‘assignment’, she put the marble back, shook the jar again, and pulled out the color that agreed with her bias! The importance of Bradford Hill’s precaution in Britain’s famous streptomycin trial to conceal the order of assignment in sealed envelopes was immediately obvious!
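The difference between the marble jar and sealed envelopes can be sketched in code. This is a minimal illustration, not the procedure of either trial: the names `make_allocation_list` and `SealedEnvelopes` are hypothetical, and the sketch assumes the whole sequence is prepared before enrollment and then revealed one assignment at a time, with no way to redraw.

```python
import random

def make_allocation_list(n_infants, seed):
    """Prepare a balanced, shuffled list of assignments before enrollment starts."""
    if n_infants % 2:
        raise ValueError("n_infants must be even for equal arms")
    arms = ["ACTH"] * (n_infants // 2) + ["no ACTH"] * (n_infants // 2)
    random.Random(seed).shuffle(arms)
    return arms

class SealedEnvelopes:
    """Reveals assignments strictly in order, one per infant, and logs each one."""

    def __init__(self, allocations):
        self._allocations = list(allocations)  # private copy, fixed in advance
        self._next = 0
        self.log = []  # (infant_id, arm) pairs, in order of opening

    def open_next(self, infant_id):
        if self._next >= len(self._allocations):
            raise RuntimeError("no envelopes left")
        arm = self._allocations[self._next]
        self._next += 1  # the envelope is used up; it cannot be put back
        self.log.append((infant_id, arm))
        return arm

envelopes = SealedEnvelopes(make_allocation_list(20, seed=1949))
print(envelopes.open_next("infant-001"))
```

Unlike the jar, there is no operation here that returns an assignment and draws again, and the log records which infant received which envelope.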
Approximately one-third of the infants treated with ACTH became blind (in one or both eyes), compared with only one-fifth of those who had not received the drug. Furthermore, there were more fatal infections in the ACTH-treated group. When the entire two-year experience in both hospitals was examined, the frequency of scarring ROP was not startlingly different, but the mortality rates were quite disparate: the untreated group had fared better.
Two years later, ophthalmologists at Johns Hopkins reported that three-quarters of infants with early vascular changes of ROP showed spontaneous regression to normal. This was exactly our experience at The Babies Hospital during the ACTH-treatment episode. Following this disheartening (but very educational!) experience, a formal controlled trial was undertaken to test the possibility that exposure to light was responsible for blinding these babies. Once more the results were negative (Locke and Reese 1952). Needless to say, these dramatic early experiences transformed my life – I became a highly vocal advocate of randomized controlled trial methodology!
By the early 1950s, after numerous false leads, most physicians immersed in the study of ROP were understandably skeptical of all new claims of cause and cure. The disease was now responsible for blinding about 10,000 premature infants throughout the world. In the spring of 1953, a group of concerned pediatricians and ophthalmologists, convened in Bethesda by the US Public Health Service, discussed the increasingly urgent claims that liberal use of supplemental oxygen was responsible for the huge epidemic. (In A cautionary tale about supplemental oxygen I have given an account of the oxygen story (Silverman 2004 a).) The above-noted experience at The Babies Hospital explains why Dick Day and I fought so vigorously for the randomized controlled trial format when the 1953-54 national trial of supplemental oxygen was planned.
Following the belated realization that a seemingly benign intervention like oxygen – a time-honored life-saving ‘drug’ – could have such unexpected, unrecognized and devastating consequences, we realized that almost everything we were doing to care for premature infants was untested. (At mid-century, before the arrival of ventilators and microchemistry, care of marginally viable newborn infants was essentially ‘pastoral’.) Like the approach taken by farmers caring for newborn piglets, conditions considered ideal for survival were provided, and it was assumed that those who were ‘meant’ to survive would do so. But none of these purportedly ‘ideal conditions’ had ever been subjected to formal parallel-treatment trials.
A detergent mist was at that time the most recent and widely acclaimed treatment for respiratory distress syndrome – the new name for hyaline membrane disease, then the leading life-threatening disorder seen in premature infants. Since oxygen was used to propel the detergent into the incubators through a nebulizer, it was important to find out whether the effect of this treatment was sufficiently favorable to balance an increased risk of ROP (additionally, the detergent was seeping into the electric motors of the incubators and ruining them). We began a fixed-sample-size randomized controlled trial in 1953 to address these uncertainties (Silverman and Andersen 1955). Having decided the size of what we would deem an ‘important difference’ in outcome, we calculated the number of our patients that we would need to enroll. To avoid the temptation of ‘stopping the race when our horse was ahead’, we pledged not to ‘peek’ at the results until the required number of infants had been enrolled in this first formal comparison of detergent-mist versus plain-water-mist. We were unable to detect any differences between the compared groups in the mortality rates over the 5 days following enrollment, or in the findings at post-mortem examination (autopsy) in infants who died.
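The step of turning an ‘important difference’ into a required number of patients can be sketched with the standard normal-approximation formula for comparing two proportions. This is a present-day textbook formula, not necessarily the calculation made in 1953, and the mortality rates, significance level and power used below are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

def infants_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate number of infants needed in each arm of a two-arm trial.

    p_control, p_treated: assumed mortality in the two arms (hypothetical)
    alpha: two-sided significance level
    power: chance of detecting the difference if it is real
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two arms
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical example: 30% mortality with plain water mist, and a fall
# to 15% regarded as the smallest 'important difference'.
print(infants_per_arm(0.30, 0.15))
```

Fixing this number in advance is what makes the pledge not to ‘peek’ meaningful: the conventional significance level holds only if the analysis is done once, at the planned sample size.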
In 1954, we followed this trial with another one comparing nebulized water-mist versus high humidity (80-90% relative humidity) and, once more, were unable to detect any apparent difference in first-5-day mortality, obstructive respiratory signs (measured by a ‘retraction score’ we devised for this purpose), and in findings at autopsy (Silverman and Andersen 1956).
Finally, at the end of 1954, we undertook the third (in what we thought would be the last) in the series of fixed-sample-size trials of atmospheric conditions. This time we sought to compare high-humidity (80-90% relative humidity) versus moderate humidity (30-60% relative humidity). The latter condition had been maintained in American incubators for almost two decades before mist treatment had been introduced.
In the 1954-55 “humidity” trial we used a fixed-sample-size, factorial design (similar to formats used for many years in agricultural field trials), as suggested by John Fertig, professor of biostatistics in the School of Public Health at Columbia University. This plan was adopted to allow us to carry out a concurrent comparison of two regimens of antibacterial prophylaxis: penicillin/sulfisoxazole (widely-used in the US and prescribed routinely in our unit with no apparent problems for more than a year) versus tetracycline (a newly-available agent which promised to be easier to administer). The results of the study turned out to be deeply disturbing (Silverman et al. 1956). First-5-day mortality was strikingly higher among infants allotted to the arm of the trial treated with the widely accepted agents for prophylaxis (penicillin/sulfisoxazole) compared to the group receiving the proposed replacement (tetracycline). A previously unknown and subsequently demonstrated effect of sulfisoxazole was responsible for the startling and completely unexpected result: the drug displaced albumin-bound bilirubin in the serum of jaundiced neonates with the result that they sustained fatal brain damage from a condition known as kernicterus. Moreover, we realized immediately that the trial would have been stopped sooner if we had ‘peeked’ at the results during the course of what we regarded as a very mundane exercise, instead of waiting until all infants had been enrolled.
This horrific experience convinced us of the need to find a method of ‘controlled peeking’ at the accumulating results in randomized trials that were still recruiting. John Fertig suggested that we consult the statistical group at Columbia who had devised an interesting sampling scheme during the Second World War. The US Navy needed an efficient method to determine the ‘dud’ rate in batches of torpedoes manufactured by various munitions companies. The military approached Columbia’s famous statistician, Abraham Wald, who worked out a sequential design for sampling, which minimized the number of torpedoes that needed to be exploded to obtain a reliable estimate of the proportion of defective torpedoes in each batch. (Wald died in a tragic plane crash a few years before we consulted his co-workers.) We adopted a version of the sequential scheme that allowed continuous, but controlled, oversight of results in all subsequent randomized controlled trials conducted at The Babies Hospital.
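The idea of ‘controlled peeking’ can be illustrated with Wald’s sequential probability ratio test, applied here to the dud-rate problem. This is a minimal sketch and not the specific plan adopted at The Babies Hospital: the function name, the acceptable and unacceptable dud rates (`p0`, `p1`) and the error rates are all hypothetical.

```python
import math

def sprt_dud_rate(outcomes, p0, p1, alpha=0.05, beta=0.05):
    """Test each item in turn and stop as soon as the evidence is decisive.

    outcomes: iterable of booleans, True meaning the item was a dud
    p0: dud rate regarded as acceptable; p1: rate regarded as unacceptable
    alpha, beta: tolerated risks of the two kinds of wrong decision
    Returns (decision, number_of_items_tested).
    """
    upper = math.log((1 - beta) / alpha)  # crossing it favors rate p1
    lower = math.log(beta / (1 - alpha))  # crossing it favors rate p0
    log_ratio = 0.0
    tested = 0
    for is_dud in outcomes:
        tested += 1
        if is_dud:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        # Looking after every item is allowed because the stopping
        # boundaries were fixed before the first item was tested.
        if log_ratio >= upper:
            return ("unacceptable", tested)
        if log_ratio <= lower:
            return ("acceptable", tested)
    return ("keep sampling", tested)

print(sprt_dud_rate([False, False, True, True, True], p0=0.05, p1=0.20))
```

The running total is examined after every observation, yet the risk of a wrong conclusion stays close to the chosen limits, because the decision lines are set in advance rather than after seeing which way the results are leaning.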
In the meantime, we examined the results of the concomitant ‘humidity’ section of the factorial randomized controlled trial and found another startling result (Silverman and Blanc 1957). We did not expect to detect any difference in first-five-day mortality, but found instead that it was lower in infants allotted to the ‘high humidity’ arm of the trial! We were puzzled by this outcome because the respiratory retraction scores, incidence of infection and findings at post-mortem were virtually the same in both groups. In ‘dredging’ through the records, we found a small but consistent decrease in body temperature among infants reared in ‘moderate humidity’ (30-60% relative humidity).
It seemed very unlikely to us that slightly low body temperature was responsible for an increase in mortality: for more than 20 years, incubator temperatures in America had been intentionally set to maintain relatively low but steady body temperature. This widely accepted practice was based on the findings in a prolonged observational study in Boston in the 1920s (reported in 1933) of the influence of various conditions of the physical environment on the well-being of premature infants. Our unpredicted difference in mortality among babies cared for in the different humidities compared seemed to be a fluke.
Nonetheless, we were now very confident we had a powerful tool to test the ‘temperature hypothesis’ suggested by the associations turned up in the 1954-55 trial. In March 1956, we began a trial comparing first-five-day mortality among infants housed in incubators maintained at two contrasting levels of ambient temperature (31-32 °C versus 28-29 °C) and one level of humidity (80-90% RH) (Silverman et al. 1958). A matched-pairs sequential plan (devised by John Fertig and Agnes Berger) was used to allow a running analysis of outcome. In February 1957, a pre-determined “decision-line” was crossed, indicating that lower mortality was associated with the warmer incubators.
Six years later these results were confirmed independently in trials conducted in Baltimore and in Pittsburgh. Three separate replications confirmed the surprising findings in our 1954-55 trial (small differences in body temperature were associated with measurable differences in mortality), and these findings settled a very old score. Seventeen years before our trial, Dick Day had made some painstaking physiological measurements of thermoregulation in premature infants. His findings – that these babies were truly homeothermic – challenged the widespread practice of caring for newborn infants in slightly cool incubators. But the authorities of the time dismissed his suggestions out of hand, and the everyday custom of maintaining relatively low body temperature in newborn infants continued unchanged for years.
Following these revealing randomized trials, we conducted several more tests of physical environments to tie up some loose ends: a randomized controlled trial comparing two levels of humidity at one body temperature, maintained by a servo-control radiant warmer constructed specifically for this purpose (Silverman et al. 1963); a trial examining the influence of the thermal environment on acid-base homeostasis in the first hours of life of normal neonates (Gandy et al. 1964); a randomized controlled trial examining the effect of the thermal environment on growth and on cold resistance of small infants after the first week of life (Glass et al. 1968); and, finally, in 1967, a trial of the effects of thermal environment and caloric intake on growth after the first week of life (Glass et al. 1969).
I found these early exploitations of the power of randomized controlled trials very exciting, but I was increasingly aware that the statistical approach was anathema to free-wheeling doctors who resented any doubts being expressed about the effectiveness of their untested treatments. (A well-known anesthesiologist at Columbia, tired of our criticism of her use of immersion of newborn babies in ice-water to resuscitate them if they were asphyxiated, finally enrolled in a course on statistics. When she finished, she told Dick Day and me, “Now I know what you guys have been talking about, but I still don’t believe it”.) Most of my colleagues disparaged the numerical approach to clinical problems and their disdain rubbed off on house staff. These doctors-to-be resented being obliged to follow the terms of a study protocol and the discipline of deciding treatment by opening opaque envelopes. Over and over again, in our trials, I saw them hold the sealed envelopes up to a light to try to read the treatment assignments.
I felt it was important for trainees to learn about the new methods, and when the 1954-55 factorial trial was planned I asked for volunteers to help, so they would get hands-on experience. Not one stepped forward. When I wrote up the incredible results of the antibacterial prophylaxis trial, I sent a copy of the draft to Columbia’s world-famous professor of neuropathology (he had read all the slides of brain tissue ‘masked’ as to treatment). He refused to allow his name to appear on the paper as a co-author: “Mere statistics!” he sniffed. When I was invited by a prominent medical school to give a talk, I suggested ‘The Randomized Trial’ as a title and received a telegram asking “A randomized trial of what?” My hosts found it inconceivable that I would talk for an hour “only about methodology” to assess the effects of our treatments. At research society meetings, randomized controlled trials were also belittled; one of America’s brightest up-coming researchers was very frank: “Bill,” she said, “your trials are so boring!”
In 1969, I saw the destructive results of these condescending attitudes (the mind-set should be called ‘reductionist snobbery’) at a meeting to plan a cooperative study of the proper level of oxygen in the treatment of premature infants. (In The Unresolved Oxygen-Level Issue…. I have described how the anti-trialists delayed a rigorous trial of this important question for 35 years (Silverman 2004 b).) The inflated reverence for the reductionist approach to highly complex problems (the reductionist, when faced with Newton’s problem to discover the source of gravity, cuts open the apple and looks inside) appeared again in 1977, at a meeting of the prestigious Institute of Medicine. I made a plea for a large scale multicenter randomized controlled trial to settle the long-standing question whether vitamin E prophylaxis could reduce the risk of ROP-blindness. But the Director of the National Institute of Child Health and Human Development won the debate. He told the panel, “Randomized trials have never proved anything; all we need to do is study the chemistry of tocopherol.” The efficacy of this form of prophylaxis remains uncertain to the present day.
The impatient let’s-try-it-and-see approach in the burgeoning field of neonatal medicine has resulted in therapeutic disaster after disaster. It is extremely difficult, I have learned over the years, to convince physicians about the importance of a hard-won modern lesson (the amazing sulfisoxazole incident was the most indelible and instructive example).
Since knowledge in medicine is never complete, the use of concurrent controls in clinical trials of proposed interventions cannot prevent all therapeutic catastrophes. But the precaution can always bring about a substantial reduction in the number of patients maimed and killed as the result of inevitable surprises!
Finally, I am encouraged to see that an editorialist (Wenstrom 2003) has recently emphasized “the critical importance of randomized clinical trials in evaluating new therapies – even heroic procedures performed in only a small fraction of neonates – before they are adopted as part of standard practice.” For example, the willingness of pediatric surgeons to submit a daring fetal intervention (Harrison et al. 2003) to a rigorous parallel-treatment trial is a hopeful sign: perhaps anti-trialist obstruction is, at long last, on the wane.
This James Lind Library commentary has been republished in Clinical Trials 2004;1:179-84.
Note: These informal comments were written in Greenbrae, California (at the request of my friend Iain Chalmers) in October 2003, on the occasion of my 86th birthday (10/23/03).
Gandy GM, Adamsons K, Cunningham N, Silverman WA, James SL (1964). Thermal environment and acid-base homeostasis in human infants during the first few hours of life. J Clin Invest 43:751-758.
Glass L, Silverman WA, Sinclair JC (1968). Effect of the thermal environment on cold resistance and growth of small infants after the first week of life. Pediatrics 41:1033-1046.
Glass L, Silverman WA, Sinclair JC (1969). Relationship of thermal environment and caloric intake to growth and resting metabolism in the late neonatal period. Biol Neonat 14:324-340.
Harrison MR, Keller RL, Hawgood SB, Kitterman JA, Sandberg PL, Farmer DL, Lee H, Filly RA, Farrell JA, Albanese CT (2003). A randomized trial of fetal endoscopic tracheal occlusion for severe fetal congenital diaphragmatic hernia. New Eng J Med 349:1916-1924.
Locke JC, Reese AB (1952). Retrolental fibroplasia. The negative role of light, mydriatics and the ophthalmoscope examination in its etiology. Arch Ophth 48:44-47.
Reese AB, Blodi FC, Locke JC, Silverman WA, Day RL (1952). Results of the use of corticotropin (ACTH) in treatment of retrolental fibroplasia. AMA Arch Ophth 47:551-555.
Silverman WA (2004 a). A cautionary tale about supplemental oxygen: the albatross of neonatal medicine. Pediatrics 113:394-396.
Silverman WA (2004 b). Commentary: The unresolved oxygen-level issue – hijacked by anti-trialists. J Perinat 24:109-111.
Silverman WA, Andersen DH (1955). Controlled clinical trial of effects of Alevaire mist on premature infants. JAMA 157:1093-1096.
Silverman WA, Andersen DH (1956). A controlled clinical trial of effects of water mist on obstructive respiratory signs, death rate, and necropsy findings among premature infants. Pediatrics 17:1-10.
Silverman WA, Blanc WA (1957). The effect of humidity on the survival of newly born infants. Pediatrics 20:477-486.
Silverman WA, Andersen DH, Blanc WA, Crozier DN (1956). A difference in mortality rate and incidence of kernicterus among premature infants allotted to two prophylactic regimens. Pediatrics 18:614-625.
Silverman WA, Fertig JW, Berger AP (1958). The influence of the thermal environment upon the survival of newly born premature infants. Pediatrics 22:876-885.
Silverman WA, Agate FJ, Fertig JW (1963). A sequential trial of the non-thermal effect of atmospheric humidity on survival of the newborn infant of low birthweight. Pediatrics 31:719-724.
Wenstrom KD (2003). Fetal surgery for congenital diaphragmatic hernia. New Eng J Med 349:1887-1888.