### 1 Introduction

Jerome Cornfield (1912-1979) was a man of philosophical bent, engaging wit, deep thought, great mathematical talent, and formidable skill in written and oral debate (see, for example, Cornfield 1954; 1959; 1970c; 1971; 1975; 1976; 1978). He had extensive influence on biostatistics and medical research in the USA in the middle of the 20th century.

Cornfield was never awarded a university degree beyond a Bachelor of Science. Indeed, in the words of one distinguished colleague, “he represented a mockery of excessive adherence to traditional qualifications” (Frederickson 1982). Nevertheless, Cornfield was elected president of the American Statistical Association, the American Epidemiologic Society, and the International Biometric Society’s Eastern North American Region, and he was a fellow of the Institute of Mathematical Statistics, the American Statistical Association, and the American Association for the Advancement of Science (Greenhouse and Halperin 1980; Greenhouse 1982a).

While at the National Cancer Institute in the 1950s, Cornfield developed statistical methods for laboratory studies and epidemiologic investigations. His writing on the nature of causation and on evidence that may be used to buttress an inference of cause-and-effect arose from his involvement in one of the great public health controversies of his time: lung cancer in relation to cigarette smoking (Cornfield 1954; Cornfield et al. 1959).

Among his many contributions to epidemiology, Cornfield defended case-control studies as an appropriate method to assess the potential effects of an exposure on the risk of disease; developed the odds ratio, based on a case-control study, as an approximation to the corresponding estimate of relative risk based on a cohort study; and developed the rationale for the use of relative risk, as opposed to absolute risk, in studies of disease etiology. He also gave a persuasive rationale for the use of observational studies in scientific inference (Cornfield 1959).

Shortly after his death, the scope of Cornfield’s wide-ranging research activities was surveyed in a series of articles written by persons who knew him well. They discussed his contributions to laboratory research (Mantel 1982), epidemiologic studies (Greenhouse 1982b), statistical theory (Zelen 1982), and clinical trials (Ederer 1982; also see Green 1997).

### 2 Cornfield and clinical trials

“*Despite the ambiguities involved in design, in decision making and in conclusion reaching, it is undeniable that the clinical trial has constituted an important contribution to medicine. … [this] surely can be explained as simply a further triumph of experimental method as applied to clinical medicine. As such, it would not have surprised even Claude Bernard, and only the statistical participation would have puzzled him.”* (Cornfield 1976, p 420)

The last major expression of Cornfield’s thinking about clinical trials appeared in his 1976 article, *Recent Methodological Contributions to Clinical Trials*. It covered six areas: decision making, appraisal of uncertainty, likelihood ratios, the use of prior opinion in statistical analyses (Bayesian methods), patient subgroups (multiple comparisons), and randomization. Cornfield began his discussion with three broad assertions: statistical methods could never provide unique, unequivocal answers to problems of design and analysis; the process of inference and decision making in clinical trials is loosely structured, because that is the nature of an intrinsically complex enterprise; and analyses of data from clinical trials may inevitably lead to ambiguous answers.

Cornfield’s skepticism about the role of statistical methods in decision making and scientific inference had been reinforced by his recent experience with the interim monitoring of several large-scale, multi-centre clinical trials. His reservations, however, had been voiced much earlier in a more general context in 1959, in his carefully reasoned, thought-provoking overview of research methodology: *Principles of Research* (Cornfield 1959).

Cornfield had long advocated that clinical trials be randomized whenever feasible and ethical. He also urged that *individuals* be randomized if possible, rather than larger units, such as clinics or hospitals. He had stated the frequentist rationale for randomization in 1959:

“*The device of randomization did two things. First, it controlled the probability that the treated and the control group differed by more than a calculable amount in their exposure to disease, in their immune history, or with respect to any other variable, known or unknown to the experimenters, which might have a bearing on the outcome of their trial. Furthermore, as the size of the two groups being compared increased, it assured that the probability that they differ by more than this amount approached zero.”*

*“The second thing that randomization made possible was an objective answer to the question that must be asked at the conclusion of any trial: In how many experiments could a difference of this magnitude have arisen by chance alone if the treatment truly has no effect?”* (Cornfield 1959, p 245)

In his 1976 paper, Cornfield wrote:

*“One of the finest fruits of the Fisherian revolution was the idea of randomization, and statisticians who agree on few other things have at least agreed on this. But despite this agreement and despite the widespread use of randomized allocation procedures in clinical and in other forms of experimentation, its logical status, i.e. the exact function it performs, is still obscure. Does it provide the only basis by which a valid comparison can be achieved or is it simply an ad hoc device to achieve comparability between treatment groups?”* (Cornfield 1976, p 418)

Although randomization’s ‘logical status’ and ‘the exact function it performs’ may have become obscure upon deep reflection, Cornfield nonetheless answered both of the questions posed: ‘no’, to the first, and essentially ‘yes,’ to the second. However, although Cornfield was a long-standing advocate of randomized clinical trials, he believed that observational studies could also provide a secure foundation for scientific inference and decisions. His defense of observational studies in 1959, accompanied by his earlier writing on this topic, is reminiscent of Bradford Hill’s 1953 article, *Observation and Experiment*, and Hill’s celebrated exposition of guidelines for assessing cause-and-effect from epidemiologic studies (Hill 1953; 1965).

### 3 Decision making and appraisal of uncertainty in monitoring clinical trials

Cornfield’s early work in statistics reflected his training in frequentist methods, for which ‘probability’ is a measure of the *relative frequency* with which events occur in the natural world. This interpretation differs from that of Bayesians, for whom ‘probability’ is a measure of a person’s *degree of belief* in a proposition, such as ‘the risk of lung cancer is increased 20-fold in persons who smoke 2 packs of cigarettes per day’. Both conceptions of probability have a long history of use and application (Fienberg 1992, 2006), and both conform to the same axioms and theorems of probability theory. In addition to what is meant by *probability*, the major difference in the Bayesian approach is that it imposes two requirements: first, one must specify a probability distribution (of belief) for values of the parameters involved in a statistical model to be used for an analysis; and second, when the study data become available, one must apply Bayes’ theorem to update the prior probability distribution. This second requirement, called *Bayes’ rule*, is a mathematical consequence of Bayesians’ insistence that any rational expression of degree of belief (personal probability) must be internally consistent (Lindley 2006, pp 36-37 & 64-66).
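The two requirements just described can be written compactly. The notation here is an assumption introduced for illustration and is not taken from Cornfield's papers: $\theta$ stands for the model parameters, $x$ for the study data, $p(\theta)$ for the prior distribution of belief, and $p(x \mid \theta)$ for the likelihood. In its standard form, Bayes' theorem then gives the updated (posterior) distribution as

$$
p(\theta \mid x) \;=\; \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'} .
$$

The denominator only rescales the result so that the posterior integrates to 1; the shape of the posterior is determined by the product of likelihood and prior.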

In the mid-1960s Cornfield repudiated frequentist methods for monitoring clinical trials and expressed his conversion to a then radical point of view, namely, that tests of statistical significance and p-values, due to R.A. Fisher (1925), tests of statistical hypotheses and confidence intervals, due to Jerzy Neyman and Egon Pearson (1933a, 1933b), and sequential analysis, due to Abraham Wald and George Barnard (Wald 1945; Barnard 1946) – virtually everything that frequentists held dear – were seriously misguided (Cornfield 1966a; 1966b; 1969; 1970b; Cornfield and Greenhouse 1967). Such thoughts had essentially been stated earlier by Jimmy Savage among others (Savage 1961; Edwards et al. 1963), but it was Francis Anscombe’s review of Peter Armitage’s influential text, *Sequential Medical Trials*, which really garnered attention (Armitage 1960; Anscombe 1963).

Anscombe’s carefully reasoned, trenchant criticism of sequential analysis and Neyman-Pearson theory of hypothesis tests was so important that senior statisticians at the US National Institutes of Health held ‘an informal seminar’ about the issue in June 1965 (Cutler et al. 1966). On that occasion Cornfield argued for a Bayesian approach to monitoring clinical trials. Although he subsequently continued to use the frequentist concepts of ‘power’ and ‘level of statistical significance’ to plan clinical trials (Cornfield 1970a), Cornfield objected to their relevance in the interim analyses of a study’s emerging data, and by implication in the final analysis as well.

### 4 Cornfield and the University Group Diabetes Program (UGDP)

### 4.1 The UGDP saga

*“The UGDP was meant to be a model of clinical investigation. Planned and managed by a team of statisticians and clinicians, the study addressed a series of long-standing controversies over the clinical management of diabetes. Intended to demonstrate how a properly designed, randomized, controlled trial could resolve differences of clinical opinion, the UGDP instead became a symbol of all that was wrong with the statistical enterprise in medicine. Few recent controversies in medicine are comparable in length and rancor to that over the UGDP.”* (Marks 1997, p 198)

In his 1976 article, Cornfield referred to three clinical trials as examples of decisions that had to be made in the face of major, unexpected problems: the University Group Diabetes Program (1970a; 1970b), the Coronary Drug Project (1970; 1973), and the Diabetic Retinopathy Study (1976).

Because of the importance of the UGDP to the development of modern-day clinical trials (Meinert 2012; Marks 1997; Greene 2007), I use that study to explain Cornfield’s Bayesian method – relative betting odds – which was used as one of three statistical techniques to support the decision to discontinue tolbutamide. The UGDP also provides an instructive instance in which Cornfield’s formidable skills in rhetoric, statistical analysis, and logical thinking are clearly displayed in undermining the arguments of a study’s critics (Cornfield 1971). It additionally provides a striking example in which Cornfield’s Bayesian rationale for avoiding the use of p-values and tests of statistical significance was contradicted by his practice.

Cornfield was not involved in the UGDP’s design, and he was not part of the research team initially. Nearly 6 years after the first patient had been enrolled and the study was fully in progress, his advice was sought by Christian Klimt, the head of the UGDP’s Statistical Center. At that time, Klimt was vexed by a major problem: informal, interim analyses indicated that more cardiovascular deaths were occurring in patients treated with tolbutamide than with placebo – a completely unexpected, inexplicable finding. Bayesian analyses by Cornfield and computer simulations for frequentist-based analyses were therefore developed to assess the unfavorable trend, which also indicated that overall mortality on tolbutamide was no better than that on placebo and was possibly worse. On the basis of these analyses, Cornfield advocated that treatment with tolbutamide be stopped, and he remained the major defender of the statistical basis of that recommendation (Cornfield 1971, 1974a, 1974b).

With Bayesian and frequentist analyses in hand, a two-day meeting of the UGDP’s Executive Steering Committee voted in June 1969 to drop tolbutamide as a study treatment, and duly informed the US Food and Drug Administration (FDA) of their decision. Despite skepticism by some reviewers in the agency and by its external advisors, the US Food and Drug Administration announced its intention in May 1970 to place a warning of increased cardiovascular hazard on the label for tolbutamide and all chemically-related agents (sulfonylurea drugs). This was done before any UGDP publication, and it set in motion a long-ensuing clinical and scientific controversy.

Within a few months of the UGDP’s first two publications (UGDP 1970a; 1970b), several articles highly critical of the study’s design, implementation, and statistical analyses appeared in the medical literature (Feinstein 1971; Leibel 1971; Salsburg 1971; Schor 1971). Those articles were followed shortly thereafter by further criticism of the UGDP (Seltzer 1972; Schor 1973), which continued throughout the 1970s (Feinstein 1976a; 1976b; 1979).

Cornfield was a representative of the UGDP in discussions with the US Food and Drug Administration, and he testified before a US Senate subcommittee in a hearing held over 3 days in September 1974 about the controversy involved (Cornfield 1974a; Subcommittee on Monopoly 1974). Amazingly, on the last day of that hearing, Cornfield was allowed to cross examine an articulate, forceful opponent of the UGDP, Holbrooke Seltzer, who was Professor of Internal Medicine at the University of Texas and a member of the Board of Directors of the American Diabetes Association (Seltzer 1974; Cornfield 1974b). Furthermore, at the request of Senator Gaylord Nelson, Chairman of the Subcommittee on Monopoly, Cornfield wrote replies to numerous interrogatories proffered by the attorney representing a group of diabetologists, including Seltzer, who opposed the actions of the US Food and Drug Administration, and challenged the validity of the UGDP’s findings (Cornfield 1974c). Those challenges, which unsuccessfully sought to obtain the UGDP data, were ultimately decided by the US Supreme Court (1980).

### 4.2 The application of Cornfield’s ‘relative betting odds’

Although the UGDP was planned and monitored with frequentist methods, it became the first clinical trial in which a Bayesian analysis was used. Cornfield called the technique ‘relative betting odds’; it is a Bayesian method for testing hypotheses (Cornfield 1966a, 1969; also see Edwards et al. 1963). The term ‘betting odds’ refers to an index of *personal belief* in a proposition or hypothesis (H), such as the following:

H: ‘the risk of cardiovascular death on tolbutamide is identical to that on placebo’.

The strength of one’s belief that H is true is explained by way of placing a bet. For someone to give odds of 10:1 in favor of H before the UGDP data have been inspected, i.e. ‘prior odds’ that H is true, also called ‘prior odds of H’, means that one is prepared to stake 10 units on H being true in order to win only 1 unit if H is shown to be true, and to forfeit the 10 units if H is shown to be false.

In Cornfield’s application to the UGDP, betting odds on H were often expressed relative to a specific alternative hypothesis (H_{A}),

H_{A}: ‘compared to placebo, the risk of cardiovascular death on tolbutamide is increased by 25%’.

The figure below shows that as data from the UGDP emerged, they began to indicate that H is false.

Figure 1. Cumulative mortality rates per 100 persons at risk, by year of follow-up (UGDP 1970b). TOLB: tolbutamide; PLBO: placebo.

In the 204 patients on tolbutamide, there were 30 deaths in total by the 8th year of follow-up, 26 of which were attributed to cardiovascular causes. By comparison, there were only 21 deaths in the 205 patients on placebo, 10 of which were attributed to cardiovascular causes. The life-table estimate of the cumulative risk of cardiovascular death (intent-to-treat analysis) was 17.6 (standard error 2.5) per 100 on tolbutamide vs. 6.0 (standard error 2.5) per 100 on placebo.

In view of such data, a person’s ‘posterior odds’ in favor of H being true, i.e. one’s belief that ‘the risk of cardiovascular death on tolbutamide is identical to that on placebo’, should be smaller than his ‘prior odds’. The amount of decrease is given by Bayes’ theorem. That amount is what Cornfield called the ‘relative betting odds’. It expresses the degree to which the acquisition of data should change one’s ‘prior odds’ to ‘posterior odds’. Expressed schematically (Goodman 1999):

posterior odds of H = relative betting odds × prior odds of H.

Values of relative betting odds less than 1 should diminish one’s posterior odds that H is true, whereas values of relative betting odds greater than 1 should enlarge them.
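For readers who prefer symbols, the schematic relation above is the standard odds form of Bayes' theorem. The version below is a sketch that assumes both H and H_{A} are simple (point) hypotheses, with $x$ denoting the observed data (notation introduced here for illustration):

$$
\underbrace{\frac{P(H \mid x)}{P(H_A \mid x)}}_{\text{posterior odds of } H}
\;=\;
\underbrace{\frac{P(x \mid H)}{P(x \mid H_A)}}_{\text{relative betting odds}}
\;\times\;
\underbrace{\frac{P(H)}{P(H_A)}}_{\text{prior odds of } H}
$$

Under this assumption the middle term is a ratio of likelihoods: data that are less probable under H than under H_{A} give a value below 1 and so shrink the odds on H. When either hypothesis is composite, the corresponding likelihood has to be averaged over a prior distribution, so the middle term is no longer a function of the data alone.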

The tabulation below displays some of the UGDP’s many reported values of relative betting odds (UGDP 1970b). The alternative hypothesis (H_{A}) in the tabulation is that ‘the cumulative risk of cardiovascular death is increased by 25% over the cumulative risk on placebo’.

Figure 2. Relative betting odds (RBO) for the difference in cumulative cardiovascular mortality (tolbutamide – placebo), by year of follow-up.

One sees from the tabulation that the UGDP’s accumulating data led to progressively decreasing values of the relative betting odds. By the 8th year of follow-up, the relative betting odds were 0.15. In other words, at that time a person’s prior odds in favor of H, i.e., that ‘the risk of cardiovascular death on tolbutamide is identical to that on placebo’, should have been diminished by 85%.
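The arithmetic behind this statement can be checked in a few lines. The sketch below combines the reported year-8 relative betting odds of 0.15 with the 10:1 prior odds used as an illustration earlier in this section; that prior is a hypothetical choice, not one attributed to any UGDP investigator, and the function names are invented for this example. The conversion from odds to probability assumes that H and H_{A} are the only two hypotheses under consideration.

```python
from fractions import Fraction


def update_odds(prior_odds: Fraction, relative_betting_odds: Fraction) -> Fraction:
    """posterior odds of H = relative betting odds x prior odds of H."""
    return relative_betting_odds * prior_odds


def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds in favor of H into a probability.

    Assumes H and H_A are the only two candidates, so that
    probability = odds / (1 + odds).
    """
    return odds / (1 + odds)


prior_odds = Fraction(10, 1)    # hypothetical prior: 10:1 in favor of H
rbo_year_8 = Fraction(15, 100)  # relative betting odds reported for year 8

posterior_odds = update_odds(prior_odds, rbo_year_8)

print(posterior_odds)                       # 3/2, i.e. odds of 1.5:1
print(1 - posterior_odds / prior_odds)      # 17/20, i.e. an 85% reduction
print(odds_to_probability(prior_odds))      # 10/11
print(odds_to_probability(posterior_odds))  # 3/5
```

Someone who started at 10:1 in favor of H would therefore still lean slightly toward H after eight years of data, which illustrates why the same relative betting odds can leave people with different priors in disagreement.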

Relative betting odds depend not only on the data, but also on the degree of belief in the alternative hypotheses against which the null hypothesis is tested (see, for example, Cornfield 1966a, p 580; Urokinase Pulmonary Embolism Trial 1973, p II-27; Jennison and Turnbull 1999, p 340). Further complications arise in calculating relative betting odds for ‘composite hypotheses’, such as ‘the risk of cardiovascular death on tolbutamide is within 5% of the risk on placebo’, or ‘the risk of cardiovascular death on tolbutamide is at least 25% greater than the risk on placebo’ (see, for example, Edwards et al. 1963; Cornfield 1969).
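The sensitivity to the choice of alternative can be illustrated with a deliberately simplified calculation. The sketch below is not the UGDP's or Cornfield's method and should not be expected to reproduce the reported relative betting odds. It assumes a normal approximation for the observed difference in cumulative cardiovascular mortality, independent treatment arms, and simple (point) alternatives expressed as absolute differences per 100. The alternative values in the loop are hypothetical; the estimates and standard errors are the year-8 figures quoted earlier in this section.

```python
import math


def likelihood_ratio(observed_diff: float, se_diff: float, alt_diff: float) -> float:
    """Likelihood of H (true difference 0) relative to a simple alternative
    (true difference alt_diff), under a normal approximation."""
    z_null = observed_diff / se_diff
    z_alt = (observed_diff - alt_diff) / se_diff
    # Ratio of two normal densities with a common standard error.
    return math.exp(-0.5 * (z_null ** 2 - z_alt ** 2))


# Year-8 life-table estimates quoted in the text (per 100 persons).
observed_diff = 17.6 - 6.0       # tolbutamide minus placebo
se_diff = math.hypot(2.5, 2.5)   # SE of a difference, assuming independent arms

# Hypothetical alternatives: absolute excess risk per 100 on tolbutamide.
for alt_diff in (1.5, 5.0, 10.0):
    ratio = likelihood_ratio(observed_diff, se_diff, alt_diff)
    print(f"alternative = +{alt_diff:>4} per 100: likelihood ratio = {ratio:.4f}")
```

With identical data, the ratio moves further below 1 as the alternative approaches the observed difference, so the weight of evidence against H cannot be stated without also saying which alternative it is being compared with.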

### 4.3 Cornfield’s defense of the UGDP

“*… an investigation originally designed to produce new knowledge, suddenly found itself involved in a difficult and unwanted task of decision making. From the purely formal hypothesis testing point of view that dominated the early thinking in clinical trials, what had happened was that the same body of data had been used to formulate a hypothesis and to test it. From that point of view the University Group Diabetes Program results should have been treated as suggesting a hypothesis to be tested in a new and independent trial. But to the investigators this was inappropriate. They had to decide for themselves and their patients whether the evidence available to them justified the future exposure of anyone to these agents … ”* (Cornfield 1976, p 409)

At the outset of his rebuttal of the UGDP’s critics, Cornfield stated that the study’s prudent decision and moderately worded conclusion had been:

*“…received by some critics with a hostility which has no discernible scientific basis. … The subsequent analysis is undertaken to illuminate these alternatives [i.e., independent repetition of the UGDP vs. acceptance of its findings] and not to defend the UGDP. Its concentration on the strength of the evidence against tolbutamide should of course not be permitted to obscure the more general UGDP finding that lowering of blood glucose level did not appreciably lower the eight-year mortality from cardiovascular disease as compared with patients on diet alone.”* (Cornfield 1971, p 1676)

Cornfield’s claim that his purpose was “not to defend the UGDP” is at odds with what he did, namely, rebut every statistical criticism that had been leveled at the study. Furthermore, his remark that “the more general UGDP finding that lowering of blood glucose level did not appreciably lower the eight-year mortality from cardiovascular disease as compared with patients on diet alone” was based on a serendipitous result: the UGDP was not designed to address this issue. Nonetheless, by insightful analysis using standard statistical methods and ruthless logic, Cornfield demonstrated that no factual basis supported the critics’ concerns that randomization had somehow broken down and produced major baseline inequalities which nullified the results; that excess mortality on tolbutamide was confined to a small number of clinics, and because of this the UGDP’s findings could not be generalized to medical practice; and that dropouts, non-adherence to treatment, and the lower-than-expected mortality in the placebo group undermined the UGDP’s findings concerning tolbutamide.

Cornfield also rebutted a number of clinical concerns, including those expressed about the eligibility criteria for study patients, the use of a fixed dose of tolbutamide, the determination of the principal cause of death, the definitions of baseline risk factors, the failure to obtain data on patients’ smoking histories, and the decision to stop treatment with tolbutamide before a more conclusive demonstration was available for its apparently harmful effect.

Despite Cornfield’s advocacy of Bayesian methods in clinical trials and the use of his relative betting odds to support the UGDP’s decision to terminate treatment with tolbutamide, his rebuttal relied extensively on the very frequentist methods that he was arguing against: p-values and tests of statistical significance as a basis of support for or against hypotheses. Bayesian methods were conspicuous by their absence.

What can one say about the discrepancy between Cornfield’s preaching against p-values and his practice? One answer is that he chose to use well-accepted frequentist methodology in defending the UGDP because it suited his purpose, was not misleading, and because framing a rebuttal through a series of Bayesian analyses would have been incomprehensible to and likely rejected by the vast majority of readers. A related answer is that a Bayesian rebuttal would have required Cornfield to specify prior distributions of belief in the issues he was discussing. One specification could have reflected his personal beliefs, or that of others defending the UGDP, but such prior distributions had never been expressed before the UGDP data were examined. Another tactic could have involved the application of a ‘neutral prior’, that is, a prior distribution of belief that represented weakly-held opinion about the issues involved. This would have led to posterior distributions that approximate the likelihood function (Royall 1997), with results similar to those based on p-values. None of the analyses, however, would have satisfied the UGDP’s fiercest critics, who were alleging that the study was a seriously flawed investigation beyond the remediation of statistics. The only choice of prior distribution that would have satisfied them would have been one that led to accepting their position.

As emphasized by Lindley (2000, p 313), Bayesian methods are deficient in situations involving conflict, a circumstance that embroiled the UGDP both internally (some of the study investigators opposed the decision to stop treatment with tolbutamide) and externally (for various reasons many clinicians not involved in the UGDP did not believe its results). Although the UGDP was challenged by some critics because the study had stopped treatment with tolbutamide “too soon”, i.e. before more data on mortality were at hand, it is doubtful in retrospect whether more data from the same study would have ever settled the controversy.

### 5 Cornfield’s legacy

“*It is sometimes claimed, or at least implied, by frequentists that, in contrast to Bayesians, they seek rules which minimize the long-run frequency of errors of inference or decision. If Bayesian procedures really lacked this property, then it would be difficult to accept them or to make them the basis of any scientifically defensible system of data analysis or behavior. We shall here argue that no such contradiction exists*” (Cornfield 1970b, p 15).

Cornfield’s retirement from the US Civil Service in 1967 left the NIH with no influential advocate of Bayesian theory, which had been subject to some highly contentious argument (see, for example, Savage 1961; Hartley 1963; Kempthorne 1969; Bross 1969; Lindley 1975, 1986; Chernoff 1986). Only two other clinical trials, the Coronary Drug Project (1970, 1973, 1980) and the Urokinase Pulmonary Embolism Trial (1970, 1973), used Cornfield’s relative betting odds methodology, which is now largely abandoned for interim monitoring (Jennison and Turnbull 1999, p 340). Bayesian methods currently use the posterior distribution of the parameter of interest, such as the relative risk or the difference in risk. To limit the number of different estimates that can arise from a given set of data, some recommend that Bayesian calculations be made under three broadly different assumptions: a neutral prior, a skeptical prior (against the alternative hypothesis of interest), and a prior that favors the alternative hypothesis (Spiegelhalter et al. 1994, 2004), which is similar to a proposal made by Cornfield and Greenhouse (1967, pp 823-825).

Only a few of Cornfield’s articles on clinical trials are now cited in articles and texts. His evocative phrase ‘relative betting odds’, which so effectively conjures the image of a casino, has been supplanted sadly by the nondescript ‘Bayes factor’, a term coined by IJ Good (Good 1958; 1967; 1999). Despite Cornfield’s key contribution to the interpretation of the UGDP, the investigators did not use the term ‘relative betting odds’ in their publications, perhaps because of its connotation and the controversy in which they were involved. The UGDP, however, did use ‘RBO’, which was said to be an abbreviation for ‘likelihood ratio’, and they cited Cornfield’s article (Cornfield 1969), but without mentioning that RBO was his acronym for ‘relative betting odds’.

Even if one subscribes to the use of Bayes’ rule for modifying prior belief, in practical applications such as the UGDP, making decisions and taking action will involve additional considerations: assessing the design of a study vs. its implementation, and weighing costs, benefits, and ethical concerns, all of which can be subject to widely differing opinions (Armitage 2013). With regard to ethics, Anscombe’s proposed Bayesian method to decide whether to continue recruitment to a clinical trial when accumulating data suggest that one of the treatments being compared is superior (Anscombe 1963) failed to account for physicians’ patient-centered perspective (Armitage 1963; 2013), a viewpoint which Cornfield evidently shared (see quotation at the start of section 4.3).

Cornfield became a staunch advocate of Bayesian methods, but he was never an ideologue on their behalf. His thinking and writing about “the Bayesian outlook” from 1966 forward was accompanied by his contemporaneous use of frequentist techniques, p-values among them (see Cornfield 1971), which he claimed to deplore in principle. The resultant inconsistency between the theorizing and practice of a supremely logical man might be explained by Cornfield being not only a scientific pragmatist, i.e., using methods that were accepted by the large majority of clinicians and scientists, but also a skeptic, which was expressed in his 1976 article, *Recent Methodological Contributions to Clinical Trials*: there can never be a completely reliable foundation of scientific inference or decision-making in the face of uncertainty, and statistical methods alone, Bayesian or otherwise, are incapable of solving this problem (Armitage 2013).

If the above remarks are true, then why did Cornfield convert to and espouse Bayesian methods? One answer, offered by Cornfield, is that frequentist methods not only lead to logical conundrums, but they are also inflexible in practice, especially in the very instances where flexibility is scientifically most needed (see, for example, Cutler et al. 1966, pp 862-866 & 877-881; Cornfield 1966b, 1969, 1970b). In adopting that position, he was apparently willing to overlook problems with Bayesian theory (see, for example, Efron 1986; Royall 1997, pp 167-176), arguing that the advantage to be gained by improved flexibility was not accompanied by any important loss from abandoning frequentist techniques.

Cornfield’s paper on recent methodological contributions to clinical trials was presented on 30 April 1976 at the Reed-Frost Symposium of the Johns Hopkins University. At the time Cornfield spoke, he did not realize that he would soon cross a threshold and pass into history. One year earlier, however, he had mused about the future:

“*…what is statistics and where is it going? When one tries, spider-like, to spin such a thread out of his viscera, he must say, first, according to Dean Acheson, “What do I know, or think I know, from my own experience and not by literary osmosis?” An honest answer would be, “Not much; and I am not too sure of most of it*” (Cornfield 1975).

While Cornfield’s contributions to clinical trials may be nearly forgotten, the Bayesian perspective that he advocated is becoming more widely accepted (see, for example, Spiegelhalter et al. 2004; Berry 2006; FDA 2010), a development that would have pleased him.

**Acknowledgement**

Iain Chalmers, Curtis Meinert, Alfredo Morabia, and Jan Vandenbroucke offered wise advice and insightful critiques of an earlier draft of this commentary. I am responsible for any errors of fact or interpretation that may remain.

This James Lind Library article has been republished in the *Journal of the Royal Society of Medicine* 2016;109:27-35.

### References

Anscombe FJ (1963). Sequential medical trials. Journal of the American Statistical Association 58:365-384.

Armitage P (1960). Sequential medical trials. Oxford: Blackwell.

Armitage P (1963). Sequential medical trials: some comments on F.J. Anscombe’s paper. Journal of the American Statistical Association 58:384-387.

Armitage P (2013). The evolution of ways of deciding when clinical trials should stop recruiting. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/the-evolution-of-ways-of-deciding-when-clinical-trials-should-stop-recruiting/).

Barnard GA (1946). Sequential tests in industrial statistics. Supplement to the Journal of the Royal Statistical Society 8:1-21.

Berry DA (2006). Bayesian clinical trials. Nature Reviews Drug Discovery 5(1):27-36.

Bross IDJ (1969). Applications of probability: Science vs. pseudoscience. Journal of the American Statistical Association 64:51-57.

Chernoff H (1986). Comment on “Why isn’t everyone a Bayesian?” The American Statistician 40(1):5-6.

Cornfield J (1954). Statistical relationships and proof in medicine. The American Statistician 8(5):19-21.

Cornfield J (1959). Principles of research. American Journal of Mental Deficiency 64:240-252.

Cornfield J (1966a). A Bayesian test of some classical hypotheses, with applications to sequential clinical trials. Journal of the American Statistical Association 61:577-594.

Cornfield J (1966b). Sequential trials, sequential analysis and the likelihood principle. The American Statistician 20:18-23.

Cornfield J (1969). The Bayesian outlook and its applications (with discussion). Biometrics 25:617-657.

Cornfield J (1970a). Fixed and floating sample size trials. In Symposium on Statistical Aspects of Protocol Design. Engle RL, Jr. (Symposium Chairman). Bethesda, Maryland: Clinical Investigation Review Committee, Clinical Investigations Branch, National Cancer Institute, National Institutes of Health, pp 181-187, with discussion on pp 197-204.

Cornfield J (1970b). The frequency theory of probability, Bayes’ theorem, and sequential clinical trials. In Bayesian Statistics (eds. Donald L. Meyer, Raymond O. Collier, Jr.) Itasca, Illinois: Peacock Publishers Inc., pp 1-28.

Cornfield J (1970c). Discussion by J. Cornfield, B.M. Hill, D.V. Lindley, S. Geisser, and C.M. Mallows. In Bayesian Statistics (eds. Donald L. Meyer, Raymond O. Collier, Jr.) Itasca, Illinois: Peacock Publishers Inc., pp 85-125.

Cornfield J (1971). The University Group Diabetes Program. A further statistical analysis of the mortality findings. Journal of the American Medical Association 217:1676-1687.

Cornfield J (1974a). Statement of Dr. Jerome Cornfield, Chairman, Department of Statistics, The George Washington University, Washington, D.C. In Subcommittee on Monopoly (1974), pp 10778-10794.

Cornfield J (1974b). Interrogation of Holbrooke S. Seltzer, M.D. In Subcommittee on Monopoly (1974), pp 10889-10895.

Cornfield J (1974c). Correspondence between Senator Gaylord Nelson and Neil L. Chayet, Dr. Jerome Cornfield, Dr. Christian R. Klimt, and Dr. Jeremiah Stamler. In Subcommittee on Monopoly (1974), pp 11507-11523.

Cornfield J (1975). A statistician’s apology. Journal of the American Statistical Association 70:7-14.

Cornfield J (1976). Recent methodological contributions to clinical trials. American Journal of Epidemiology 104:408-421.

Cornfield J (1978). Randomization by group: a formal analysis. American Journal of Epidemiology 108:100-102.

Cornfield J, Greenhouse SW (1967). On certain aspects of sequential clinical trials. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. (eds. Jerzy Neyman, Lucien M. LeCam) Berkeley, California: University of California Press, pp. 813-829.

Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL (1959). Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22:173-203.

Coronary Drug Project Research Group (1970). The Coronary Drug Project: Initial findings leading to modifications of its research protocol. JAMA 214(7):1303-1313.

Coronary Drug Project Research Group (1973). The Coronary Drug Project: Findings Leading to Discontinuation of the 2.5-mg/day Estrogen Group. JAMA 226(6):652-657.

Coronary Drug Project Research Group (1980). Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. New England Journal of Medicine 303:1038-1041.

Cutler SJ, Greenhouse SW, Cornfield J, Schneiderman MA (1966). The role of hypothesis testing in clinical trials. Journal of Chronic Diseases 19:857-882.

Ederer F (1982). Jerome Cornfield’s contributions to the conduct of clinical trials. Biometrics 38(Suppl):25-32.

Edwards W, Lindman H, Savage LJ (1963). Bayesian statistical inference for psychological research. Psychological Review 70:193-242.

Efron B (1986). Why isn’t everyone a Bayesian? The American Statistician 40(1):1-5.

The Diabetic Retinopathy Study Research Group (1976). Preliminary report on effects of photocoagulation therapy. American Journal of Ophthalmology 81(4):383-396.

FDA (2010). Guidance for Industry and FDA Staff. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials, accessed 21/8/2019).

Feinstein AR (1971). Clinical biostatistics VIII. An analytic appraisal of the University Group Diabetes Program (UGDP) study. Clinical Pharmacology and Therapeutics 12:167-191.

Feinstein AR (1976a). Clinical biostatistics XXXV. The persistent clinical failures and fallacies of the UGDP study. Clinical Pharmacology and Therapeutics 19:78-93.

Feinstein AR (1976b). Clinical biostatistics XXXVI. The persistent biometric problems of the UGDP study. Clinical Pharmacology and Therapeutics 19:742-785.

Feinstein AR (1979). How good is the statistical evidence against oral hypoglycemic agents? Advances in Internal Medicine 24:71-95.

Fienberg SE (1992). A brief history of statistics in three and one-half chapters: a review essay. Statistical Science 7(2):208-225.

Fienberg SE (2006). When did Bayesian inference become “Bayesian”? Bayesian Analysis 1:1-40.

Fisher RA (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Frederickson DS (1982). Remarks. Biometrics 38(Suppl):7.

Good IJ (1958). Significance tests in parallel and in series. Journal of the American Statistical Association 53:799-813.

Good IJ (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society, Series B 29:399-431.

Good IJ (1999). Letter re Bayes factors: what they are and what they are not. The American Statistician 55:173-174.

Goodman SN (1999). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130:1005-1013.

Green SB (1997). A conversation with Fred Ederer. Statistical Science 12:125-131.

Greene JA (2007). Prescribing by Numbers. Baltimore: The Johns Hopkins University Press.

Greenhouse SW (1982a). A Tribute. Biometrics 38(Suppl):3-6.

Greenhouse SW (1982b). Jerome Cornfield’s contributions to epidemiology. Biometrics 38(Suppl):33-45.

Greenhouse SW, Halperin M (1980). Jerome Cornfield (1912-1979). The American Statistician 34(2):106-107.

Hartley HO (1963). In Dr. Bayes’ consulting room. The American Statistician 17(1):22-24.

Hill AB (1953). Observation and experiment. New England Journal of Medicine 248:995-1001.

Hill AB (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58:295-300.

Jennison C, Turnbull BW (1999). Group sequential methods with applications to clinical trials. New York: Chapman & Hall / CRC.

Kempthorne O (1969). Discussion of “The Bayesian outlook and its applications.” Biometrics 25:647-654.

Leibel B (1971). An analysis of the University Group Diabetes Study Program: data results and conclusions. Canadian Medical Association Journal 105:292-294.

Lindley DV (1975). The future of statistics: A Bayesian 21st century. Advances in Applied Probability 7(Suppl):106-115.

Lindley DV (1986). Comment on “Why isn’t everyone a Bayesian?” The American Statistician 40(1):6-7.

Lindley DV (2000). The philosophy of statistics. The Statistician 49:293-337.

Lindley DV (2006). Understanding Uncertainty. New York: Wiley.

Mantel N (1982). Jerome Cornfield and statistical applications to laboratory research: a personal reminiscence. Biometrics 38(Suppl):17-23.

Marks HM (1997). The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900-1990. Cambridge: Cambridge University Press.

Meinert CL (2012). Clinical Trials: Design, Conduct, and Analysis, 2nd ed. New York: Oxford University Press.

Neyman J, Pearson ES (1933a). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A 231:289-337.

Neyman J, Pearson ES (1933b). The testing of statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society 29:492-510.

Royall R (1997). Statistical Evidence: A Likelihood Paradigm. New York: Chapman & Hall, pp 169-176.

Salsburg DS (1971). The UGDP study. Journal of the American Medical Association 218:1704-1705.

Savage LJ (1961). The foundations of statistics reconsidered. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol 1. (ed. Jerzy Neyman) Berkeley: University of California Press, pp. 575-586.

Schor S (1971). The University Group Diabetes Program. A statistician looks at the mortality results. Journal of the American Medical Association 217:1671-1675.

Schor SS (1973). Statistical problems in clinical trials: the UGDP study revisited. The American Journal of Medicine 55:727-732.

Seltzer HS (1972). A summary of criticisms of the findings and conclusions of the University Group Diabetes Program (UGDP). Diabetes 21:976-979.

Seltzer HS (1974). Statement of Holbrooke S. Seltzer, M.D., Chief of Metabolism at the Veterans’ Administration Hospital and Professor of Internal Medicine at The University of Texas, Dallas, Tex. In Subcommittee on Monopoly (1974), pp 10880-10889.

Spiegelhalter DJ, Abrams KR, Myles JP (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. New York: Wiley.

Spiegelhalter DJ, Freedman LS, Parmar MKB (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A 157:357-416.

Subcommittee on Monopoly (1974). Hearings on the Present Status of Competition in the Pharmaceutical Industry, Second Session, Part 25, Oral Hypoglycemic Drugs. Select Committee on Small Business, United States Senate: September 18, 19, and 20, 1974 (http://babel.hathitrust.org/cgi/pt?id=mdp.39015005125193, accessed 3/26/2015).

US Supreme Court (1980). Forsham v. Harris, 445 U.S. 169 (1980). Forsham v. Harris No. 78-1118. Argued October 31, 1979, Decided March 3, 1980, 445 U.S. 169 (http://supreme.justia.com/cases/federal/us/445/169/case.html, accessed 3/26/2015).

University Group Diabetes Program Research Group (1970a). A study of the effects of hypoglycemic agents on vascular complications in patients with adult-onset diabetes: I. Design, methods, and baseline characteristics. Diabetes 19(Suppl 2):747-783.

University Group Diabetes Program Research Group (1970b). A study of the effects of hypoglycemic agents on vascular complications in patients with adult-onset diabetes: II. Mortality results. Diabetes 19(Suppl 2):785-830.

Urokinase Pulmonary Embolism Trial Study Group (1970). Urokinase pulmonary embolism trial. Phase 1 results: a cooperative study. Journal of the American Medical Association 214:2163-2172.

Urokinase Pulmonary Embolism Trial Study Group (1973). The urokinase pulmonary embolism trial. A national cooperative study. Circulation 47(Suppl 2):1-108.

Wald A (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics 16:117-186.

Zelen M (1982). The contributions of Jerome Cornfield to the theory of statistics. Biometrics 38(Suppl):11-15.

## Schlesselman JJ (2015). Jerome Cornfield’s Bayesian approach to assessing interim results in clinical trials.

## © James J Schlesselman, Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, 130 DeSoto Street, Pittsburgh, PA 15261. E-mail: jjs66@pitt.edu

Cite as: Schlesselman JJ (2015). Jerome Cornfield’s Bayesian approach to assessing interim results in clinical trials. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/jerome-cornfields-bayesian-approach-to-assessing-interim-results-in-clinical-trials/)

## 1 Introduction

Jerome Cornfield (1912-1979) was a man of philosophical bent, engaging wit, deep thought, great mathematical talent, and formidable skill in written and oral debate (see, for example, Cornfield 1954; 1959; 1970c; 1971; 1975; 1976; 1978). He had extensive influence on biostatistics and medical research in the USA in the middle of the 20th century.

Cornfield was never awarded a university degree beyond a Bachelor of Science. Indeed, in the words of one distinguished colleague, “he represented a mockery of excessive adherence to traditional qualifications” (Frederickson 1982). Nevertheless, Cornfield was elected president of the American Statistical Association, the American Epidemiologic Society, and the International Biometric Society’s Eastern North American Region, and he was a fellow of the Institute of Mathematical Statistics, the American Statistical Association, and the American Association for the Advancement of Science (Greenhouse and Halperin 1980; Greenhouse 1982a).

While at the National Cancer Institute in the 1950s, Cornfield developed statistical methods for laboratory studies and epidemiologic investigations. His writing on the nature of causation and on evidence that may be used to buttress an inference of cause-and-effect arose from his involvement in one of the great public health controversies of his time: lung cancer in relation to cigarette smoking (Cornfield 1954; Cornfield et al. 1959).

Among his many contributions to epidemiology, Cornfield defended case-control studies as an appropriate method to assess the potential effects of an exposure on the risk of disease; developed the odds ratio, based on a case-control study, as an approximation to the corresponding estimate of relative risk based on a cohort study; and developed the rationale for the use of relative risk, as opposed to absolute risk, in studies of disease etiology. He also gave a persuasive rationale for the use of observational studies in scientific inference (Cornfield 1959).

Shortly after his death, the scope of Cornfield’s wide-ranging research activities was surveyed in a series of articles written by persons who knew him well. They discussed his contributions to laboratory research (Mantel 1982), epidemiologic studies (Greenhouse 1982b), statistical theory (Zelen 1982), and clinical trials (Ederer 1982; also see Green 1997).

## 2 Cornfield and clinical trials

The last major expression of Cornfield’s thinking about clinical trials appeared in his 1976 article, *Recent Methodological Contributions to Clinical Trials*. It covered six areas: decision making, appraisal of uncertainty, likelihood ratios, the use of prior opinion in statistical analyses (Bayesian methods), patient subgroups (multiple comparisons), and randomization. Cornfield began his discussion with three broad assertions: statistical methods could never provide unique, unequivocal answers to problems of design and analysis; the process of inference and decision making in clinical trials is loosely structured, because that is the nature of an intrinsically complex enterprise; and analyses of data from clinical trials may inevitably lead to ambiguous answers.

Cornfield’s skepticism about the role of statistical methods in decision making and scientific inference had been reinforced by his recent experience with the interim monitoring of several large-scale, multi-centre clinical trials. His reservations, however, had been voiced much earlier in a more general context in 1959, in his carefully reasoned, thought-provoking overview of research methodology: *Principles of Research* (Cornfield 1959).

Cornfield had long advocated that clinical trials be randomized whenever feasible and ethical. He also urged that *individuals* be randomized if possible, rather than larger units, such as clinics or hospitals. He had stated the frequentist rationale for randomization in 1959:

In his 1976 paper, Cornfield wrote:

Although randomization’s ‘logical status’ and ‘the exact function it performs’ may have become obscure upon deep reflection, Cornfield nonetheless answered both of the questions posed: ‘no’ to the first, and essentially ‘yes’ to the second. However, although Cornfield was a long-standing advocate of randomized clinical trials, he believed that observational studies could also provide a secure foundation for scientific inference and decisions. His defense of observational studies in 1959, together with his earlier writing on this topic, is reminiscent of Bradford Hill’s 1953 article, *Observation and Experiment*, and of Hill’s celebrated exposition of guidelines for assessing cause-and-effect from epidemiologic studies (Hill 1953; 1965).

## 3 Decision making and appraisal of uncertainty in monitoring clinical trials

Cornfield’s early work in statistics reflected his training in frequentist methods, for which ‘probability’ is a measure of the *relative frequency* with which events occur in the natural world. This interpretation differs from that of Bayesians, for whom ‘probability’ is a measure of a person’s *degree of belief* in a proposition, such as ‘the risk of lung cancer is increased 20-fold in persons who smoke 2 packs of cigarettes per day’. Both conceptions of probability have a long history of use and application (Fienberg 1992, 2006), and both conform to the same axioms and theorems of probability theory. In addition to what is meant by *probability*, the major difference in the Bayesian approach is that it imposes two requirements: first, one must specify a prior probability distribution (of belief) for values of the parameters involved in a statistical model to be used for an analysis; and second, when the study data become available, one must apply Bayes’ theorem to update the prior probability distribution. This second requirement, called *Bayes’ rule*, is a mathematical consequence of Bayesians’ insistence that any rational expression of degree of belief (personal probability) must be internally consistent (Lindley 2006, pp 36-37 & 64-66).

In the mid-1960s Cornfield repudiated frequentist methods for monitoring clinical trials and expressed his conversion to a then-radical point of view, namely, that tests of statistical significance and p-values, due to R.A. Fisher (1925), tests of statistical hypotheses and confidence intervals, due to Jerzy Neyman and Egon Pearson (1933a, 1933b), and sequential analysis, due to Abraham Wald and George Barnard (Wald 1945; Barnard 1946), in short virtually everything that frequentists held dear, were seriously misguided (Cornfield 1966a; 1966b; 1969; 1970b; Cornfield and Greenhouse 1967). Such thoughts had essentially been stated earlier by Jimmy Savage among others (Savage 1961; Edwards et al. 1963), but it was Francis Anscombe’s review of Peter Armitage’s influential text, *Sequential Medical Trials*, which really garnered attention (Armitage 1960; Anscombe 1963).

Anscombe’s carefully reasoned, trenchant criticism of sequential analysis and Neyman-Pearson theory of hypothesis tests was so important that senior statisticians at the US National Institutes of Health held ‘an informal seminar’ about the issue in June 1965 (Cutler et al. 1966). On that occasion Cornfield argued for a Bayesian approach to monitoring clinical trials. Although he subsequently continued to use the frequentist concepts of ‘power’ and ‘level of statistical significance’ to plan clinical trials (Cornfield 1970a), Cornfield objected to their relevance in the interim analyses of a study’s emerging data, and by implication in the final analysis as well.
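The two Bayesian requirements described in this section, a prior distribution and its update by Bayes’ theorem, can be written compactly. The notation is an assumption introduced here for illustration and is not taken from Cornfield’s papers: θ denotes the parameters of the statistical model, x the study data, p(θ) the prior distribution of belief, and p(x | θ) the likelihood.

```latex
p(\theta \mid x)
  \;=\; \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}
  \;\propto\; p(x \mid \theta)\, p(\theta)
```

The prior p(θ) is the ingredient that has no counterpart in a frequentist analysis; the likelihood p(x | θ) is common to both approaches.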

## 4 Cornfield and the University Group Diabetes Program (UGDP)

## 4.1 The UGDP saga

In his 1976 article, Cornfield referred to three clinical trials as examples of decisions that had to be made in the face of major, unexpected problems: the University Group Diabetes Program (1970a; 1970b), the Coronary Drug Project (1970; 1973), and the Diabetic Retinopathy Study (1976).

Because of the importance of the UGDP to the development of modern-day clinical trials (Meinert 2012; Marks 1997; Greene 2007), I use that study to explain Cornfield’s Bayesian method (relative betting odds), which was used as one of three statistical techniques to support the decision to discontinue tolbutamide. The UGDP also provides an instructive instance in which Cornfield’s formidable skills in rhetoric, statistical analysis, and logical thinking are clearly displayed in undermining the arguments of a study’s critics (Cornfield 1971). It additionally provides a striking example in which Cornfield’s Bayesian rationale for avoiding the use of p-values and tests of statistical significance was contradicted by his practice.

Cornfield was not involved in the UGDP’s design, and he was not part of the research team initially. Nearly 6 years after the first patient had been enrolled and the study was fully in progress, his advice was sought by Christian Klimt, the head of the UGDP’s Statistical Center. At that time, Klimt was vexed by a major problem: informal, interim analyses indicated that more cardiovascular deaths were occurring in patients treated with tolbutamide than with placebo – a completely unexpected, inexplicable finding. Bayesian analyses by Cornfield and computer simulations for frequentist-based analyses were therefore developed to assess the unfavorable trend, which also indicated that overall mortality on tolbutamide was no better than that on placebo and was possibly worse. On the basis of these analyses, Cornfield advocated that treatment with tolbutamide be stopped, and he remained the major defender of the statistical basis of that recommendation (Cornfield 1971, 1974a, 1974b).

With Bayesian and frequentist analyses in hand, a two-day meeting of the UGDP’s Executive Steering Committee voted in June 1969 to drop tolbutamide as a study treatment, and duly informed the US Food and Drug Administration (FDA) of their decision. Despite skepticism by some reviewers in the agency and by its external advisors, the US Food and Drug Administration announced its intention in May 1970 to place a warning of increased cardiovascular hazard on the label for tolbutamide and all chemically-related agents (sulfonylurea drugs). This was done before any UGDP publication, and it set in motion a long-ensuing clinical and scientific controversy.

Within a few months of the UGDP’s first two publications (UGDP 1970a; 1970b), several articles highly critical of the study’s design, implementation, and statistical analyses appeared in the medical literature (Feinstein 1971; Leibel 1971; Salsburg 1971; Schor 1971). Those articles were followed shortly thereafter by further criticism of the UGDP (Seltzer 1972; Schor 1973), which continued throughout the 1970s (Feinstein 1976a; 1976b; 1979).

Cornfield was a representative of the UGDP in discussions with the US Food and Drug Administration, and he testified before a US Senate subcommittee in a hearing held over 3 days in September 1974 about the controversy involved (Cornfield 1974a; Subcommittee on Monopoly 1974). Amazingly, on the last day of that hearing, Cornfield was allowed to cross examine an articulate, forceful opponent of the UGDP, Holbrooke Seltzer, who was Professor of Internal Medicine at the University of Texas and a member of the Board of Directors of the American Diabetes Association (Seltzer 1974; Cornfield 1974b). Furthermore, at the request of Senator Gaylord Nelson, Chairman of the Subcommittee on Monopoly, Cornfield wrote replies to numerous interrogatories proffered by the attorney representing a group of diabetologists, including Seltzer, who opposed the actions of the US Food and Drug Administration, and challenged the validity of the UGDP’s findings (Cornfield 1974c). Those challenges, which unsuccessfully sought to obtain the UGDP data, were ultimately decided by the US Supreme Court (1980).

## 4.2 The application of Cornfield’s ‘relative betting odds’

Although the UGDP was planned and monitored with frequentist methods, it became the first clinical trial in which a Bayesian analysis was used. The analysis applied what Cornfield called ‘relative betting odds’, a Bayesian method for testing hypotheses (Cornfield 1966a, 1969; also see Edwards et al. 1963). The term ‘betting odds’ refers to an index of *personal belief* in a proposition or hypothesis (H), such as the following:

H: the risk of cardiovascular death on tolbutamide is identical to that on placebo.

The strength of one’s belief that H is true is explained by way of placing a bet. For someone to give odds of 10:1 in favor of H before the UGDP data have been inspected (the ‘prior odds’ that H is true, also called the ‘prior odds of H’) means that one is prepared to stake 10 units on H being true in order to win only 1 unit if H is shown to be true, and to lose the 10 units if H is shown to be false.

In Cornfield’s application to the UGDP, betting odds on H were often expressed relative to a specific alternative hypothesis (H_A).

The figure below shows that as data from the UGDP emerged, they began to indicate that H is false.

Figure 1. Cumulative mortality rates per 100 persons at risk, by year of follow-up (UGDP 1970b). TOLB: tolbutamide; PLBO: placebo.

In the 204 patients on tolbutamide, there were 30 deaths in total by the 8th year of follow-up, 26 of which were attributed to cardiovascular causes. By comparison, there were only 21 deaths in the 205 patients on placebo, 10 of which were attributed to cardiovascular causes. The life-table estimate of the cumulative risk of cardiovascular death (intent-to-treat analysis) was 17.6 (standard error 2.5) per 100 on tolbutamide vs. 6.0 (standard error 2.5) per 100 on placebo.
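As a quick arithmetic check on the counts just quoted, the sketch below computes the crude proportion of deaths in each group and the ratio between groups. These crude figures ignore length of follow-up, so they are not the life-table estimates reported by the UGDP; the function and variable names are hypothetical and introduced only for illustration.

```python
def crude_risk_per_100(deaths: int, patients: int) -> float:
    """Crude proportion of deaths, expressed per 100 patients."""
    return 100.0 * deaths / patients


# Counts quoted in the text (UGDP 1970b), through the 8th year of follow-up.
TOLB_PATIENTS, PLBO_PATIENTS = 204, 205

tolb_cv = crude_risk_per_100(26, TOLB_PATIENTS)   # cardiovascular deaths, tolbutamide
plbo_cv = crude_risk_per_100(10, PLBO_PATIENTS)   # cardiovascular deaths, placebo
tolb_all = crude_risk_per_100(30, TOLB_PATIENTS)  # all deaths, tolbutamide
plbo_all = crude_risk_per_100(21, PLBO_PATIENTS)  # all deaths, placebo

cv_ratio = tolb_cv / plbo_cv
all_ratio = tolb_all / plbo_all

print(f"Cardiovascular deaths per 100: TOLB {tolb_cv:.1f}, PLBO {plbo_cv:.1f}, ratio {cv_ratio:.2f}")
print(f"All deaths per 100:            TOLB {tolb_all:.1f}, PLBO {plbo_all:.1f}, ratio {all_ratio:.2f}")
```

On these crude figures the proportion of cardiovascular deaths on tolbutamide is roughly 2.6 times that on placebo (12.7 vs. 4.9 per 100), whereas the ratio for deaths from all causes is about 1.4 (14.7 vs. 10.2 per 100).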

In view of such data, a person’s ‘posterior odds’ in favor of H being true, i.e. one’s belief that ‘the risk of cardiovascular death on tolbutamide is identical to that on placebo’, should be smaller than that person’s ‘prior odds’. The amount of decrease is given by Bayes’ theorem. That amount is what Cornfield called the ‘relative betting odds’. It expresses the degree to which the acquisition of data should change one’s ‘prior odds’ to ‘posterior odds’. Expressed schematically (Goodman 1999):

posterior odds of H = relative betting odds × prior odds of H

Values of relative betting odds less than 1 should diminish one’s posterior odds that H is true, whereas values of relative betting odds greater than 1 should enlarge them.

The tabulation below displays some of the UGDP’s many reported values of relative betting odds (UGDP 1970b). The alternative hypothesis (H_A) in the tabulation is that ‘the cumulative risk of cardiovascular death is increased by 25% over the cumulative risk on placebo’.

One sees from the tabulation that the UGDP’s accumulating data led to progressively decreasing values of the relative betting odds. By the 8th year of follow-up, the relative betting odds were 0.15. In other words, at that time a person’s prior odds in favor of H, i.e., that ‘the risk of cardiovascular death on tolbutamide is identical to that on placebo’, should have been diminished by 85%.
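The effect of relative betting odds of 0.15 can be made concrete with a short calculation. The sketch below multiplies assumed prior odds by the relative betting odds to obtain posterior odds, and then converts odds to a probability. The prior odds of 1:1 and 10:1 are illustrative assumptions, not values used by the UGDP, and the function names are hypothetical.

```python
def posterior_odds(prior_odds: float, relative_betting_odds: float) -> float:
    """Posterior odds of H = relative betting odds x prior odds of H."""
    return prior_odds * relative_betting_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of H (against its alternative) to a probability."""
    return odds / (1.0 + odds)


RBO_YEAR_8 = 0.15  # relative betting odds quoted in the text for the 8th year

# Hypothetical prior odds in favor of H: indifference (1:1) and strong belief (10:1).
for prior in (1.0, 10.0):
    post = posterior_odds(prior, RBO_YEAR_8)
    print(
        f"prior odds {prior:g}:1 (probability {odds_to_probability(prior):.2f}) "
        f"-> posterior odds {post:g}:1 (probability {odds_to_probability(post):.2f})"
    )
```

Someone initially indifferent between H and the alternative (odds of 1:1) would end with odds of 0.15:1, a probability of about 0.13 that H is true. Someone who began with odds of 10:1 in favor of H would end with odds of 1.5:1, a probability of 0.6. The conversion to a probability assumes that H and the stated alternative are the only hypotheses entertained.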

Relative betting odds depend not only on the data, but also on the degree of belief in the alternative hypotheses against which the null hypothesis is tested (see, for example, Cornfield 1966a, p 580; Urokinase Pulmonary Embolism Trial Study Group 1973, p II-27; Jennison and Turnbull 1999, p 340). Further complications arise in calculating relative betting odds for ‘composite hypotheses’, such as ‘the risk of cardiovascular death on tolbutamide is within 5% of the risk on placebo’, or ‘the risk of cardiovascular death on tolbutamide is at least 25% greater than the risk on placebo’ (see, for example, Edwards et al. 1963; Cornfield 1969).
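The dependence on the alternative hypothesis can be seen in a deliberately simplified calculation. The sketch below is not the method used by Cornfield or the UGDP. It treats the placebo group’s crude proportion of cardiovascular deaths (10 of 205) as a known risk under H, considers only the tolbutamide counts (26 of 204), and compares H with a single-point alternative in which that risk is multiplied by an assumed factor. The function name and the list of factors are hypothetical.

```python
import math


def likelihood_ratio(deaths: int, patients: int, risk_h: float, risk_alt: float) -> float:
    """Binomial likelihood of the data under H divided by that under the alternative.

    The binomial coefficient is the same in numerator and denominator, so it cancels.
    """
    survivors = patients - deaths
    log_lr = deaths * math.log(risk_h / risk_alt) + survivors * math.log(
        (1.0 - risk_h) / (1.0 - risk_alt)
    )
    return math.exp(log_lr)


# Simplifying assumption: the placebo group's crude risk is taken as known.
PLACEBO_RISK = 10 / 205

# Hypothetical point alternatives: risk on tolbutamide = factor x placebo risk.
for factor in (1.25, 2.0, 3.0, 5.0):
    lr = likelihood_ratio(26, 204, PLACEBO_RISK, factor * PLACEBO_RISK)
    print(f"alternative: risk x {factor:g} -> likelihood ratio, H vs. alternative = {lr:.2g}")
```

The same data give quite different ratios as the alternative changes, which is why a value of relative betting odds has to be read together with the alternative hypothesis against which H was tested.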

## 4.3 Cornfield’s defense of the UGDP

At the outset of his rebuttal of the UGDP’s critics, Cornfield stated that the study’s prudent decision and moderately worded conclusion had been:

Cornfield’s claim that his purpose was “not to defend the UGDP” is at odds with what he did, namely, rebut every statistical criticism that had been leveled at the study. Furthermore, his remark that “the more general UGDP finding that lowering of blood glucose level did not appreciably lower the eight-year mortality from cardiovascular disease as compared with patients on diet alone” was based on a serendipitous result: the UGDP was not designed to address this issue. Nonetheless, by insightful analysis using standard statistical methods and ruthless logic, Cornfield demonstrated that no factual basis supported the critics’ concerns that randomization had somehow broken down and produced major baseline inequalities which nullified the results; that excess mortality on tolbutamide was confined to a small number of clinics, and because of this the UGDP’s findings could not be generalized to medical practice; and that dropouts, non-adherence to treatment, and the lower-than-expected mortality in the placebo group undermined the UGDP’s findings concerning tolbutamide.

Cornfield also rebutted a number of clinical concerns, including those expressed about the eligibility criteria for study patients, the use of a fixed dose of tolbutamide, the determination of the principal cause of death, the definitions of baseline risk factors, the failure to obtain data on patients’ smoking histories, and the decision to stop treatment with tolbutamide before a more conclusive demonstration was available for its apparently harmful effect.

Despite Cornfield’s advocacy of Bayesian methods in clinical trials and the use of his relative betting odds to support the UGDP’s decision to terminate treatment with tolbutamide, his rebuttal relied extensively on the very frequentist methods that he was arguing against: p-values and tests of statistical significance as a basis of support for or against hypotheses. Bayesian methods were conspicuous by their absence.

What can one say about the discrepancy between Cornfield’s preaching against p-values and his practice? One answer is that he chose to use well-accepted frequentist methodology in defending the UGDP because it suited his purpose, was not misleading, and because framing a rebuttal through a series of Bayesian analyses would have been incomprehensible to and likely rejected by the vast majority of readers. A related answer is that a Bayesian rebuttal would have required Cornfield to specify prior distributions of belief in the issues he was discussing. One specification could have reflected his personal beliefs, or those of others defending the UGDP, but such prior distributions had never been expressed before the UGDP data were examined. Another tactic could have involved the application of a ‘neutral prior’, that is, a prior distribution of belief that represented weakly-held opinion about the issues involved. This would have led to posterior distributions that approximate the likelihood function (Royall 1997), with results similar to those based on p-values. None of the analyses, however, would have satisfied the UGDP’s fiercest critics, who were alleging that the study was a seriously flawed investigation beyond the remediation of statistics. The only choice of prior distribution that would have satisfied them would have been one that led to accepting their position.

As emphasized by Lindley (2000, p 313), Bayesian methods are deficient in situations involving conflict, a circumstance that embroiled the UGDP both internally (some of the study investigators opposed the decision to stop treatment with tolbutamide) and externally (for various reasons many clinicians not involved in the UGDP did not believe its results). Although the UGDP was challenged by some critics because the study had stopped treatment with tolbutamide “too soon”, i.e. before more data on mortality were at hand, it is doubtful in retrospect whether more data from the same study would have ever settled the controversy.

## 5 Cornfield’s legacy

Cornfield’s retirement from the US Civil Service in 1967 left the NIH with no influential advocate of Bayesian theory, which had been subject to some highly contentious argument (see, for example, Savage 1961; Hartley 1963; Kempthorne 1969; Bross 1969; Lindley 1975, 1986; Chernoff 1986). Only two other clinical trials, the Coronary Drug Project (1970, 1973, 1980) and the Urokinase Pulmonary Embolism Trial (1970, 1973), used Cornfield’s relative betting odds methodology, which is now largely abandoned for interim monitoring (Jennison and Turnbull 1999, p 340). Bayesian methods currently use the posterior distribution of the parameter of interest, such as the relative risk or the difference in risk. To limit the number of different estimates that can arise from a given set of data, some recommend that Bayesian calculations be made under three broadly different assumptions: a neutral prior, a skeptical prior (against the alternative hypothesis of interest), and a prior that favors the alternative hypothesis (Spiegelhalter et al. 1994, 2004), which is similar to a proposal made by Cornfield and Greenhouse (1967, pp 823-825).
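The three-prior recommendation can be sketched with a normal approximation on the log relative risk, a common simplification. Everything numerical about the priors below is an assumption chosen for illustration; it is not taken from Spiegelhalter et al. or from Cornfield and Greenhouse. The data summary uses the crude cardiovascular death counts quoted in section 4.2 (26 of 204 on tolbutamide, 10 of 205 on placebo) rather than the UGDP’s life-table analysis, and the function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update_normal(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal update: precision-weighted average of prior and data."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / std_error ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)


# Data summary on the log relative risk scale (crude counts from the text).
d1, n1, d0, n0 = 26, 204, 10, 205
log_rr = math.log((d1 / n1) / (d0 / n0))
std_err = math.sqrt(1 / d1 - 1 / n1 + 1 / d0 - 1 / n0)

# Hypothetical priors for the log relative risk: (mean, standard deviation).
priors = {
    "neutral": (0.0, 10.0),                       # very diffuse
    "skeptical": (0.0, 0.25),                     # concentrated near no effect
    "favors alternative": (math.log(1.25), 0.5),  # centred on a 25% increase
}

results = {}
for name, (mean, sd) in priors.items():
    post_mean, post_sd = update_normal(mean, sd, log_rr, std_err)
    # Posterior probability that the relative risk exceeds 1 (log RR > 0).
    prob_increase = 1.0 - normal_cdf((0.0 - post_mean) / post_sd)
    results[name] = (post_mean, post_sd, prob_increase)
    print(f"{name:>18}: posterior RR {math.exp(post_mean):.2f}, P(RR > 1) = {prob_increase:.3f}")
```

Reporting the three posteriors side by side shows how far a conclusion depends on the prior. The skeptical prior pulls the posterior toward no effect, so under these assumptions it yields the smallest posterior probability of increased risk among the three.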

Only a few of Cornfield’s articles on clinical trials are now cited in articles and texts. His evocative phrase ‘relative betting odds’, which so effectively conjures the image of a casino, has sadly been supplanted by the nondescript ‘Bayes factor’, a term coined by IJ Good (Good 1958; 1967; 1999). Despite Cornfield’s key contribution to the interpretation of the UGDP, the investigators did not use the term ‘relative betting odds’ in their publications, perhaps because of its connotation and the controversy in which they were involved. The UGDP, however, did use ‘RBO’, which was said to be an abbreviation for ‘likelihood ratio’, and they cited Cornfield’s article (Cornfield 1969), but without mentioning that RBO was his acronym for ‘relative betting odds’.

Even if one subscribes to the use of Bayes’ rule for modifying prior belief, in practical applications such as the UGDP, making decisions and taking action will involve additional considerations: assessing the design of a study vs. its implementation, and weighing costs, benefits, and ethical concerns, all of which can be subject to widely differing opinions (Armitage 2013). With regard to ethics, Anscombe’s proposed Bayesian method to decide whether to continue recruitment to a clinical trial when accumulating data suggest that one of the treatments being compared is superior (Anscombe 1963) failed to account for physicians’ patient-centered perspective (Armitage 1963; 2013), a viewpoint which Cornfield evidently shared (see quotation at the start of section 4.3).

Cornfield became a staunch advocate of Bayesian methods, but he was never an ideologue on their behalf. His thinking and writing about “the Bayesian outlook” from 1966 forward was accompanied by his contemporaneous use of frequentist techniques, p-values among them (see Cornfield 1971), which he claimed to deplore in principle. The resultant inconsistency between the theorizing and the practice of a supremely logical man might be explained by Cornfield being not only a scientific pragmatist, i.e., using methods that were accepted by the large majority of clinicians and scientists, but also a skeptic, a stance expressed in his 1976 article, Recent Methodological Contributions to Clinical Trials: there can never be a completely reliable foundation of scientific inference or decision-making in the face of uncertainty, and statistical methods alone, Bayesian or otherwise, are incapable of solving this problem (Armitage 2013).

If the above remarks are true, then why did Cornfield convert to and espouse Bayesian methods? One answer, offered by Cornfield, is that frequentist methods not only lead to logical conundrums, but they are also inflexible in practice, especially in the very instances where flexibility is scientifically most needed (see, for example, Cutler et al. 1966, pp 862-866 & 877-881; Cornfield 1966b, 1969, 1970b). In adopting that position, he was apparently willing to overlook problems with Bayesian theory (see, for example, Efron 1986; Royall 1997, pp 167-176), arguing that the advantage to be gained by improved flexibility was not accompanied by any important loss from abandoning frequentist techniques.

Cornfield’s paper on recent methodological contributions to clinical trials was presented on 30 April 1976 at the Reed-Frost Symposium of the Johns Hopkins University. At the time Cornfield spoke, he did not realize that he would soon cross a threshold and pass into history. One year earlier, however, he had mused about the future:

While Cornfield’s contributions to clinical trials may be nearly forgotten, the Bayesian perspective that he advocated is becoming more widely accepted (see, for example, Spiegelhalter et al. 2004; Berry 2006; FDA 2010), a development that would have pleased him.

## Acknowledgement

Iain Chalmers, Curtis Meinert, Alfredo Morabia, and Jan Vandenbroucke offered wise advice and insightful critiques of an earlier draft of this commentary. I am responsible for any errors of fact or interpretation that may remain.

This James Lind Library article has been republished in the Journal of the Royal Society of Medicine 2016;109:27-35.

## References

Anscombe FJ (1963). Sequential medical trials. Journal of the American Statistical Association 58:365-384.

Armitage P (1960). Sequential medical trials. Oxford: Blackwell.

Armitage P (1963). Sequential medical trials: some comments on F.J. Anscombe’s paper. Journal of the American Statistical Association 58:384-387.

Armitage P (2013). The evolution of ways of deciding when clinical trials should stop recruiting. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/the-evolution-of-ways-of-deciding-when-clinical-trials-should-stop-recruiting/).

Barnard GA (1946). Sequential tests in industrial statistics. Supplement to the Journal of the Royal Statistical Society 8:1-21.

Berry DA (2006). Bayesian clinical trials. Nature Reviews Drug Discovery 5(1):27-36.

Bross IDJ (1969). Applications of probability: Science vs. pseudoscience. Journal of the American Statistical Association 64:51-57.

Chernoff H (1986). Comment on “Why isn’t everyone a Bayesian?” The American Statistician 40(1):5-6.

Cornfield J (1954). Statistical relationships and proof in medicine. The American Statistician 8(5):19-21.

Cornfield J (1959). Principles of research. American Journal of Mental Deficiency 64:240-252.

Cornfield J (1966a). A Bayesian test of some classical hypotheses, with applications to sequential clinical trials. Journal of the American Statistical Association 61:577-594.

Cornfield J (1966b). Sequential trials, sequential analysis and the likelihood principle. The American Statistician 20:18-23.

Cornfield J (1969). The Bayesian outlook and its applications (with discussion). Biometrics 25:617-657.

Cornfield J (1970a). Fixed and floating sample size trials. In Symposium on Statistical Aspects of Protocol Design. Engle RL, Jr. (Symposium Chairman). Bethesda, Maryland: Clinical Investigation Review Committee, Clinical Investigations Branch, National Cancer Institute, National Institutes of Health, pp 181-187, with discussion on pp 197-204.

Cornfield J (1970b). The frequency theory of probability, Bayes’ theorem, and sequential clinical trials. In Bayesian Statistics (eds. Donald L. Meyer, Raymond O. Collier, Jr.) Itasca, Illinois: Peacock Publishers Inc., pp 1-28.

Cornfield J (1970c). Discussion by J. Cornfield, B.M. Hill, D.V. Lindley, S. Geisser, and C.M. Mallows. In Bayesian Statistics (eds. Donald L. Meyer, Raymond O. Collier, Jr.) Itasca, Illinois: Peacock Publishers Inc., pp 85-125.

Cornfield J (1971). The University Group Diabetes Program. A further statistical analysis of the mortality findings. Journal of the American Medical Association 217:1676-1687.

Cornfield J (1974a). Statement of Dr. Jerome Cornfield, Chairman, Department of Statistics, The George Washington University, Washington, D.C. In Subcommittee on Monopoly (1974), pp 10778-10794.

Cornfield J (1974b). Interrogation of Holbrooke S. Seltzer, M.D. In Subcommittee on Monopoly (1974), pp 10889-10895.

Cornfield J (1974c). Correspondence between Senator Gaylord Nelson and Neil L. Chayet, Dr. Jerome Cornfield, Dr. Christian R. Klimt, and Dr. Jeremiah Stamler. In Subcommittee on Monopoly (1974), pp 11507-11523.

Cornfield J (1975). A statistician’s apology. Journal of the American Statistical Association 70:7-14.

Cornfield J (1976). Recent methodological contributions to clinical trials. American Journal of Epidemiology 104:408-421.

Cornfield J (1978). Randomization by group: a formal analysis. American Journal of Epidemiology 108:100-102.

Cornfield J, Greenhouse SW (1967). On certain aspects of sequential clinical trials. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. (eds. Jerzy Neyman, Lucien M. LeCam) Berkeley, California: University of California Press, pp 813-829.

Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL (1959). Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22:173-203.

Coronary Drug Project Research Group (1970). The Coronary Drug Project: Initial findings leading to modifications of its research protocol. JAMA 214(7):1303-1313.

Coronary Drug Project Research Group (1973). The Coronary Drug Project: Findings Leading to Discontinuation of the 2.5-mg/day Estrogen Group. JAMA 226(6):652-657.

Coronary Drug Project Research Group (1980). Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. New England Journal of Medicine 303:1038-41.

Cutler SJ, Greenhouse SW, Cornfield J, Schneiderman MA (1966). The role of hypothesis testing in clinical trials. Journal of Chronic Diseases 19:857-882.

Ederer F (1982). Jerome Cornfield’s contributions to the conduct of clinical trials. Biometrics 38(Suppl):25-32.

Edwards W, Lindman H, Savage LJ (1963). Bayesian statistical inference for psychological research. Psychological Review 70:193-242.

Efron B (1986). Why isn’t everyone a Bayesian? The American Statistician 40(1):1-5.

The Diabetic Retinopathy Study Research Group (1976). Preliminary report on effects of photocoagulation therapy. American Journal of Ophthalmology 81(4):383-396.

FDA (2010). Guidance for Industry and FDA Staff. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials, accessed 21/8/2019).

Feinstein AR (1971). Clinical biostatistics VIII. An analytic appraisal of the University Group Diabetes Program (UGDP) study. Clinical Pharmacology and Therapeutics 12:167-191.

Feinstein AR (1976a). Clinical biostatistics XXXV. The persistent clinical failures and fallacies of the UGDP study. Clinical Pharmacology and Therapeutics 19:78-93.

Feinstein AR (1976b). Clinical biostatistics XXXVI. The persistent biometric problems of the UGDP study. Clinical Pharmacology and Therapeutics 19:742-785.

Feinstein AR (1979). How good is the statistical evidence against oral hypoglycemic agents? Advances in Internal Medicine 24:71-95.

Fienberg SE (1992). A brief history of statistics in three and one-half chapters: a review essay. Statistical Science 7(2):208-225.

Fienberg SE (2006). When did Bayesian inference become “Bayesian”? Bayesian Analysis 1:1-40.

Fisher RA (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Frederickson DS (1982). Remarks. Biometrics 38(Suppl):7.

Good IJ (1958). Significance tests in parallel and in series. Journal of the American Statistical Association 53:799-813.

Good IJ (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society, Series B 29:399-431.

Good IJ (1999). Letter re Bayes factors: what they are and what they are not. The American Statistician 55:173-174.

Goodman SN (1999). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130:1005-1013.

Green SB (1997). A conversation with Fred Ederer. Statistical Science 12:125-31.

Greene JA (2007). Prescribing by Numbers. Baltimore: The Johns Hopkins University Press.

Greenhouse SW (1982a). A Tribute. Biometrics 38(Suppl):3-6.

Greenhouse SW (1982b). Jerome Cornfield’s contributions to epidemiology. Biometrics 38(Suppl):33-45.

Greenhouse SW, Halperin M (1980). Jerome Cornfield (1912-1979). The American Statistician 34(2):106-107.

Hartley HO (1963). In Dr. Bayes’ consulting room. The American Statistician 17(1):22-24.

Hill AB (1953). Observation and experiment. New England Journal of Medicine 248:995-1001.

Hill AB (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58:295-300.

Jennison C, Turnbull BW (1999). Group sequential methods with applications to clinical trials. New York: Chapman & Hall / CRC.

Kempthorne O (1969). Discussion of “The Bayesian outlook and its application.” Biometrics 25:647-654.

Leibel B (1971). An analysis of the University Group Diabetes Study Program: data results and conclusions. Canadian Medical Association Journal 105:292-294.

Lindley DV (1975). The future of statistics: A Bayesian 21st century. Advances in Applied Probability 7(Suppl):106-115.

Lindley DV (1986). Comment on “Why isn’t everyone a Bayesian?” The American Statistician 40(1):6-7.

Lindley DV (2000). The philosophy of statistics. The Statistician 49:293-337.

Lindley DV (2006). Understanding Uncertainty. New York: Wiley.

Mantel N (1982). Jerome Cornfield and statistical applications to laboratory research: a personal reminiscence. Biometrics 38(Suppl):17-23.

Marks HM (1997). The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900-1990. Edinburgh: Cambridge University Press.

Meinert CL (2012). Clinical Trials: Design, Conduct, and Analysis, 2nd ed. New York: Oxford University Press.

Neyman J, Pearson ES (1933a). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A 231:289-337.

Neyman J, Pearson ES (1933b). The testing of statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society 29:492-510.

Royall R (1997). Statistical Evidence: A Likelihood Paradigm. New York: Chapman & Hall, pp 169-176.

Salsburg DS (1971). The UGDP study. Journal of the American Medical Association 218:1704-1705.

Savage LJ (1961). The foundations of statistics reconsidered. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol 1. (ed. Jerzy Neyman) Berkeley: University of California Press, pp 575-586.

Schor S (1971). The University Group Diabetes Program. A statistician looks at the mortality results. Journal of the American Medical Association 217:1671-1675.

Schor SS (1973). Statistical problems in clinical trials: the UGDP study revisited. The American Journal of Medicine 55:727-732.

Seltzer HS (1972). A summary of criticisms of the findings and conclusions of the University Group Diabetes Program (UGDP). Diabetes 21:976-979.

Seltzer HS (1974). Statement of Holbrooke S. Seltzer, M.D., Chief of Metabolism at the Veterans’ Administration Hospital and Professor of Internal Medicine at The University of Texas, Dallas, Tex. In Subcommittee on Monopoly (1974), pp 10880-10889.

Spiegelhalter DJ, Abrams KR, Myles JP (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. New York: Wiley.

Spiegelhalter DJ, Freedman LS, Parmar MKB (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A 157:357-416.

Subcommittee on Monopoly (1974). Hearings on the Present Status of Competition in the Pharmaceutical Industry, Second Session, Part 25, Oral Hypoglycemic Drugs. Select Committee on Small Business, United States Senate: September 18, 19, and 20, 1974 (http://babel.hathitrust.org/cgi/pt?id=mdp.39015005125193, accessed 3/26/2015).

US Supreme Court (1980). Forsham v. Harris, 445 U.S. 169 (1980). Forsham v. Harris No. 78-1118. Argued October 31, 1979, Decided March 3, 1980, 445 U.S. 169 (http://supreme.justia.com/cases/federal/us/445/169/case.html, accessed 3/26/2015).

University Group Diabetes Program Research Group (1970a). A study of the effects of hypoglycemic agents on vascular complications in patients with adult-onset diabetes: I. Design, methods, and baseline characteristics. Diabetes 19(Suppl 2):747-83.

University Group Diabetes Program Research Group (1970b). A study of the effects of hypoglycemic agents on vascular complications in patients with adult-onset diabetes: II. Mortality results. Diabetes 19(Suppl 2):785-830.

Urokinase Pulmonary Embolism Trial Study Group (1970). Urokinase pulmonary embolism trial. Phase 1 results: a cooperative study. Journal of the American Medical Association 214:2163-2172.

Urokinase Pulmonary Embolism Trial Study Group (1973). The urokinase pulmonary embolism trial. A national cooperative study. Circulation 47(Suppl 2):1-108.

Wald A (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics 16:117-186.

Zelen M (1982). The contributions of Jerome Cornfield to the theory of statistics. Biometrics 38(Suppl):11-15.