Clarke M (2015). History of evidence synthesis to assess treatment effects: personal reflections on something that is very much alive.

© Mike Clarke, Centre for Public Health, Institute of Clinical Sciences, Block B, Queen’s University Belfast, Royal Victoria Hospital, Grosvenor Road, Belfast BT12 6BA

Cite as: Clarke M (2015). History of evidence synthesis to assess treatment effects: personal reflections on something that is very much alive. JLL Bulletin: Commentaries on the history of treatment evaluation.

Setting the scene

It’s a challenge to give an account of the “history” of something that I have been a part of in recent decades. If I write about something that took place 25 years ago, is that historical, when it feels so near in time to me? When I teach undergraduates and show an article that was published before they were born, is that history or part of the here and now? And, when I look back on lectures from early in my career that mentioned things from the previous century, they felt, and were, some distance in the past. Writing this in August 2015, the last century is only 15 years behind us, and we need to refer to the century before last when we think about the 1800s. In less than 90 years, the “present” will be the last “century” and current and upcoming decades will be “history”. That history is being made right now in relation to evidence synthesis. And the last couple of decades have seen developments that will come to be seen as pivotal.

In this essay, I try to capture some thoughts on the last hundred or more years, writing partly as a historian with a strong interest in how ideas evolve in parallel and independently, and also as someone who has been part of the history for 25 years and been fortunate to work with others who have been part of it for much longer. I will look at who and what happened “early” rather than engage in a competition to find who was “first”, and try to provide a framework to help readers to think about history when it is still being made around us. I will try to highlight how several key elements for the successful conduct and uptake of evidence synthesis to assess treatment effects came together over recent decades to produce the upsurge in this activity, and a step change about 20 years ago. I draw on examples from the James Lind Library, as well as other accounts of various aspects of the history of evidence synthesis in health and social care [Lewis and Clarke 2001; Chalmers et al. 2002; Starr et al. 2009; Bastian et al. 2010; Fox 2011; Shadish and Lecy 2015], including the influences of women in this history [Dickersin 2015]. This account should not itself be considered to be a “systematic review”. It is a collection of illustrative examples to describe the journey that evidence synthesis has taken over more than 100 years, and to highlight examples of how the quality of this research has changed over time [Kass 1981; Mulrow 1987; Sacks et al. 1987; Mulrow 1998; Moher et al. 2007]. I am sure that examples have been missed, some of which may be particularly important, and I should welcome information on any such examples and suggestions for improvements.

What does it mean?

There are many terms used for evidence synthesis, just as there are many terms for “evidence”. This article focuses to a large extent on systematic reviews, in which a question is formulated, eligible studies are identified and appraised, and the findings are combined (sometimes mathematically) to summarise the effects, and perhaps to draw conclusions about the implications for future practice and research. The emphasis will be on research into the effects of interventions in health and social care, but it is important to note there is a growing body of systematic reviews of other key areas for decision making [Moher et al. 2007]. These include diagnostic accuracy and prognosis, and the use of evidence from other types of investigation including qualitative research, animal studies and modelling. This essay might therefore be considered to be a history of research synthesis with a focus on systematic reviews, and the important role played by a particular type of review: the Cochrane Review. The history of more statistical aspects of meta-analyses is dealt with in a companion article in the James Lind Library [O’Rourke 2006].

An illustration of how historical analyses have been transformed by the living history of modern developments is the work involved in the review of documents from the past. Thirty years ago, someone wanting to know if a particular term had been used in nineteenth century medical journals would need to go to a library, take the journals from the shelves and work through them methodically. Now, we go online and run a search of the digitised archives in seconds. This makes it much easier for us to find today’s terms in the nineteenth century medical literature, but we still need to apply critical reasoning to consider whether the terms mean the same. This can make document review easier if a term was invented for the specific purpose of our interest, and has no pre-history. This is the case with the term “meta-analysis”, but not with “systematic review”. However, early uses of the latter can provide insight into why we use it now, and how people did similar things in the past. To begin this journey, a search of the BMJ digitised archive for the phrase “systematic review” finds an article from 1867 discussing the recently published edited reports of St Bartholomew’s and St George’s hospitals, which notes “Daunted by the difficulty of any systematic review of these collections of monographs, we shall only take a flying run through the pages; warning our readers, that they will do well to indemnify themselves by procuring the volumes for systematic perusal” [Anon 1867]. This neatly captures the fact that preparing a systematic review can be a daunting task, and the challenges of undertaking one have been well described in guides to their conduct through recent decades [Goldschmidt 1986; Cooper and Hedges 1994; Pettiti 1994; Higgins and Green 2008; CRD 2009].

Understanding the purpose of evidence syntheses, to understand why people do them

There are many reasons for doing evidence synthesis. These include the need to minimise bias by bringing together all of the available evidence on a particular topic, so that the emphasis is on the totality of the evidence and not merely on a sample of the studies, highlighted because of their results. There is also a need to reduce the effects of the play of chance by increasing statistical power through the incorporation of as much data on the topic as possible. This, too, depends on bringing together all of the available evidence, but it also requires that the data from that evidence can be combined mathematically, in meta-analyses. The history of the latter is dealt with partly in the companion article by Keith O’Rourke [O’Rourke 2006]. Some of the reasons for doing evidence synthesis overlap, but some are mutually exclusive. Some have changed in emphasis over time. However, the following list helps to orientate any work that wishes to look at why people have done, and continue to do, them. The examples that follow highlight some of these reasons, which help to provide a basis for understanding why an evidence synthesis, rather than a single study or a haphazard collection of studies, became so important:

  • To organise a collection of the evidence
  • To appraise the quality of the evidence
  • To minimise bias, including avoiding undue emphasis on individual studies
  • To compare and contrast similar studies
  • To combine their findings, if possible and appropriate, to increase statistical power
  • To improve access to the evidence
  • To identify cost-effective interventions
  • To design better studies in the future

As a starting point for considering the scientific value of evidence synthesis, let’s go back to the 1880s and a presidential address to the British Association for the Advancement of Science by Lord Rayleigh in Montreal. He said:

“If, as is sometimes supposed, science consisted in nothing but the laborious accumulation of facts, it would soon come to a standstill, crushed, as it were, under its own weight. The suggestion of a new idea, or the detection of a law, supersedes much that has previously been a burden on the memory, and by introducing order and coherence facilitates the retention of the remainder in an available form. Two processes are thus at work side by side, the reception of new material and the digestion and assimilation of the old. One remark, however, should be made. The work which deserves, but I am afraid does not always receive, the most credit is that in which discovery and explanation go hand in hand, in which not only are new facts presented, but their relation to old ones is pointed out.” [Rayleigh 1885]

If we move forward 13 years, to another address to another meeting in North America, George Gould presented a vision to the first meeting of the Association of Medical Librarians in Philadelphia on May 2 1898:

“I look forward to such an organisation of the literary records of medicine that a puzzled worker in any part of the civilized world shall in an hour be able to gain a knowledge pertaining to a subject of the experience of every other man in the world.” [Gould 1898]

These two examples from the late nineteenth century show both the scientific justification for evidence synthesis (to make best use of what has gone before) and the practical justification (to make it easier for decision makers to access the knowledge from what has gone before). The latter also provides an opportunity to note the important contribution that librarians and information specialists have made to improving access to the raw material for systematic reviews. In a review in the mid-1960s, Wechsler et al. (1965) report that they did “an extensive search of the literature” for research evaluating antidepressant medications on hospitalized mental patients. Such searches have become easier over the subsequent half century, through the development of bibliographic databases containing millions of records and online access to full-text articles. In the early 1990s, when the Cochrane Collaboration was established (see below), the principal medical database, MEDLINE, contained fewer than 20,000 records that could be easily retrieved as reports of randomised trials (Dickersin et al. 1994). Through an extensive programme of searching by members of Cochrane, and improved indexing, this number is now in the hundreds of thousands in MEDLINE, and the Cochrane Central Register of Controlled Trials, in the Cochrane Library, contains records for more than 880,000 reports (Lefebvre et al. 2008; Bastian et al. 2010).

James Lind: an early trial and early evidence synthesis

In his 1753 treatise on scurvy, not only did James Lind describe his celebrated trial on scurvy, he also provided what the cover subtitle describes as a “Critical and Chronological View of what has been published on the subject”. He outlines the need for this with the words:

“As it is no easy matter to root out prejudices, … it became requisite to exhibit a full and impartial view of what had hitherto been published on the scurvy, and that in a chronological order, by which the sources of these mistakes may be detected. Indeed, before the subject could be set in a clear and proper light, it was necessary to remove a great deal of rubbish.” [Lind 1753]

Other examples of efforts by researchers to summarise all the existing evidence are available in the James Lind Library from the decades at the start of the recent surge in activity. For example, in 1969, Smith et al. wrote that their “comprehensive overview of antidepressant literature published in English … attempts to describe a total field of research enquiry”. The Lind quote above captures one of the reasons for a key component of a modern evidence synthesis: the critical appraisal of the potentially eligible studies, with a view to minimising bias and separating the good from the bad. However, as noted by Chalmers et al., “It was not really until the 20th century … that the science of research synthesis as we know it today began to emerge” [Chalmers et al. 2002]. And, perhaps it was not until nearly the fourth quarter of that century that proper recognition of evidence synthesis as “science” began to develop, even though it has continued to be a challenge to have such research accepted as a scientific endeavor.

By way of illustration from the 1970s, in 1971 Feldman wrote that systematically reviewing and integrating research evidence “may be considered a type of research in its own right – one using a characteristic set of research techniques and methods” [Feldman 1971]. In the same year, Light and Smith noted that it was impossible to address some hypotheses other than through analysis of variations among related studies, and that valid information and insights could not be expected to result from this process if it depended on the usual, scientifically undisciplined approach to reviews [Light and Smith 1971]. In 1977, Eugene Garfield drew attention to the importance of scientific review articles in advancing original research, showing how review articles had high citation rates and review journals had high impact factors [Garfield 1977]. He proposed a new profession, “scientific reviewer”, and his Institute for Scientific Information went on to co-sponsor (with Annual Reviews Inc.) an annual award for “Excellence in Scientific Reviewing”, administered by the National Academy of Sciences [Garfield 1979].

Mathematics, statistics and meta-analyses

One of the early examples cited by Chalmers et al. [Chalmers et al. 2002] of an evidence synthesis highlights how the use of statistical techniques helped to introduce scientific rigour to evidence synthesis. In the British Medical Journal of 5 November 1904, Karl Pearson, director of the Biometric Laboratory at University College London, pooled data from five studies of immunity and six studies of mortality among soldiers serving in India and South Africa to investigate the effects of a vaccine against typhoid. He calculated mean values across the two groups of studies, noting:

“Many of the groups in the South African experience are far too small to allow of any definite opinion being formed at all, having regard to the size of the probable error involved. Accordingly, it was needful to group them into larger series. Even thus the material appears to be so heterogeneous, and the results so irregular, that it must be doubtful how much weight be attributed to the different results.” [Pearson 1904]

In 1940, a group of researchers from Duke University in the USA produced the book Extra-sensory perception after sixty years, which included statistical analyses that combined the results of individual studies and stated:

“The comparison of the statistics of more than one experiment suggests a counterpart: the combination of them for an estimate of total significance” [Pratt et al. 1940].

It was in April 1976, though, that a key step took place with the introduction of a new term for this statistical combination: ‘meta-analysis’. Gene Glass used his American Educational Research Association presidential address to describe the need for better synthesis of the results of research studies, through a process he termed “meta-analysis”. In the published version of the speech, he wrote:

“My major interest currently is in what we have come to call – not for want of a less pretentious name – the meta-analysis of research. The term is a bit grand, but it is precise, and apt, and in the spirit of “metamathematics,” “meta-psychology,” and “meta-evaluation.” Meta-analysis refers to the analysis of analyses.” [Glass 1976]

Smith and Glass (1977) published a substantial example of one such meta-analysis the following year, to look at research in psychotherapy. Their report drew on the accumulated evidence from 25,000 people in 375 studies of psychotherapy and counseling with 833 effect-size measures, and was introduced with the words:

“The purpose of the present research has three parts: (1) to identify and collect all studies that tested the effects of counseling and psychotherapy; (2) to determine the magnitude of effect of the therapy in each study; and (3) to compare the effects of different types of therapy and relate the size of effect to the characteristics of the therapy (e.g., diagnosis of patient, training of therapist) and of the study. Meta-analysis, the integration of research through statistical analysis of the analyses of individual studies (Glass 1976), was used to investigate the problem.” [Smith and Glass 1977]

The term meta-analysis appears sporadically in the medical literature over the subsequent years, but a notable example is a 1982 review of 37 reports comparing pharmacological versus non-pharmacological treatments for hypertension. Andrews et al. (1982) wrote:

“Glass introduced an approach called meta-analysis in which the properties of several studies could be recorded in quantitative terms and descriptive statistics applied to derive an overall conclusion. Thus, reviewing the published works ceases to require the judgment of Solomon and becomes a quasiempirical procedure. We used the meta-analytic technique to review non-pharmacological treatments for hypertension.” [Andrews et al. 1982]

Around the same time as the introduction of the term ‘meta-analysis’, others were describing methods for combining the results of separate studies. In early 1977, Peto et al. published the second in a pair of papers on the analyses of trials with prolonged follow-up and the use of time-to-event analyses, showing how the results of separate studies might be combined as though each trial was a separate stratum in a single study [Peto et al. 1977].
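The idea of treating each trial as a separate stratum can be illustrated with a small sketch. It is a simplification and not the time-to-event analysis described by Peto et al.: it assumes each trial reports only a two-by-two table of events, and it uses the “observed minus expected” calculation that later literature often calls the Peto one-step method. The function name and the two trials in the usage note are hypothetical and invented for illustration.

```python
import math


def peto_pooled_odds_ratio(trials, z=1.96):
    """Pool 2x2 tables, treating each trial as its own stratum.

    Each trial is a tuple of hypothetical counts:
    (events_treated, n_treated, events_control, n_control).
    Returns (pooled odds ratio, lower limit, upper limit).
    """
    sum_o_minus_e = 0.0
    sum_variance = 0.0
    for events_t, n_t, events_c, n_c in trials:
        n = n_t + n_c                # everyone in this trial
        d = events_t + events_c      # all events in this trial
        # Events expected in the treated group if treatment had no effect
        expected = n_t * d / n
        # Hypergeometric variance of the observed treated-group events
        variance = n_t * n_c * d * (n - d) / (n ** 2 * (n - 1))
        # Comparisons are made within each trial; only the summaries are added
        sum_o_minus_e += events_t - expected
        sum_variance += variance

    log_or = sum_o_minus_e / sum_variance
    se = 1 / math.sqrt(sum_variance)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )
```

Calling `peto_pooled_odds_ratio([(10, 100, 20, 100), (15, 150, 25, 150)])` on these two invented trials gives a pooled odds ratio a little above one half. Patients are only ever compared with patients in the same trial, which is the sense in which each trial is a stratum.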

One of the things that subsequently accompanied these statistical techniques was a new way to display the findings of the meta-analyses: a graph that is now sometimes called the forest plot [Lewis and Clarke 2001]. This shows the results for each study as a single line of data and graphical image, with a symbol at the bottom to indicate the overall average. In 1978, Freiman et al. displayed the results of 71 “negative” trials with horizontal lines for the confidence interval for each study and a mark to show the point estimate [Freiman et al. 1978].

In 1982, Lewis and colleagues produced something similar to display a meta-analysis of the effects of beta-blockers on mortality [Lewis 1982]. In 1988, the Antiplatelet Trialists’ Collaboration published what would now be widely recognized as a forest plot in a systematic review of the prevention of vascular disease by antiplatelet therapy. This used squares of different sizes to show the weight of each study in the meta-analysis and the point estimates for the odds ratio from each trial, with the associated confidence intervals running through these. A rhombus, whose width was its confidence interval, provided the average at the bottom of the plot [Antiplatelet Trialists’ Collaboration 1988].
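The numbers behind such a plot can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used by the Antiplatelet Trialists’ Collaboration: it assumes a fixed-effect, inverse-variance average of log odds ratios, two-by-two tables with no empty cells, and hypothetical function names and trial data.

```python
import math


def study_log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its approximate variance for one 2x2 table.

    Assumes no cell is zero; a real analysis would need to handle that case.
    """
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def forest_plot_numbers(trials, z=1.96):
    """Per-study rows plus the pooled estimate for a forest plot.

    Each trial is (name, events_treated, n_treated, events_control, n_control).
    """
    rows = []
    for name, *counts in trials:
        log_or, var = study_log_odds_ratio(*counts)
        rows.append((name, log_or, var, 1 / var))  # weight = inverse variance
    total_weight = sum(w for *_, w in rows)

    studies = []
    for name, log_or, var, w in rows:
        half_width = z * math.sqrt(var)
        studies.append({
            "name": name,
            "odds_ratio": math.exp(log_or),  # centre of the square
            "ci": (math.exp(log_or - half_width), math.exp(log_or + half_width)),
            "weight_share": w / total_weight,  # drives the size of the square
        })

    pooled_log = sum(w * log_or for _, log_or, _, w in rows) / total_weight
    pooled_half_width = z * math.sqrt(1 / total_weight)
    pooled = {
        "odds_ratio": math.exp(pooled_log),
        # The ends of this interval are the left and right tips of the rhombus
        "ci": (math.exp(pooled_log - pooled_half_width),
               math.exp(pooled_log + pooled_half_width)),
    }
    return studies, pooled
```

With the invented input `[("Trial A", 10, 100, 20, 100), ("Trial B", 15, 150, 25, 150)]`, the larger Trial B receives the larger weight share, and the pooled confidence interval is narrower than that of either trial, which is why the rhombus is usually narrower than the lines above it.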

Systematic reviews as we know them today

In the month before Glass used the term ‘meta-analysis’ at the American Educational Research Association meeting, Shaikh et al. (1976) published what the title of their article called “A systematic review of the literature on evaluative studies on tonsillectomy and adenoidectomy”. They outline their purpose as being:

“to review the English language literature pertaining to evaluation of [tonsillectomy and adenoidectomy] with a particular emphasis on an assessment of the scientific merit of studies which have attempted to determine the efficacy of this procedure.”

A total of 28 reports describing 29 studies of tonsillectomy and adenoidectomy published between 1922 and 1970 were appraised and analysed, and the assessments of each study were presented in a table. This work reflects James Lind’s intentions to separate the good from the bad, and to identify or overcome bias. This objective is distinct from the use of mathematical techniques to increase statistical power and decrease the effects of chance. Thus Shaikh et al. provide a table showing how studies done by ear, nose and throat specialists were much more favourable to tonsillectomy and adenoidectomy (12 in favour, 0 against) than those done by public health or paediatric specialists (9 in favour, 8 against). In common with many of the challenges of the 2010s, Shaikh et al. conclude their review by calling for a well-conducted randomised trial to resolve the uncertainty [Clarke et al. 2007] and highlight how evidence of effectiveness is a key element in managing health care costs [Garner et al. 2013]:

“Aside from the high cost and lack of clear cut evidence of therapeutic efficacy, there is morbidity and mortality associated with tonsillectomy and adenoidectomy. … In view of the cost, financial and human, as well as the lack of evidence clearly supporting the continued performance of this procedure, it is suggested that a prospective, properly randomized controlled study be undertaken and that the methodologic pitfalls annotated in our review be guarded against. … In this era of escalating health care costs, society can only afford therapies which have been demonstrated to be of benefit.” [Shaikh et al. 1976]

This type of conclusion also serves to highlight the importance of doing reviews to provide the ethical, scientific, economic and environmental justification when considering doing additional trials [Clarke 2004; Macleod et al. 2014; Clarke et al. 2014]. This point was illustrated by Rogers and Clay in 1975, who wrote that the results of their review of the existing trials “suggest that the benefit of this drug in patients with endogenous depression who have not become institutionalized is indisputable, and that further drug-placebo trials in this condition are not justified”. Similarly, Baum et al. concluded in 1981 that a no-treatment control group should no longer be used in trials of antibiotic prophylaxis in colon surgery. A couple of other notable examples of evidence synthesis from the 1970s, which also cast doubt on the effects of interventions that may have looked promising when emphasis was given to the results of single studies, are the work of Jan Stjernswärd (Stjernswärd 2013) and Thomas Chalmers (Dickersin and Chalmers 2014). In 1974, Stjernswärd pooled the 5-year survival results for five trials of postoperative radiotherapy for breast cancer, and concluded:

“The routine use of postoperative irradiation in early breast cancer must be seriously questioned. Survival data argue against its use, despite the local effect on recurrence rates. If the routine use of prophylactic local radiotherapy after radical mastectomy were stopped, survival might increase and resources might be saved.” [Stjernswärd 1974]

In 1975, Chalmers brought together 14 trials of ascorbic acid for the common cold, and combined the results from eight of them:

“These are minor and insignificant differences, but in most studies the severity of symptoms was significantly worse in the patients who received the placebo. … All differences in severity and duration were eliminated by analyzing only the data from those who did not know which drug they were taking. Since there are no data on the long-term toxicity of ascorbic acid when given in doses of 1g or more per day, it is concluded that the minor benefits of questionable validity are not worth the potential risk, no matter how small that might be.” [Chalmers 1975]

Collaboration and the 1980s

The following example from the start of the 1970s introduces the concept of the collaborative overview, in which researchers share their data. This need for researchers to collaborate to ensure progress and reduce waste [Macleod et al. 2014] had been highlighted in the 1950s by Kety [Kety 1959]. This approach to research synthesis became more common during the following decade. In 1970, in an early example of an individual participant data meta-analysis [Clarke and Godwin 1998], the International Anticoagulant Review Group centrally collected and analysed original records for nearly 2500 patients from nine of ten identified trials to assess the effects of anticoagulant therapies after myocardial infarction. They wrote:

“Although we recognised that the best solution would be a new collaborative controlled trial in a large number of patients, we decided that this was, at that time, quite impracticable. As a potentially useful and simple alternative we agreed on a systematic review of the data on individual patients pooled from all the adequately controlled trials that had been published recently.” [International Anticoagulant Review Group 1970]

Such collaborative efforts became a feature of some large scale reviews in the 1980s, in particular in other areas of cardiovascular medicine and cancer [Stewart and Clarke 1995; Clarke et al. 1998]. For instance, in October 1984, people responsible for randomised trials of tamoxifen or chemotherapy for the treatment of women with breast cancer met at Heathrow Airport in London to share findings and conduct a meta-analysis of the aggregate results [Anon 1984]. They became the founders of the Early Breast Cancer Trialists’ Collaborative Group (EBCTCG) [EBCTCG 1990; Darby 2005]. In a short report in the Lancet, it was noted that:

“Since the future treatment of many women might be importantly affected by this – or a further – overview of all available trials those meeting agreed to explore the possibility of extending their collaboration to include the central review of individual patient data.” [Anon 1984]

Since then, the EBCTCG has conducted periodic reviews of the accumulating data from randomised trials of many aspects of the treatment of women with operable breast cancer, bringing further follow-up and additional trials into each cycle [EBCTCG 1988; EBCTCG 2012]. The EBCTCG was recently used as an example of the successful sharing of participant-level data from clinical trials [Varnai et al. 2014].

The spirit of collaboration to resolve uncertainties in health care in the 1980s extended beyond the establishment of groups of researchers willing to share individual participant data for collaborative meta-analyses. A notable example is the considerable international collaboration that led to the preparation of a large collection of systematic reviews of controlled trials relevant to perinatal care, including Effective Care in Pregnancy and Childbirth (ECPC) [Chalmers et al. 1989; Sinclair and Bracken 1992], and the use of electronic media to update and correct the reviews when necessary [Chalmers 1988]. Looking back two decades later, Daniel Fox wrote:

“The influence … on policy was mainly a result of … powerful blending of the rhetoric of scientific and polemical discourse, especially but not exclusively in ECPC; a growing constituency for systematic reviews as a source of ‘evidence-based’ health care among clinicians, journalists, and consumers in many countries; and recognition by significant policymakers who allocate resources to and within the health sector that systematic reviews could contribute to making health care more effective and to containing the growth of costs.” [Fox 2011]

Cochrane Collaboration

Towards the end of the 1970s, in what might be considered to be a rallying call for evidence synthesis (Chalmers 2006), Archie Cochrane had written:

“It is surely a great criticism of our profession that we have not organised a critical summary, by speciality or subspeciality, adapted periodically, of all relevant randomised controlled trials” [Cochrane 1979].

At the end of the following decade, he used the phrase “systematic review” in the foreword to the aforementioned compilation of evidence syntheses of maternity care interventions:

“The systematic review of the randomized trials of obstetric practice that is presented in this book is a new achievement. It represents a real milestone in the history of randomized trials and in the evaluation of care, and I hope that it will be widely copied by other medical specialties.” [Cochrane 1989]

Four years later, the international Cochrane Collaboration was established, following the opening of the first Cochrane Centre in Oxford, UK in 1992 [Chalmers et al. 1992]. The Cochrane Collaboration set itself the aim of helping people make well-informed decisions about healthcare by preparing, maintaining and promoting the accessibility of systematic reviews of the effects of healthcare interventions. It established an international infrastructure to support the production of systematic reviews across all areas of health care, with networks of individuals working together to prepare these reviews and keep them up to date. The advent of electronic publishing, which, at that time, meant publishing the material on floppy disks or CD-ROMs, allowed the full collection of systematic reviews to be provided to users on a regular basis, with the addition of new reviews and the updating of existing ones to take account of new evidence.

In 1995, the Collaboration’s publishing partner, Update Software, released the first issue of the Cochrane Database of Systematic Reviews [Starr et al. 2009]. From 50 full Cochrane Reviews in that first year, the number has grown to more than 6000 in 2015. The history of evidence synthesis took another major step in 1998, when the Database went onto the internet. Now, in partnership with Wiley-Blackwell, the Collaboration publishes the full collection of reviews in the Cochrane Library online, with new and updated reviews appearing every few hours, rather than in quarterly or monthly bundles. The Collaboration itself has also grown considerably, from 77 people at the first Cochrane Colloquium in October 1993 to more than 30,000 in more than 100 countries [Allen 2011].


Although the Cochrane Collaboration remains the world’s largest single producer of systematic reviews, its output now accounts for only a small minority of the global output of evidence syntheses. In 2007, Moher et al. estimated that Cochrane Reviews made up approximately 500 of the 2500 systematic reviews published each year [Moher et al. 2007]. More recently, Bastian et al. used a variety of search strategies to show how steady growth in the number of evidence syntheses from the 1990s had transformed into a surge in recent years [Bastian et al. 2010].

[Figure: graph from Bastian et al. 2010]

Their graph clearly shows this. What might, at first sight, look like a cumulative count of the number of systematic reviews found by the different types of search is actually the count of articles published in each single year, showing that, for non-Cochrane reviews in particular, each year saw more publications than the previous year. They estimated that 4000 reviews were being published annually by 2010 and predicted that this would continue to grow. This has been the case: a search of PubMed in April 2015 finds 6313 articles published in 2014, using the Publication Type term meta-analysis. There are many more to come. For example, the international prospective register of systematic reviews, PROSPERO, established in 2011 [Booth et al. 2013], is likely to surpass 10,000 records by the end of 2015.

When the present and future have become history

I conclude by thinking forward to the next century. How will evidence synthesis in our current decades be viewed? What will be regarded as pivotal moments, step changes, or gradual evolution? Some candidates that historians of the future might look to are:

  • the increased use of prospective registries of trials to make it easier to find what trials have been done [Ghersi and Pang 2009]
  • increased automation of the systematic review process [Adams et al. 2013]
  • development of techniques for rapid reviews [Tricco et al. 2015; Marshall et al. 2019]
  • greater access to the data from clinical trials [Varnai et al. 2015] and its use in individual participant data meta-analyses [Stewart et al. 2015]
  • greater use of material submitted to drug regulators [Jefferson et al. 2014]
  • the use of new statistical techniques such as network meta-analyses [Lee 2014; Hutton et al. 2015]
  • use of meta-epidemiology to improve the design and conduct of new studies [Savović et al. 2012]
  • use of systematic reviews of animal research to inform research in humans [Sena et al. 2010; Sena et al. 2014]
  • improvements in ways to summarise reviews and make them more accessible [Guyatt et al. 2011; Murthy et al. 2012; Maguire and Clarke 2014]
  • the use of core outcome sets [Gargon et al. 2014]
  • the conduct of empirical research into the methods for doing research and reviews of these studies [Anon 2012]
  • and, perhaps most importantly, even more recognition of the need for and benefits of systematic reviews as a way to justify and interpret new trials, and reduce waste [Macleod et al. 2014]

The past hundred or more years have seen several developments in the science and practice of evidence synthesis. The last 20-30 years have seen important step changes in the numbers of these syntheses, and in the techniques to prepare and maintain them. The underpinning scientific rationale continues to resonate with the words of Lord Rayleigh [Rayleigh 1885]. The practical benefits of making it easier for people to make well-informed decisions and choices mean that Gould’s vision of much improved access to knowledge [Gould 1898] and Kety’s hope for greater collaboration among researchers [Kety 1959] may have been achieved.

This James Lind Library article has been republished in the Journal of the Royal Society of Medicine 2016;109:154-163.


Adams CE, Polzmacher S, Wolff A (2013). Systematic reviews: Work that needs to be done and not to be done. Journal of Evidence-Based Medicine 6: 232–235.

Allen C, Richmond K (2011). The Cochrane Collaboration: International activity within Cochrane Review Groups in the first decade of the twenty-first century. Journal of Evidence-Based Medicine 4: 2–7.

Andrews G, MacMahon SW, Austin A, Byrne DG (1982). Hypertension: comparison of drug and non-drug treatments. BMJ 284:1523-1526.

Anon (1867). Reviews and notices. BMJ 2: 425-426.

Anon (1984). Review of mortality results in randomized trials in early breast cancer. Lancet ii: 1205.

Anon (2012). Education section – studies within a review (SWAR). Journal of Evidence-Based Medicine 5(3): 188-189.

Antiplatelet Trialists’ Collaboration (1988). Secondary prevention of vascular disease by prolonged anti-platelet treatment. BMJ 296:320-331.

Bastian H, Glasziou P, Chalmers I (2010). Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Medicine 7(9): e1000326.

Baum ML, Anish DS, Chalmers TC, Sacks HS, Smith H, Fagerstrom RM (1981). A survey of clinical trials of antibiotic prophylaxis in colon surgery: Evidence against further use of no-treatment controls. New England Journal of Medicine 305: 795-799.

Booth A, Clarke M, Dooley G, Ghersi D, Moher D, Petticrew M, Stewart L (2013). PROSPERO at one year: an evaluation of its utility. Systematic Reviews 2(1): 4.

Centre for Reviews and Dissemination (2009). Systematic reviews: CRD’s guidance for undertaking reviews in health care. University of York.

Chalmers I, ed (1988). The Oxford Database of Perinatal Trials. Oxford: Oxford University Press.

Chalmers I (2006). Archie Cochrane (1909-1988). JLL Bulletin: Commentaries on the history of treatment evaluation.

Chalmers I, Dickersin K, Chalmers TC (1992). Getting to grips with Archie Cochrane’s agenda. BMJ 305(6857): 786-788.

Chalmers I, Enkin M, Keirse MJNC (1989). Effective care in pregnancy and childbirth. Oxford: Oxford University Press.

Chalmers I, Hedges LV, Cooper H (2002). A brief history of research synthesis. Evaluation and the Health Professions 25: 12-37.

Chalmers TC (1975). Effects of ascorbic acid on the common cold. An evaluation of the evidence. American Journal of Medicine 58(4): 532-536.

Clarke L, Clarke M, Clarke T (2007). How useful are Cochrane reviews in identifying research needs? Journal of Health Services Research and Policy 12: 101-103.

Clarke M (2004). Doing new research? Don’t forget the old: nobody should do a trial without reviewing what is known. PLoS Medicine 1: 100-102.

Clarke M, Brice A, Chalmers I (2014). Accumulating research: a systematic account of how cumulative meta-analyses would have provided knowledge, improved health, reduced harm and saved resources. PLoS ONE 9(7): e102670.

Clarke M, Godwin J (1998). Systematic reviews using individual patient data: a map for the minefields? Annals of Oncology 9(8): 827-833.

Clarke M, Stewart L, Pignon J-P, Bijnens L (1998). Individual patient data meta-analyses in cancer. British Journal of Cancer 77:2036-2044.

Cochrane AL (1979). 1931-1971: a critical review, with particular reference to the medical profession. In: Medicines for the year 2000. London: Office of Health Economics, pp.1-11.

Cochrane AL (1989). Foreword. In: Chalmers I, Enkin M, Keirse MJNC (editors) (1989). Effective care in pregnancy and childbirth. Oxford: Oxford University Press.

Cooper H, Hedges LV (editors) (1994). The Handbook of Research Synthesis. New York: Russell Sage Foundation.

Darby S, Davies C, McGale P (2005). The Early Breast Cancer Trialists’ Collaborative Group: a brief history of results to date. In Davison AC, Dodge Y, Wermuth N (editors). Celebrating statistics. Oxford University Press, Oxford, pp.185-198.

Dickersin K (2015). Innovation and cross-fertilization in systematic reviews and meta-analysis: The influence of women investigators. Research Synthesis Methods (Online publication ahead of print 10 June 2015).

Dickersin K, Chalmers F (2014). Thomas C Chalmers (1917-1995): a pioneer of randomized clinical trials and systematic reviews. JLL Bulletin: Commentaries on the history of treatment evaluation.

Dickersin K, Scherer R, Lefebvre C (1994). Identifying relevant studies for systematic reviews. BMJ 309: 1286-1291.

Early Breast Cancer Trialists’ Collaborative Group (1988). Effects of adjuvant tamoxifen and of cytotoxic therapy on mortality in early breast cancer. An overview of 61 randomized trials among 28,896 women. New England Journal of Medicine 319: 1681-1692.

Early Breast Cancer Trialists’ Collaborative Group (1990). Treatment of early breast cancer. Vol 1. Worldwide evidence, 1985-1990. Oxford: Oxford University Press.

Early Breast Cancer Trialists’ Collaborative Group (2012). Comparisons between different polychemotherapy regimens for early breast cancer: meta-analyses of long-term outcome among 100,000 women in 123 randomised trials. Lancet 379(9814): 432-444.

Feldman KA (1971). Using the work of others: some observations on reviewing and integrating. Sociology of Education 44:86-102.

Fox DM (2011). Systematic reviews and health policy: the influence of a project on perinatal care since 1988. Milbank Quarterly 89: 425-449.

Freiman JA, Chalmers TC, Smith H, Kuebler RR (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 “negative” trials. New England Journal of Medicine 299: 690-694.

Garfield E (1977). Proposal for a new profession: scientific reviewer. Essays of an Information Scientist 3: 84-87.

Garfield E (1979). The NAS James Murray Luck Award for excellence in scientific reviewing. Essays of an Information Scientist 4: 127-131.

Gargon E, Gurung B, Medley N, Altman DG, Blazeby JM, Clarke M, Williamson PR (2014). Choosing important health outcomes for comparative effectiveness research: a systematic review. PLoS ONE 9(6): e99111.

Garner S, Docherty M, Somner J, Sharma T, Choudhury M, Clarke M, Littlejohns P (2013). Reducing ineffective practice: challenges in identifying low-value health care using Cochrane systematic reviews. Journal of Health Services Research and Policy 18: 6-12.

Ghersi D, Pang T (2009). From Mexico to Mali: four years in the history of clinical trial registration. Journal of Evidence-Based Medicine 2(1): 1-7.

Glass GV (1976). Primary, secondary and meta-analysis of research. Educational Researcher 10: 3-8.

Goldschmidt PG (1986). Information synthesis: a practical guide. HSR: Health Services Research 21:215-236.

Gould GM (1898). The work of an association of medical librarians. Journal of Medical Librarians Association 1: 15–19.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, Norris S, Falck-Ytter Y, Glasziou P, DeBeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schünemann HJ (2011). GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology 64(4): 383-394.

Higgins JPT, Green S (editors) (2008). Cochrane Handbook for Systematic Reviews of Interventions. Oxford, The Cochrane Collaboration. John Wiley and Sons Ltd.

Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, Ioannidis JP, Straus S, Thorlund K, Jansen JP, Mulrow C, Catalá-López F, Gøtzsche PC, Dickersin K, Boutron I, Altman DG, Moher D (2015). The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Annals of Internal Medicine 162(11): 777-784.

International Anticoagulant Review Group (1970). Collaborative analysis of long-term anti-coagulant administration after acute myocardial infarction. Lancet 1:203-209.

Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, Spencer EA, Onakpoya I, Mahtani KR, Nunan D, Howick J, Heneghan CJ (2014). Neuraminidase inhibitors for preventing and treating influenza in healthy adults and children. Cochrane Database of Systematic Reviews (4): CD008965.

Kass EH (1981). Reviewing reviews. In: Warren KS, ed. Coping with the biomedical literature. New York: Praeger, pp 79-91.

Kety S (1959). Comment. In: Cole JO, Gerard RW (editors). Psychopharmacology. Problems in Evaluation. National Academy of Sciences, Publication 583, Washington DC, pp 651-652.

Lee AW (2014). Review of mixed treatment comparisons in published systematic reviews shows marked increase since 2009. Journal of Clinical Epidemiology 67(2): 138-143.

Lefebvre C, Eisinga A, McDonald S, Paul N (2008). Enhancing access to reports of randomized trials published world-wide – the contribution of EMBASE records to the Cochrane Central Register of Controlled Trials (CENTRAL) in The Cochrane Library. Emerging Themes in Epidemiology 5: 13.

Lewis JA (1982). Beta-blockade after myocardial infarction- a statistical view. British Journal of Clinical Pharmacology 14:15S-21S.

Lewis S, Clarke M (2001). Forest plots: trying to see the wood and the trees. BMJ 322: 1479-1480.

Light RJ, Smith PV (1971). Accumulating evidence: Procedures for resolving contradictions among research studies. Harvard Educational Review 41:429-471.

Lind J (1753). A treatise of the scurvy. In three parts. Containing an inquiry into the nature, causes and cure, of that disease. Together with a critical and chronological view of what has been published on the subject. Edinburgh: Printed by Sands, Murray and Cochran for A Kincaid and A Donaldson.

Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JP, Al-Shahi Salman R, Chan AW, Glasziou P (2014). Biomedical research: increasing value, reducing waste. Lancet 383(9912): 101-104.

Maguire LK, Clarke M (2014). How much do you need: a randomised experiment of whether readers can understand the key messages from summaries of Cochrane Reviews without reading the full review. Journal of the Royal Society of Medicine 107(11): 444-449.

Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG (2007). Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine 4(3): e78.

Mulrow CD (1987). The medical review article: state of the science. Annals of Internal Medicine 106: 485-488.

Mulrow CD (1998). We’ve come a long way, baby! The Cochrane Collaboration Methods Working Groups Newsletter. 1-2.

Murthy L, Shepperd S, Clarke MJ, Garner SE, Lavis JN, Perrier L, Roberts NW, Straus SE (2012). Interventions to improve the use of systematic reviews in decision-making by health system managers, policy makers and clinicians. Cochrane Database of Systematic Reviews (9): CD009401.

O’Rourke K (2006). A historical perspective on meta-analysis: dealing quantitatively with varying study results. JLL Bulletin: Commentaries on the history of treatment evaluation.

Pearson K (1904). Report on certain enteric fever inoculation statistics. British Medical Journal 3: 1243-1246.

Petitti DB (1994). Meta-analysis, decision analysis, and cost-effectiveness analysis: Methods for quantitative synthesis in medicine. New York: Oxford University Press.

Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG (1977). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. British Journal of Cancer 35(1):1-39.

Pratt JG, Rhine JB, Smith BM, Stuart CE, Greenwood JA (1940). Extra-sensory perception after sixty years: a critical appraisal of the research in extra-sensory perception. New York: Henry Holt.

Lord Rayleigh (1885). Address by the Rt. Hon. Lord Rayleigh. In: Report of the fifty-fourth meeting of the British Association for the Advancement of Science; August and September; Montreal, Canada. London: John Murray.

Rogers SC, Clay PM (1975). A statistical review of controlled trials of imipramine and placebo in the treatment of depressive illnesses. British Journal of Psychiatry 127: 599-603.

Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC (1987). Meta-analysis of randomized controlled trials. New England Journal of Medicine 316: 450-455.

Savović J, Jones HE, Altman DG, Harris RJ, Jüni P, Pildal J, Als-Nielsen B, Balk EM, Gluud C, Gluud LL, Ioannidis JPA, Schulz KF, Beynon R, Welton NJ, Wood L, Moher D, Deeks JJ, Sterne JAC (2012). Influence of reported study design characteristics on intervention effect estimates from randomized controlled trials. Annals of Internal Medicine 157: 429-438.

Sena ES, Briscoe CL, Howells DW, Donnan GA, Sandercock PA, Macleod MR (2010). Factors affecting the apparent efficacy and safety of tissue plasminogen activator in thrombotic occlusion models of stroke: systematic review and meta-analysis. Journal of Cerebral Blood Flow and Metabolism 30: 1905-1913.

Sena ES, Currie GL, McCann SK, Macleod MR, Howells DW (2014). Systematic reviews and meta-analysis of preclinical studies: why perform them and how to appraise them critically. Journal of Cerebral Blood Flow and Metabolism 34(5): 737-742.

Shadish WR, Lecy JD (2015). The meta-analytic big bang. Research Synthesis Methods (Online publication ahead of print 10 June 2015).

Shaikh W, Vayda E, Feldman W (1976). A systematic review of the literature on evaluative studies on tonsillectomy and adenoidectomy. Pediatrics 57(3): 401-407.

Sinclair JC, Bracken MB (1992). Effective care of the newborn infant. Oxford: Oxford University Press.

Smith A, Traganza E, Harrison G (1969). Studies on the effectiveness of antidepressant drugs. Psychopharmacology Bulletin (suppl) 1-53.

Smith ML, Glass GV (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist 32:752-760.

Starr M, Chalmers I, Clarke M, Oxman AD (2009). The origins, evolution, and future of The Cochrane Database of Systematic Reviews. International Journal of Technology Assessment in Health Care 25(suppl 1): 182-195.

Stewart L, Clarke M, for the Cochrane Collaboration Working Group on meta-analyses using individual patient data (1995). Practical methodology of meta-analyses (overviews) using updated individual patient data. Statistics in Medicine 14: 2057-2079.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, Tierney JF; PRISMA-IPD Development Group (2015). Preferred Reporting Items for Systematic Review and Meta-Analyses of individual participant data: the PRISMA-IPD Statement. JAMA 313(16): 1657-1665.

Stjernswärd J (1974). Decreased survival related to irradiation postoperatively in early breast cancer. Lancet 304: 1285-1286.

Stjernswärd J (2013). Personal reflections on contributions to pain relief, palliative care and global cancer control. JLL Bulletin: Commentaries on the history of treatment evaluation.

Varnai P, Rentel MC, Simmonds P, Sharp TA, Mostert B, de Jongh T (2015). Assessing the research potential of access to clinical trial data. A report to the Wellcome Trust. Brighton: technopolis group.

Wechsler H, Grosser GH, Greenblatt M (1965). Research evaluating antidepressant medications on hospitalized mental patients: A survey of published reports during a five-year period. The Journal of Nervous and Mental Disease 141(2): 231-239.