Altman DG†, Simera I (2015). A history of the evolution of guidelines for reporting medical research: the long road to the EQUATOR Network.

© Iveta Simera, Centre for Statistics in Medicine, University of Oxford, Botnar Research Centre, Windmill Road, Oxford OX3 7LD, UK. email: Iveta.Simera@csm.ox.ac.uk.


Cite as: Altman DG†, Simera I (2015). A history of the evolution of guidelines for reporting medical research: the long road to the EQUATOR Network. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/a-history-of-the-evolution-of-guidelines-for-reporting-medical-research-the-long-road-to-the-equator-network/)


Introduction

Testing medical treatments and other interventions aimed at improving people’s health is incredibly important. However, comparative studies need to be well designed, well conducted, appropriately analysed and responsibly interpreted. Sadly, not all available findings and ‘discoveries’ are based on reliable research.

Our beliefs about best practice in medical research developed considerably over the 20th century, and ideas and methods continue to evolve. Much, perhaps most, medical research is done by individuals for whom it is not their main sphere of activity; notably, clinicians are expected to conduct some research early in their careers. It is therefore perhaps not surprising that there have been recurrent comments on the poor quality of research, and recurrent attempts to raise understanding of how to do research well.

More recently, and increasingly over the last twenty years, concerns about poor methodology (Altman 1994) have been augmented by growing concerns about the inadequacy of reporting in published journal articles (Simera and Altman 2009).

Evaluating the quality of medical research

From at least as early as the first part of the 20th century, there have been publications referring disparagingly to the quality of research methods and inadequate understanding of research methodology, as judged by comparisons with prevailing standards (see Box 1).

Box 1. Early comments about poor research methodology
“The quality of the published papers is a fair reflection of the deficiencies of what is still the common type of clinical evidence. A little thought suffices to show that the greater part cannot be taken as serious evidence at all.” (Sollmann 1917)
“It is a commonplace of medical literature to find the most positive and sweeping conclusions drawn from a sample so meager as to make scientifically sound conclusions of any sort utterly impossible.” (Pearl 1919)
“Statistical workers who fail to scrutinize the goodness of their observed data and carry through a satisfactory analysis upon poor observations, will end up with ridiculous conclusions which cannot be maintained.” (Dunn 1929)
“Medical papers now frequently contain statistical analyses, and sometimes these analyses are correct, but the writers violate quite as often as before, the fundamental principles of statistical or of general logical reasoning … The writer, who 20 years ago would have said that statistical method was mere “mathematical” juggling having no relation to “practical” matters, now seeks for some technical “formula” by the application of which he can produce “significant” results … the change has been from thinking badly to not thinking at all.” (Greenwood 1932)
“My own survey was not numerical and was concerned more with clinical than with laboratory medicine, but it revealed that the same general verdict, perhaps even a more adverse one, was appropriate in the clinical field … Frequently, indeed, the way in which the observations were planned must have made it impossible for the observer to form a valid estimate of the error … an idea of what results might be expected if the experiment were repeated under the same conditions.” (Mainland 1938)
“… less than 1% of research workers clearly apprehend the rationale of the statistical techniques they commonly invoke.” (Hogben 1950)
“…almost any volume of a medical journal contains faults that can be detected by first-year students after only three or four hours’ guidance in the scrutiny of reports.” (Mainland 1952)


Halbert Dunn was a medical doctor subsequently employed as a statistician at the Mayo Clinic. He is probably best known for introducing, many years later, the concept of “wellness” (Dunn 1959). Dunn may have been the first person to publish the findings of a review of an explicit sample of journal publications (Dunn 1929). His unfortunately brief summary of his observations about the 200 articles he examined was as follows:

In order to gain some knowledge of the degree to which statistical logic is being used, a survey was made of a sample of 200 medical-physiological quantitative papers from current American periodicals. Here is the result:

  1. In over 90 per cent statistical methods were necessary and not used.
  2. In about 85 per cent considerable force could have been added to the argument if the probable error concept had been employed in one form or another.
  3. In almost 40 per cent conclusions were made which could not have been proved without setting up some adequate statistical control.
  4. About half of the papers should never have been published as they stood; either because the numbers of observations were insufficient to prove the conclusions or because more statistical analysis was essential.

Statistical methods must eventually become an essential tool for the physiologist. It will be the physiologist who uses this tool most effectively and not the statistician untrained in physiological methods.

The earliest publication providing a detailed report of the weaknesses of a body of published research articles across specialties seems to have been that by Schor and Karten, respectively a statistician and a medical student (Schor and Karten 1966). They investigated the lack of statistical planning and evaluation in published articles and presented a programme to improve publications. They examined 295 publications in 10 of the ’most frequently read’ medical journals between January and March 1964, of which 149 were analytical studies and 146 case descriptions. Their main findings for the analytical studies were:

  • 34% Conclusions drawn about the population but no statistical tests applied to the sample to determine whether such conclusions were justified
  • 31% No use of statistical tests when needed
  • 25% Design of study not appropriate for solving the problem stated
  • 19% Too much confidence placed on negative results with small-size samples

Their bottom line summary was: “Thus, in almost 73% of the reports read (those needing revision and those which should have been rejected), conclusions were drawn when the justification for these conclusions was invalid.”

Over the last 50 years similar reviews have been published occasionally, and they commonly report that a high percentage of papers have methodological problems. A few examples are:

  • Among 513 behavioural, systems and cognitive neuroscience articles published in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) in 2009-10, 50% of the 157 articles that compared effect sizes used an inappropriate method of analysis (Nieuwenhuis et al 2011);
  • In 100 orthopaedic research papers published in seven journals in 2005-10, the conclusions were not clearly justified by the results in 17% and a different analysis should have been undertaken in 39% (Parsons et al 2012);
  • Of 100 consecutive papers sent for review at the journal Injury, 47 used an inappropriate analysis (Prescott and Civil 2013).

In surveys conducted every four years, beginning in 1997, of a sample of clinical trials reported in five general medical journals (Clarke and Chalmers 1998), Clarke and his colleagues have drawn attention to the failure of authors and journals to ensure that reports of new trials explain why the additional studies were done and what difference their results made to the accumulated evidence addressing the uncertainties in question (Clarke and Hopewell 2013).

Reporting medical research

The main focus of Schor and Karten (1966) was the use of valid methods and appropriate interpretation. Although they did not address reporting as such, any attempt to assess the appropriateness of the methodology used in research quickly runs into the problem that the methods are often poorly described. For example, it is impossible to assess the extent to which bias was avoided without details of the method of allocating trial participants to treatments. Likewise, it is impossible to use the results of a trial in clinical practice if the article does not include full details of the interventions (Glasziou et al. 2008).

Concern about the completeness of research reports is a relatively recent phenomenon, strongly linked to the rise of systematic reviews. However, there are some early examples of the recognition of the importance of how research findings are communicated. The earliest such comments of which we are aware were made by the anatomist-turned-statistician Donald Mainland. In one of his earliest methodological publications Mainland commented on the importance of how numerical results were presented (Mainland 1934), and he devoted a whole chapter of his 1938 textbook to ‘Publication of data and results’ (Mainland 1938) (Box 2).

Box 2. Early comments about reporting research
“The way to a more adequate understanding and treatment of medical data would be opened up if all records, articles, and even abstracts gave, besides averages, the numbers of observations and the variation, properly expressed, e.g. as standard deviation (maxima and minima being very unreliable).” (Mainland 1934)
“… incompleteness of evidence is not merely a failure to satisfy a few highly critical readers. It not infrequently makes the data that are presented of little or no value.” (Mainland 1938)
“This leads one to consider if it is possible, in planning a trial, in reporting the results, or in assessing the published reports of trials, to apply criteria which must be satisfied if the analysis is to be entirely acceptable….A high standard must be set, however, not only in order to assess the validity of results, but also because pioneering investigations of this type may in many ways serve as a model and lesson to future investigators. A basic principle can be set up that, just as in a laboratory experiment report, it is at least as important to describe the techniques employed and the conditions in which the experiment was conducted, as to give the detailed statistical analysis of results.” (Daniels 1950)
“A clinical experiment is not completed until the results have been made available to one’s colleagues and co-workers. There is, in a sense, a moral obligation to ’give posterity’ the fruits of one’s scientific labor. Certainly it would be a sad waste of effort to allow reams of data to lie yellowing in a dusty file, while in other laboratories workers are unnecessarily repeating the study.” (Waife 1959)
“Words like “random assignment”, “blindfold technique”, “objective methods” and “statistical analysis,” are no guarantee of quality. The reader should ask: “What is the evidence that the investigator was keenly aware of what might interfere with the effects of the randomization, such as leakage in the blindfold, and pseudo-objective assessments?” “What steps did he take to prevent such risks?” (Mainland 1969)
“It is difficult enough for a clinician to interpret the statistical meaning of a procedure with which he is unfamiliar; it is much more difficult when he is not told what that procedure was.” (Feinstein 1974)
“… the idea is to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgement in one particular direction or another.” (Feynman 1974)


Assessing published reports of clinical trials

The earliest review we know of that was devoted to assessing published reports of clinical trials is that of Ross, who found that only 27 of 100 clinical trials were well controlled, and that over half were uncontrolled (Ross 1951).

Sandifer et al (1961) studied 106 reports of trials of psychiatric drugs, aiming to compare those published before and after the report of Cole et al., which had included recommendations on reporting clinical trials (Cole et al 1957). In so doing they anticipated by some decades a before-after study design that has become quite familiar (see Figure 1). Their detailed assessment included aspects of the reporting of clinical details, interventions, and methodology.

FIGURE 1

Another early study that looked at reporting, also in psychiatry, focused on whether authors gave adequate details of the interventions being tested. Glick wrote:

Two of the 29 studies did not indicate in any manner the duration of therapy. One of these was the paper which had given no dosage data. Thus 27 studies were left wherein there was some notation of duration. However, in four of these, duration was mentioned in such a vague or ambiguous way as to be unsuitable for comparative purposes. For instance, the duration of treatment might be given as “at least two months,” or, “from one to several months.” (Glick 1963)

After these early studies there was a steady trickle of similar investigations examining the reporting of clinical trials in journal articles (DerSimonian et al. 1982; Tyson et al. 1983; Meinert et al. 1984; Liberati et al. 1986; Pocock et al. 1987; Gøtzsche 1989). Recent years have seen a vast number of such studies. Dechartres and colleagues identified 177 literature reviews published from 1987 to 2007, 58% of which appeared after 2002 (Dechartres et al 2011). The rate has escalated further since then.

Developing the first reporting guidelines for randomised trials

The path to CONSORT
Many types of guideline are relevant for clinical trials – they might relate to study conduct, reporting, critical appraisal, or peer review. All of these could address the same important elements of trials, notably allocation to interventions, blinding, outcomes, and completeness of data. These key elements also feature strongly in assessments of study “quality” or “risk of bias”. However, an important criticism of many tools for assessing the quality of publications is that they mix considerations of methods (bias avoidance) with aspects of reporting (Jüni et al 2001).

From the 1980s onwards there were occasional suggestions that it would be useful to have guidelines restricted to what should be reported (Box 3). Some of these authors suggested that medical journals should provide such guidelines for authors.

Box 3. Early comments about the desirability of reporting guidelines
“Standards governing the content and format of statistical aspects should be developed to guide authors in the preparation of manuscripts.” (O’Fallon et al 1978)
“… editors could greatly improve the reporting of clinical trials by providing authors with a list of items that they expected to be strictly reported.” (DerSimonian et al 1982)
“An obvious proposal is to suggest that editors of oncology journals make up a check-list for authors of submitted clinical trial papers.” (Zelen 1989)
“Unfortunately, in recent years I have become increasingly aware of the fact that it is very difficult to publish a manuscript which has been carefully written to communicate to the reader the key decisions that were made during the progress and analysis of the study, as there are many medical journal reviewers who consider these details irrelevant. The issue here is not whether the study was performed properly – it is whether it can be reported adequately. Clearly, there is a real need for further education of medical reviewers as to the information required to evaluate medical studies effectively. And there is a corresponding need for statisticians to develop reporting strategies more acceptable to the medical community than those currently available.” (O’Fallon 1990)
“Authors should be provided with a list of items that are required. Existing check lists do not cover treatment allocation and baseline comparisons as comprehensively as we have suggested. Even if a check list is given to authors there is no guarantee that all items will be dealt with. The same list can be used editorially, but this is time-consuming and inefficient. It would be better for authors to be required to complete a check list that indicates for each item the page and paragraph where the information is supplied. This would encourage better reporting and aid editorial assessment, thus raising the quality of published clinical trials.” (Altman and Doré, 1990)


There were occasional early calls for better reporting of randomised controlled trials (see Box 2), but the few early guidelines for reports of RCTs (Grant 1989; Squires and Elmslie 1990) had very little impact. These guidelines tended to be targeted at reviewers.

A notable exception was the proposal that journal articles should have structured abstracts. Detailed guidelines for abstracts of articles reporting original medical research or systematic reviews were first proposed in 1987 (Ad Hoc Working Group for Critical Appraisal of the Medical Literature 1987) and updated in 1990 (Haynes et al 1990). Structured abstracts were quickly adopted by many medical journals, although journals did not necessarily adhere to the detailed recommendations. There is now considerable evidence that structured abstracts communicate more effectively than traditional ones (Hartley 2014).

Serious attempts to develop guidelines relating to the reporting of complete research articles and targeted at authors began in the 1990s. In December 1994 two publications in leading general medical journals presented independently developed guidelines for reporting randomised controlled trials: “Asilomar” (Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature 1994) and SORT (Standards of Reporting Trials Group 1994). Each had arisen from a meeting of researchers and editors concerned to improve the standard of trial reporting. Although the two checklists had overlapping content, there were some notable differences in emphasis.

Of particular interest, the SORT recommendations arose from a meeting initially intended to develop a new scale to assess the quality of RCT methodology, a key element of the conduct of systematic reviews. Early in the meeting Tom Chalmers (Dickersin and Chalmers 2014) argued that poor reporting of research was a major problem that undermined the assessment of published articles, so the meeting was redirected towards developing recommendations for reporting RCTs (Standards of Reporting Trials Group 1994).

The CONSORT Statement
Following the publication of the SORT and Asilomar recommendations, Drummond Rennie, Deputy Editor of JAMA, suggested that the two proposals should be combined into a single, coherent set of evidence-based recommendations (Rennie 1995). To that end, representatives from both groups met in Chicago in 1996 and produced the CONsolidated Standards Of Reporting Trials (CONSORT) Statement, published later that year (Begg et al 1996). The CONSORT Statement comprised a checklist and flow diagram for reporting the results of RCTs.

The rationale for including items in the checklist was that all of them are necessary to evaluate a trial – readers need this information to be able to judge the reliability and relevance of the findings. Whenever possible, decisions to include items were based on relevant empirical evidence. The CONSORT recommendations were updated in 2001, and published simultaneously in three leading general medical journals (Moher et al 2001). At the same time, a long “Explanation and Elaboration” (E&E) paper was published, which included detailed explanations of the rationale for each of the checklist items, examples of good reporting, and a summary of evidence about how well (or poorly) that information was reported in published reports of RCTs (Altman et al. 2001). Both the checklist and the E&E paper were updated again in 2010 in the light of new evidence (Moher et al 2010a; Schulz et al 2010).

The checklist is seen as the minimum set of information needed for all randomised trials. Clearly any important information about the trial should be reported, whether or not it is specifically addressed in the checklist. The flow diagram shows the passage of trial participants through the trial, from recruitment to final analysis (Figure 2). Although rare earlier examples exist (Yelle et al. 1991), few reports of RCTs included flow diagrams before 1996. The flow diagram has become the most widely adopted of the CONSORT recommendations, although published diagrams often fail to include all the items recommended by CONSORT (Hopewell et al 2011).

FIGURE 2

The 2001 update of CONSORT clarified that the main focus was on two-arm parallel group trials (Moher et al 2001). The first published CONSORT extensions addressed the reporting of cluster randomised trials (Campbell et al 2012) and of non-inferiority and equivalence trials (Piaggio et al 2012); both have been updated to take account of the changes in CONSORT 2010. A recent extension addresses N-of-1 trials (Vohra et al 2015). Design-specific extensions led to the modification of some checklist items and the addition of some new elements; some also require modification of the flow diagram.

Two further extensions of CONSORT are relevant to almost all trial reports. They relate to the reporting of harms (Ioannidis et al 2004) and the content of abstracts (Hopewell et al 2008a; Hopewell et al 2008b).

The influence of CONSORT on other reporting guidelines
The CONSORT Statement proved to be a very influential guideline, affecting not only the way clinical trials are reported but also the development of many other reporting guidelines. Factors in the success of CONSORT include:

  • Membership of the CONSORT Group includes methodologists, trialists, and journal editors
  • Concentration on reporting rather than study conduct
  • Recommendations based on evidence where possible
  • Focus on the essential issues (i.e. the minimum set of information to report)
  • High-profile publications
  • Support from major editorial groups, hundreds of medical journals, and some funding agencies
  • Dedicated executive group that co-ordinated ongoing developments and promotion
  • Updated to take account of new evidence and latest thinking

The CONSORT approach has been adopted by several other groups. Indeed the QUOROM Statement (recommendations for reporting meta-analyses of RCTs) was developed after a meeting held in October 1996 (Moher et al. 1999), only a few months after the initial CONSORT Statement was published. The meeting to develop MOOSE (for reporting meta-analyses of observational studies) was held in April 1997 (Stroup et al. 2000). Several guideline groups have followed CONSORT by producing detailed E&E papers to accompany a new reporting guideline, including STARD (Bossuyt et al 2003), STROBE (Vandenbroucke et al. 2007), PRISMA (Liberati et al. 2009), REMARK (Altman et al. 2012), and TRIPOD (Moons et al. 2015).

CONSORT has also been the basis for guidelines for non-medical experimental studies, such as ARRIVE for in vivo experiments using animals (Kilkenny et al. 2010), REFLECT for research on livestock (Sargeant et al. 2010) and guidelines for software engineering (Jedlitschka and Pfahl 2005).

The importance of implementation

Initial years of the EQUATOR Network
Reporting guidelines are important for achieving high standards in reporting health research studies. They specify the minimum information needed for a complete and clear account of what was done and what was found in a particular kind of research study, so that the study can be fully understood, replicated, and assessed, and its findings used. Reporting guidelines focus on scientific content and thus complement journals’ instructions to authors, which mostly deal with the technicalities of submitted manuscripts. The primary role of reporting guidelines is to remind researchers what information to include in a manuscript, not to tell them how to do research. In a similar way they can be an efficient tool for peer reviewers to check the completeness of information in a manuscript. Judgements of completeness are not arbitrary: they relate closely to the reliability and usability of the findings presented in a report.

Potential users of research, for example systematic reviewers, clinical guideline developers, clinicians, and sometimes patients, have to assess two key issues: the methodological soundness of the research (how well the study was designed and conducted) and its clinical relevance (how the study population relates to a specific population or patient, what the intervention was and how to use it successfully in practice, what the side effects encountered were, etc.). The key goal of a good reporting guideline is to help authors to ensure all necessary information is described sufficiently in a report of research.

Although CONSORT and other reporting guidelines started to influence the way research studies were reported, documented adherence to these guidelines remained unacceptably low (Pocock et al 2004; Plint et al 2006). To have a meaningful impact on preventing poor reporting, guidelines needed to be widely known and routinely used throughout the research publication process.

In 2006, one of us (DGA) obtained a 1-year seed grant from the UK NHS National Knowledge Service (led by Muir Gray) to establish a programme to improve the quality of medical research reports available to UK clinicians through wider use of reporting guidelines. The initial project had three major objectives: (i) to map the current status of all activities aimed at preparing and disseminating guidelines on reporting health research studies; (ii) to identify key individuals working in the area; and (iii) to establish relationships with potential key stakeholders. We (DGA and IS) established a small working group with David Moher, Kenneth Schulz and John Hoey, and laid the foundations of the new programme, which we named EQUATOR (Enhancing the QUAlity and Transparency Of health Research).

EQUATOR was the first coordinated attempt to tackle the problems of inadequate reporting systematically and on a global scale. The aim was to create an ‘umbrella’ organisation that would bring together researchers, medical journal editors, peer reviewers, developers of reporting guidelines, research funding bodies and other collaborators with a mutual interest in improving the quality of research publications and of research itself. This philosophy led to the programme being renamed ‘The EQUATOR Network’ (www.equator-network.org).

The EQUATOR Network held its first international working meeting in Oxford in May-June 2006. The 27 participants from 10 countries included representatives of reporting guideline development groups, journal editors, peer reviewers, medical writers, and research funders. The objective of the meeting was to exchange experience in developing, using and implementing reporting guidelines and to outline priorities for future EQUATOR Network activities. Prior to that first EQUATOR meeting we had identified published reporting guidelines and had surveyed their authors to document how the guidelines had been developed and what problems had been encountered during their development (Simera et al 2008). The survey results and meeting discussions helped us to prioritise the activities needed for a successful launch of the EQUATOR programme. These included the development of a centralised resource portal supporting good research reporting and a training programme, and support for the development of robust reporting guidelines.

The EQUATOR Network was officially launched at its inaugural meeting in London in June 2008. Since its launch, there have been a number of important milestones and a heartening impact on the promotion, uptake, and development of reporting guidelines.

The EQUATOR Library for health research reporting is a free online resource that contains an extensive database of reporting guidelines and other resources supporting the responsible publication of health research. As of September 2015, 22,000 users access these resources every month and this number continues to grow. The production of new reporting guidelines has increased considerably in recent years. The EQUATOR database of reporting guidelines currently contains 282 guidelines (accessed on 22 September 2015). The backbone comprises about ten core guidelines, each providing a generic reporting framework for a particular kind of study (for example, CONSORT for randomised trials, STROBE for observational studies, PRISMA for systematic reviews, STARD for diagnostic test accuracy studies, TRIPOD for prognostic studies, CARE for case reports, ARRIVE for animal studies, etc.). Most guidelines, however, are targeted at specific clinical areas or aspects of research.

There are differences in the way individual guidelines were developed (Simera et al 2008; Moher et al 2011). At present the EQUATOR database is inclusive and does not apply any exclusion filter based on reporting guideline development methods. However, in order to ensure the robustness of guideline recommendations and their wide acceptability it is important that guidelines are developed in ways likely to be trustworthy. Based on experience gained in developing CONSORT and several other guidelines, the EQUATOR team published recommendations for future guideline developers (Moher et al 2010b) and the Network supports developers in various ways.

Making all reporting guidelines known and easily available is the first step in their successful use. Promotion, education and training form another key part of the EQUATOR Network’s core programme. The EQUATOR team members give frequent presentations at meetings and conferences and organise workshops on the importance of good research reporting and reporting guidelines. EQUATOR courses target journal editors, peer reviewers, and, most importantly, researchers – the authors of scientific publications. Developing skills in early-stage researchers is the key to long-term change in research reporting standards. Journal editors play an important role too, not only as gatekeepers of good reporting quality but also in raising awareness of reporting shortcomings and directing authors to reliable reporting tools. A growing number of journals link to the EQUATOR resources and participate in and support EQUATOR activities.

Recent literature reviews have shown evidence of modest improvements in reporting over time for randomised trials (adherence to CONSORT) (Turner et al. 2012) and diagnostic test accuracy studies (adherence to STARD) (Korevaar et al. 2015). The present standards of reporting remain inadequate, however.

Further development of the EQUATOR Network
The EQUATOR programme is not a fixed-term project but an ongoing programme of research support. The EQUATOR Network is gradually developing into a global initiative. Until 2014 most of the EQUATOR activities were carried out by the small core team based in Oxford, UK. In 2014 we launched three centres to expand EQUATOR activities: the UK EQUATOR Centre (also the EQUATOR Network’s head office), the French EQUATOR Centre, and the Canadian EQUATOR Centre. The new centres will focus on national activities aimed at raising awareness and supporting the adoption of good research reporting practices. The centres work with partner organisations and initiatives as well as contributing to the work of the EQUATOR Network as a whole.

The growing number of people involved in the EQUATOR work also fosters wider involvement in and reporting of ‘research on research’. Each centre has its own research programme relating to the overall goals of EQUATOR. Research topics include reviews of time trends in the nature and quality of publications; the development of tools and strategies to improve the planning, design, conduct, management and reporting of biomedical research; investigating strategies to help journals to improve the quality of manuscripts; and so on (e.g. Hopewell et al 2014; Stevens et al 2014; Barnes et al 2015; Mahady et al 2015).

In conclusion

Concern about the quality of medical research has been expressed intermittently for over a century, and concern about the quality of reports of research for almost as long. At last, in the 1990s, serious international efforts began to promote better reporting of medical research. The emergence of the EQUATOR Network has been both a result and a cause of the progress that is being made.

This James Lind Library article has been republished in the Journal of the Royal Society of Medicine 2016;109:67-77.

References

Ad Hoc Working Group for Critical Appraisal of the Medical Literature (1987). A proposal for more informative abstracts of clinical articles. Ann Intern Med 106:598-604.

Altman DG (1994). The scandal of poor medical research. BMJ 308:283-284.

Altman DG, Doré C (1990). Randomisation and baseline comparisons in clinical trials. Lancet 335:149-153.

Altman DG, McShane LM, Sauerbrei W, Taube SE (2012). Reporting recommendations for tumor marker prognostic studies (REMARK): explanation and elaboration. BMC Med 10:51.

Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gøtzsche PC, Lang T (2001). The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 134:663-694.

Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature (1994). Call for comments on a proposal to improve reporting of clinical trials in the biomedical literature. Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature. Ann Intern Med 121:894-895.

Barnes C, Boutron I, Giraudeau B, Porcher R, Altman DG, Ravaud P (2015). Impact of an online writing aid tool for writing a randomized trial report: the COBWEB (Consort-based WEB tool) randomized controlled trial. BMC Med 13:221.

Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF (1996). Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 276:637-639.

Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG (2003). The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 138:W1-12.

Campbell MK, Piaggio G, Elbourne DR, Altman DG, the CONSORT Group (2012). CONSORT 2010 statement: extension to cluster randomised trials. BMJ 345:e5661.

Clarke M, Chalmers I (1998). Discussion sections in reports of controlled trials published in general medical journals: islands in search of continents? JAMA 280:280-282.

Clarke M, Hopewell S (2013). Many reports of randomised trials still don’t begin or end with a systematic review of the relevant evidence. J Bahrain Med Soc 24:145-147.

Cole JO, Ross S, Bouthilet L (1957). Recommendations for reporting studies of psychiatric drugs. Public Health Reports 72:638-645.

Daniels M (1950). Scientific appraisement of new drugs in tuberculosis. Am Rev Tuberc 61:751-756.

Dechartres A, Charles P, Hopewell S, Ravaud P, Altman DG (2011). Reviews assessing the quality or the reporting of randomized controlled trials are increasing over time but raised questions about how quality is assessed. J Clin Epidemiol 64:136-144.

DerSimonian R, Charette LJ, McPeek B, Mosteller F (1982). Reporting on methods in clinical trials. N Engl J Med 306:1332-1337.

Dickersin K, Chalmers I (2014). Thomas C Chalmers (1917-1995): a pioneer of randomized clinical trials and systematic reviews. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/thomas-c-chalmers-1917-1995/)

Dunn HL (1929). Application of statistical methods in physiology. Physiol Rev 9:275-398.

Dunn HL (1959). What high-level wellness means. Can J Public Health 50:447-457.

Feinstein AR (1974). Clinical biostatistics. XXV. A survey of the statistical procedures in general medical journals. Clin Pharmacol Ther 15:97-107.

Feynman R (1974). Cargo cult science. Engineering Sci 37:10-13.

Glasziou P, Meats E, Heneghan C, Shepperd S (2008). What is missing from descriptions of treatment in trials and reviews? BMJ 336:1472-1474.

Glick BS (1963). Inadequacies in the reporting of clinical drug research. Psychiatr Q 37:234-244.

Gøtzsche P (1989). Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal anti-inflammatory drugs in rheumatoid arthritis. Control Clin Trials 10:31-56.

Grant A (1989). Reporting controlled trials. Br J Obstet Gynaecol 96:397-400.

Greenwood M (1932). What is wrong with the medical curriculum? Lancet i:1269-1270.

Hartley J (2014). Current findings from research on structured abstracts: an update. J Med Libr Assoc 102:146-148.

Haynes RB, Mulrow CD, Huth EJ, Altman DG, Gardner MJ (1990). More informative abstracts revisited. Ann Intern Med 113:69-76.

Hogben L (1950). Chance and choice by cardpack and chessboard. Vol. 1. New York: Chanticleer Press, unnumbered page.

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF (2008a). CONSORT for reporting randomised trials in journal and conference abstracts. Lancet 371: 281-283.

Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, Schulz KF (2008b). CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLoS Med 5:e20.

Hopewell S, Hirst A, Collins GS, Mallett S, Yu LM, Altman DG (2011). Reporting of participant flow diagrams in published reports of randomized trials. Trials 12:253.

Hopewell S, Collins GS, Boutron I, Yu LM, Cook J, Shanyinde M, Wharton R, Shamseer L, Altman DG (2014). Impact of peer review on reports of randomised trials published in open peer review journals: retrospective before and after study. BMJ 349:g4145.

Ioannidis JPA, Evans SJW, Gøtzsche PC, O’Neill RT, Altman DG, Schulz KF, Moher D, the CONSORT Group (2004). Improving the reporting of harms in randomized trials: Expansion of the CONSORT statement. Ann Intern Med 141:781-788.

Jedlitschka A, Pfahl D (2005). Reporting guidelines for controlled experiments in software engineering. 2005 International Symposium on Empirical Software Engineering (ISESE), Proceedings:92-101.

Jüni P, Altman DG, Egger M (2001). Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ 323:42-46.

Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG (2010). Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol 8:e1000412.

Korevaar DA, Wang J, van Enst WA, Leeflang MM, Hooft L, Smidt N, Bossuyt PM (2015). Reporting diagnostic accuracy studies: some improvements after 10 years of STARD. Radiology 274:781-789.

Liberati A, Himel HN, Chalmers TC (1986). A quality assessment of randomised controlled trials of primary treatment of breast cancer. J Clin Oncol 4:942-951.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 339:b2700.

Mahady SE, Schlub T, Bero L, Moher D, Tovey D, George J, Craig JC (2015). Side effects are incompletely reported among systematic reviews in gastroenterology. J Clin Epidemiol 68:144-153.

Mainland D (1934). Chance and the blood count. Can Med Assoc J 30:656-658.

Mainland D (1938). The treatment of clinical and laboratory data. Edinburgh: Oliver & Boyd.

Mainland D (1952). Elementary medical statistics. The principles of quantitative medicine. Philadelphia: WB Saunders.

Mainland D (1969). Some research terms for beginners: definitions, comments, and examples. I. Clin Pharmacol Ther 10:714-736.

Meinert CL, Tonascia S, Higgins K (1984). Content of reports on clinical trials: a critical review. Control Clin Trials 5:328-347.

Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF (1999). Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354:1896-1900.

Moher D, Schulz KF, Altman DG (2001). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357:1191-1194.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG (2010a). CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 340:c869.

Moher D, Schulz KF, Simera I, Altman DG (2010b). Guidance for developers of health research reporting guidelines. PLoS Med 7:e1000217.

Moher D, Weeks L, Ocampo M, Seely D, Sampson M, Altman DG, Schulz KF, Miller D, Simera I, Grimshaw J, Hoey J (2011). Describing reporting guidelines for health research: a systematic review. J Clin Epidemiol 64:718-742.

Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS (2015). Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 162:W1-73.

Nieuwenhuis S, Forstmann BU, Wagenmakers EJ (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci 14:1105-1107.

O’Fallon JR, Duby SD, Salsburg DS, et al (1978). Should there be statistical guidelines for medical research papers? Biometrics 34:687-695.

O’Fallon JR (1990). Discussion of: Ellenberg JH. Biostatistical collaboration in medical research. Biometrics 45:24-26.

Parsons NR, Price CL, Hiskens R, Achten J, Costa ML (2012). An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals. BMC Med Res Methodol 12:60.

Pearl R (1919). A statistical discussion of the relative efficacy of different methods of treating pneumonia. Arch Intern Med 24:398-403.

Piaggio G, Elbourne DR, Pocock SJ, Evans SJ, Altman DG, the CONSORT Group (2012). Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA 308:2594-2604.

Plint AC, Moher D, Morrison A, et al (2006). Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust 185:263-267.

Pocock SJ, Hughes MD, Lee RJ (1987). Statistical problems in the reporting of clinical trials. N Engl J Med 317:426-432.

Pocock SJ, Collier TJ, Dandreo KJ, de Stavola BL, Goldman MB, Kalish LA, et al (2004). Issues in the reporting of epidemiological studies: a survey of recent practice. BMJ 329:883.

Prescott RJ, Civil I (2013). Lies, damn lies and statistics: errors and omission in papers submitted to INJURY 2010-2012. Injury 44:6-11.

Rennie D (1995). Reporting randomized controlled trials. An experiment and a call for responses from readers. JAMA 273:1054-1055.

Ross OB, Jr (1951). Use of controls in medical research. JAMA 145:72-75.

Sandifer MG, Dunham RM, Howard K (1961). The reporting and design of research on psychiatric drug treatment: a comparison of two years. Psychopharmacol Serv Cent Bull 1:6-10.

Sargeant JM, O’Connor AM, Gardner IA, Dickson JS, Torrence ME (2010). The REFLECT statement: reporting guidelines for Randomized Controlled Trials in livestock and food safety: explanation and elaboration. Zoonoses Public Health 57:105-136.

Schor S, Karten I (1966). Statistical evaluation of medical journal manuscripts. JAMA 195:1123-1128.

Schulz KF, Altman DG, Moher D (2010). CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ 340:c332.

Simera I, Altman DG (2009). Writing a research article that is “fit for purpose”: EQUATOR Network and reporting guidelines. Evidence-Based Med 14:132-134.

Simera I, Altman DG, Moher D, Schulz KF, Hoey J (2008). Guidelines for reporting health research: The EQUATOR Network’s survey of guideline authors. PLoS Med 5:e139.

Sollmann T (1917). The crucial test of therapeutic evidence. JAMA 69:198-199.

Squires BP, Elmslie TJ (1990). Reports of randomized controlled trials: what editors want from authors and peer reviewers. CMAJ 143:381-382.

Standards of Reporting Trials Group (1994). A proposal for structured reporting of randomized controlled trials. JAMA 272:1926-1931.

Stevens A, Shamseer L, Weinstein E, Yazdi F, Turner L, Thielman J, Altman DG, Hirst A, Hoey J, Palepu A, Schulz KF, Moher D (2014). Relation of completeness of reporting of health research to journals’ endorsement of reporting guidelines: systematic review. BMJ 348:g3804.

Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB (2000). Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 283:2008-2012.

Turner L, Shamseer L, Altman DG, Schulz KF, Moher D (2012). Does use of the CONSORT Statement impact the completeness of reporting of randomised controlled trials published in medical journals? A Cochrane review. Syst Rev 1:60.

Tyson JE, Furzan JA, Reisch JS, Mize SG (1983). An evaluation of the quality of therapeutic studies in perinatal medicine. J Pediatr 102:10-13.

Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M; STROBE Initiative (2007). Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med 4:e297.

Vohra S, Shamseer L, Sampson M, Bukutu C, Schmid CH, Tate R, Nikles J, Zucker DR, Kravitz R, Guyatt G, Altman DG, Moher D (2015). CONSORT extension for reporting N-of-1 trials (CENT) 2015 Statement. BMJ 350:h1738.

Waife SO (1959). Problems of publication. In: Waife SO, Shapiro AP, eds. The Clinical Evaluation of New Drugs. New York: Hoeber-Harper, pp 213-216.

Yelle L, Bergsagel D, Basco V, Brown T, Bush R, Gillies J, Israels L, Miller A, Rideout D, Whitelaw D, et al. (1991). Combined modality therapy of Hodgkin’s disease: 10-year results of National Cancer Institute of Canada Clinical Trials Group multicenter clinical trial. J Clin Oncol 9:1983-1993.

Zelen M (1989). The reporting of clinical trials: counting is not easy. J Clin Oncol 7:827-828.