Gøtzsche PC (2021). Citation bias: questionable research practice or scientific misconduct?

© Peter C Gøtzsche, Institute for Scientific Freedom, Copenhagen, Denmark. email pcg@scientificfreedom.dk.

Cite as: Gøtzsche PC (2021). Citation bias: questionable research practice or scientific misconduct? JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/citation-bias-questionable-research-practice-or-scientific-misconduct/)

Citation bias

Citation bias occurs when authors preferentially cite research that supports their own findings or claims, or research that showed what they had hoped to find but didn’t find in their research. In research articles, citation bias may occur in the Introduction section, where the researchers argue why their own research is important, and in the Discussion section, where they put their findings into context and perspective.

The first use of the term ‘citation bias’ of which I am aware was not in biomedicine. In 1985, researchers in physics referred to “a citation bias against Eastern-bloc [particle] accelerators” (Irvine and Martin 1985).

Demonstration of citation bias in a systematic review of trials of NSAIDs for rheumatoid arthritis

In 1987, I reported, in an article published in the BMJ, what seems likely to be the first demonstration of citation bias in a systematic review of trials in healthcare (Gøtzsche 1987). For my doctoral thesis, I used exhaustive search strategies, read the references of the reports identified, and wrote to manufacturers in an attempt to assemble all published and unpublished reports of double-blind trials that had compared two or more non-steroidal anti-inflammatory drugs (NSAIDs) in patients with rheumatoid arthritis.

I found a wide variety of biases favouring sponsors’ drugs and disfavouring comparator drugs. When bias in the Conclusions or Abstracts consistently favoured one of the drugs, it favoured the control drug in only one report and the new drug in the remaining 81 reports (P = 3.4 × 10⁻²³) (Gøtzsche 1989). This observation was why I became interested in assessing whether trial reports were also biased when they cited earlier, similar trials.

I examined the reference lists for studies that had compared the same two drugs as those in the index trial report. For each article, I noted whether the proportion of references with a positive outcome for the new drug was the same, lower, or higher than the proportion among all articles assumed to have been available to the authors (those published more than two years earlier than the index article). Ten articles had a neutral selection of references, 22 a negative selection, and 44 a positive selection (P < 0.01; sign test). The bias was not caused by overrepresentation of highly cited journals among the articles with positive selection of references, or by better methodological quality of the cited articles. And the trials that were least cited were not published in journals or books that are difficult to identify in a search, or to obtain through a library.
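The classification and the sign test described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the original analysis code: the function names `selection_direction` and `sign_test_p` are hypothetical, neutral selections are assumed to be dropped as ties, and the P value is computed as an exact two-sided binomial probability, which may differ from the procedure used in 1987. The last line applies the same test to the 1-versus-81 split reported in the preceding paragraph.

```python
from math import comb


def selection_direction(cited_positive, cited_total, available_positive, available_total):
    """Compare the share of positive trials among cited and among available articles.

    Returns 1 (positive selection), -1 (negative selection) or 0 (neutral).
    Cross-multiplication avoids comparing floating-point proportions.
    """
    lhs = cited_positive * available_total
    rhs = available_positive * cited_total
    return (lhs > rhs) - (lhs < rhs)


def sign_test_p(n_negative, n_positive):
    """Exact two-sided sign test; neutral (tied) articles are excluded."""
    n = n_negative + n_positive
    k = min(n_negative, n_positive)
    tail = sum(comb(n, i) for i in range(k + 1))  # outcomes at least as extreme
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical article: cites 3 positive trials out of 4 cited,
# when 5 of the 10 available trials were positive.
print(selection_direction(3, 4, 5, 10))  # 1, i.e. positive selection

print(sign_test_p(22, 44))  # counts reported above; the result is below 0.01
print(sign_test_p(1, 81))   # about 3.4e-23
```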

I concluded my report as follows:

“The reference bias shown in this study seems to be real. Such a finding has important implications, since there is no reason to believe that rheumatologists are more biased than others in selecting references. A reader tracing the literature on any new drug using the reference lists given in the articles might risk obtaining a biased sample. Reference bias has another serious implication: it may render the conclusion of the individual article less reliable. Is this also true for review articles, and for other disciplines in medicine?” (Gøtzsche 1987).

I called the bias ‘reference bias,’ but ‘citation bias’ is a better term. When I did a PubMed search on these terms in the title of articles (in quotation marks; 27 Aug 2021), I retrieved 16 articles using ‘reference bias’ and 21 articles using ‘citation bias.’ The only relevant records for ‘reference bias’ were two of my own papers (Gøtzsche 1987 and Schmidt and Gøtzsche 2005), a letter to the editor about the first one, and my translation of the first one into Danish. The other 12 articles with ‘reference bias’ in the title were about problems related to the use of a ‘reference genome’ or ‘reference tomography,’ or to the use of self-reports of health or quality of life, for example when patients used themselves as the reference. In contrast, all 21 records retrieved using ‘citation bias’ as the search term were relevant, although three were comments on other papers, one was about handsearching literature, one was about gender bias, and one was authored by Italian researchers claiming that they were being cited less frequently than they should have been.

Cholesterol and coronary heart disease

In 1992, Uffe Ravnskov published an analysis of cholesterol lowering trials, of which 14 were regarded by the trial directors as supportive of beneficial effects and 10 were considered unsupportive (Ravnskov 1992). Ravnskov found that the trials deemed to be supportive were cited almost six times more often than other trials, and that unsupportive trials were not cited at all after 1970, even though they were similar in number to those considered supportive.

Criticism of the role of a low-fat diet for preventing heart disease is often met with the assertion that consensus committees have settled the issue unanimously. Using three authoritative reviews, Ravnskov studied the work of such committees (Ravnskov 1995). As he found that fundamental parts of the hypothesis seemed to be based on biased quotation, he used the term ‘quotation bias’ in the title of his article.

Overcitation of unsupportive studies

Rarely, citation bias goes in the opposite direction. A 1995 study found that five reviews cited more unsupportive than supportive trials of the effectiveness of pneumococcal vaccines; two cited more supportive trials, and one cited an equal proportion of supportive and unsupportive trials (Hutchison et al. 1995). Overall, unsupportive trials were twice as likely to be cited as supportive trials, but there was also a time issue. Results of all seven trials in adults published before 1980 were supportive, whereas six of the seven trials published in 1980 or later were unsupportive. The citation pattern might therefore reflect a tendency of authors to preferentially cite recent studies.

A review of studies designed to detect citation bias

During the next 20 years, an average of about two studies designed to detect citation bias were published annually. In 2017, a Dutch research group (Duyx et al. 2017) published a systematic review of 52 studies, from across scientific disciplines, designed to detect citation bias. The authors claimed to have done “the first systematic review of citation bias” and that they had taken account of “all available evidence”. Because they had not included my 1987 report (Gøtzsche 1987), I downloaded the protocol for their review (Duyx et al. 2015) to explore how this could have happened. One of their search terms was ‘referenc* bias*.’ When I used that term to search PubMed (17 Aug 2021), I retrieved my 1987 paper and another systematic review from my research group which the Dutch researchers had also failed to include (Schmidt and Gøtzsche 2005), even though both papers had ‘reference bias’ in their titles. Although the Dutch researchers claimed to have checked the reference lists of the reports they had included, three of the 36 biomedical studies of citation bias which they had included (their references 18, 22 and 29) had quoted my 1987 report.

Duyx et al. found that positive articles were cited “about 1.3 to 3.7 times more often” than negative articles and that statistically significant articles were cited 1.6 (95% confidence interval 1.3 to 1.8) times as often as statistically nonsignificant articles. When they looked at the direction of the results and whether they supported the investigators’ hypotheses, they found pooled ratios of 2.1 (1.3 to 3.6) and 1.8 (1.4 to 2.4), respectively. Articles with a positive conclusion that supported the investigators’ conclusion were cited 2.7 (2.0 to 3.7) times more often than others. Research quality was not related to the number of citations, whereas journal impact factor was.

One of the reports included by the Dutch researchers had assembled 458 eligible articles (Jannot et al. 2013). It also found that statistically significant studies were cited twice as often as statistically nonsignificant studies, but that the association disappeared after adjustment for journal impact factor. Jannot et al. were aware that the journal impact factor may be considered an intermediary causal variable between statistical significance and citation frequency and that it is therefore wrong to adjust for journal impact factor. If the association is strong, an adjustment for an intermediary causal factor along the causal pathway could remove completely any true association between the primary causal factor and the outcome. The authors of another report included in the Dutch review had not understood this. They declared that “The prestige and visibility of journals is a potential confounding factor, which should be adjusted for,” which they then did (Nieminen et al. 2007).
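The point about adjusting for an intermediary variable can be illustrated with a small simulation. The sketch below is purely hypothetical: all numbers, the variable names, and the assumption that statistical significance influences citations only through journal impact factor (full mediation) are invented for illustration and are not taken from any of the studies discussed. Under that assumption, significance has a real effect on citations, yet the adjusted coefficient is close to zero.

```python
import random


def slope(y, x):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def adjusted_slope(y, x, m):
    """Coefficient of x in a least-squares fit of y on x and m (with intercept)."""
    mx, mm, my = (sum(v) / len(v) for v in (x, m, y))
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((a - mm) ** 2 for a in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    smy = sum((a - mm) * (b - my) for a, b in zip(m, y))
    return (smm * sxy - sxm * smy) / (sxx * smm - sxm ** 2)


rng = random.Random(2021)  # fixed seed so the sketch is reproducible
n = 5000
# Assumed causal chain: significance -> impact factor -> citations.
significant = [rng.randint(0, 1) for _ in range(n)]
impact = [2.0 * s + rng.gauss(0, 1) for s in significant]
citations = [3.0 * j + rng.gauss(0, 1) for j in impact]

crude = slope(citations, significant)
adjusted = adjusted_slope(citations, significant, impact)
print(f"crude effect of significance on citations: {crude:.2f}")
print(f"after adjusting for impact factor:         {adjusted:.2f}")
```

In this invented setting the crude estimate recovers the total effect (about 2 × 3 = 6), whereas the adjusted estimate is near zero although the effect is real, which is why adjustment for a variable on the causal pathway can be misleading.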

The Dutch team went on to conduct a citation network analysis based on 108 articles (Urlings et al. 2019). They judged that disproportionate attention had been paid to articles suggesting a harmful effect of trans fat on cholesterol. Reporting statistically significant results was a strong predictor of citation, together with sample size, journal impact factor, and the ‘authority’ of the authors.

Most recently, the Dutch team assessed citation bias in six research areas (Urlings et al. 2021). They concluded that “The probability of being cited seems associated with positive study outcomes, the authority of its authors, and the journal in which that article is published.” They themselves illustrated that ‘authority’ plays a role, as 12 of their 39 references were to papers co-authored by the group’s senior researcher, Lex Bouter. Self-citation cannot be avoided if one has done most of the relevant research in an area, but this is not the case here. Several of their self-citations were of questionable relevance to the points being made.

Case study: “Of mites and men”

In 1998, my research group published a systematic review of chemical and physical interventions intended to reduce exposure to house dust mite antigens (Gøtzsche et al. 1998). Our findings, that the interventions did not have an effect on patients with asthma, were very robust but unwelcome everywhere, including within the Cochrane Collaboration (Gøtzsche 2019).

The citation bias in this field is extreme. We documented this in a 2005 systematic review of narrative review articles in which opinions had been expressed about the clinical effects of physical or chemical interventions (Schmidt and Gøtzsche 2005). We titled our article “Of Mites and Men,” inspired by John Steinbeck’s novel “Of Mice and Men.”

Narrative review articles on this topic usually asserted that several methods were effective. We judged positive bias to have been present if the reference list contained a higher proportion of references to trials with statistically significant results favouring the interventions than the proportion among all trials judged to have been available to the authors (published 2 years or more prior to the review). Of the 38 narrative reviews that recommended physical interventions, 10 had neutral selection of trial references, one negative selection and 27 positive selection (P = 2 × 10⁻⁸, sign test).

The most quoted trial had only seven patients per group. The statistically significant benefit claimed for this trial was erroneous and was not based on a clinical outcome. The recommendations were often based on non-randomised studies, of which the most quoted study had only 10 patients per group, yet still claimed very positive results. In contrast, the most recent version of our review has 55 randomised trials and a total of 3121 patients (Gøtzsche and Johansen 2008).

Questionable research practice or scientific misconduct?

Citation bias is a questionable research practice, and it is sometimes so gross that it amounts to scientific misconduct. According to the BMJ, scientific misconduct includes deceptive selective reporting of findings and omission of conflicting data; wilful suppression or distortion of data; serious deviation from accepted practices in proposing or carrying out research; and improper reporting of results (https://www.bmj.com/about-bmj/resources-authors/forms-policies-and-checklists/scientific-misconduct; accessed 18 Aug 2021). The BMJ also considers failures of transparency to be forms of scientific misconduct. Reporting is deceptive when citations do not support what is claimed, or when the available evidence contradicts what is claimed but the relevant references are not quoted or are misrepresented.

The studies that have been performed since my 1987 BMJ article show that citation bias is common. This was confirmed in a 2016 survey conducted by Lex Bouter. People attending international research integrity conferences ranked selective citation to enhance one’s own findings or convictions, or to please editors, reviewers, or colleagues, as the most frequently occurring form of research misbehaviour (Bouter et al. 2016).

A review of 1523 trial reports published from 1963 to 2004 found that fewer than a quarter of preceding trials had been cited, comprising fewer than a quarter of the participants enrolled in all relevant prior trials (Robinson and Goodman 2011). Potential implications of this include ethically unjustifiable trials, wasted resources, incorrect conclusions, and unnecessary risks for trial participants.

A cumulative meta-analysis of trials of intravenous streptokinase for myocardial infarction showed that a consistent, statistically significant reduction in total mortality was achieved in 1973 after only eight trials involving 2432 patients had been reported (Lau et al. 1992). The results of the 25 subsequent trials, which enrolled an additional 34,542 patients through 1988, had little or no effect on the odds ratio, but exposed the enrolled patients to an increased risk of death.
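The mechanics of a cumulative meta-analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the method or data of Lau et al.: the function name `cumulative_odds_ratios` is hypothetical, the pooling is a fixed-effect inverse-variance analysis on the log odds ratio scale (the original authors may have used other estimators), no trial is assumed to have a zero cell, and the trial counts are invented toy numbers.

```python
from math import exp, log, sqrt


def cumulative_odds_ratios(trials):
    """Return (pooled OR, lower 95% limit, upper 95% limit) after each successive trial.

    Each trial is (deaths_treated, n_treated, deaths_control, n_control),
    listed in chronological order. Assumes no zero cells.
    """
    sum_w = sum_wy = 0.0
    results = []
    for a, n1, c, n0 in trials:
        b, d = n1 - a, n0 - c                      # survivors in each arm
        log_or = log((a * d) / (b * c))
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)  # inverse variance
        sum_w += weight
        sum_wy += weight * log_or
        pooled, se = sum_wy / sum_w, sqrt(1.0 / sum_w)
        results.append((exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)))
    return results


# Hypothetical trials in chronological order (NOT the streptokinase data).
toy_trials = [(8, 100, 12, 100), (15, 200, 22, 200), (30, 400, 41, 400), (60, 800, 79, 800)]
for i, (or_, lo, hi) in enumerate(cumulative_odds_ratios(toy_trials), start=1):
    print(f"after trial {i}: OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With pooling of this kind the confidence interval narrows as trials accumulate. Deciding when additional trials have stopped being informative remains a judgement that the code does not make.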

Another cumulative meta-analysis indicated that aprotinin greatly decreased the need for perioperative blood transfusion, stabilizing at an odds ratio of 0.25 by the 12th study (Fergusson et al. 2005). Citation of previous trials was extremely low, with a median of 20% of prior trials cited. Only 7 of 44 (15%) subsequent reports referenced the largest trial (N = 1784), which was 28 times larger than the median trial size.

As these examples illustrate, it is very important to discuss other trials comprehensively and in an unbiased way. The CONSORT guideline states that this can best be achieved by including a formal systematic review in the Results or Discussion sections of the report (Moher et al. 2010). When this is impractical, it is often possible to quote a systematic review of similar trials. At a minimum, the discussion should be as systematic as possible and be based on a comprehensive search, rather than being limited to studies with results that are concordant with those of the current trial.

The Introduction in the report of a clinical trial and in protocols for trials is also important and should also be systematic, preferably including reference to systematic reviews of relevant existing evidence (Clarke and Hopewell 2013). An early example of this was the 1986 report of the first ISIS trial, which assessed the effects of beta-blockade in myocardial infarction (ISIS-1 1986). In the Discussion section, the result of the trial was combined with a recent review of similar drugs. A decade later, we mentioned in the Introduction of our 1995 report of a trial of somatostatin for bleeding oesophageal varices the two trials that had been carried out previously and provided a meta-analysis of the data from the three trials in the Results section (Gøtzsche 1995).


Some of the illustrative instances I have described above amount to scientific misconduct, as they seriously distort readers’ perceptions of what the best available evidence tells them, or constitute misleading justification for conducting additional placebo-controlled trials. More attention to citation bias is needed to reduce potentially harmful consequences for patients, both directly, by distortion of evidence, and indirectly, by eroding trust in science.

Conflicts of Interest

Institute for Scientific Freedom.

I thank Iain Chalmers for encouraging me to write this article and for his comments on earlier drafts; and Dr Hamish Chalmers, for identifying a 1985 use of the term “citation bias.”

This James Lind Library article has been republished in the Journal of the Royal Society of Medicine 2022;115:31-35.


References

Bouter LM, Tijdink J, Axelsen N, Martinson BC, Ter Riet G (2016). Ranking major and minor research misbehaviors: results from a survey among participants of four World Conferences on Research Integrity. Res Integr Peer Rev 1:17.

Clarke M, Hopewell S (2013). Many reports of randomised trials still don’t begin or end with a systematic review of the relevant evidence. Journal of the Bahrain Medical Society 24:145-8.

Duyx B, Urlings MJE, Swaen G, Bouter LM, Zeegers MP (2015). Protocol for systematic review of citation bias. http://hdl.handle.net/10411/20710 (accessed 17 Aug 2021).

Duyx B, Urlings MJE, Swaen GMH, Bouter LM, Zeegers MP (2017). Scientific citations favor positive results: a systematic review and meta-analysis. J Clin Epidemiol 88:92-101.

Fergusson D, Glass KC, Hutton B, Shapiro S (2005). Randomized controlled trials of aprotinin in cardiac surgery: could clinical equipoise have stopped the bleeding? Clin Trials 2:218-29.

Gøtzsche PC (1987). Reference bias in reports of drug trials. Br Med J 295:654-6.

Gøtzsche PC (1989). Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal, antiinflammatory drugs in rheumatoid arthritis. Controlled Clin Trials 10:31-56 (erratum:356).

Gøtzsche PC (2019). Survival in an overmedicated world: look up the evidence yourself. Copenhagen: People’s Press.

Gøtzsche PC, Gjørup I, Bonnén H, Brahe NEB, Becker U, Burcharth F (1995). Somatostatin v placebo in bleeding oesophageal varices: randomised trial and meta-analysis. BMJ 310:1495-8.

Gøtzsche PC, Hammarquist C, Burr M (1998). House dust mite control measures in the management of asthma: meta-analysis. BMJ 317:1105-10.

Gøtzsche PC, Johansen HK (2008). House dust mite control measures for asthma: systematic review. Allergy 63:646-59.

Hutchison BG, Oxman AD, Lloyd S (1995). Comprehensiveness and bias in reporting clinical trials. Study of reviews of pneumococcal vaccine effectiveness. Can Fam Physician 41:1356-60.

Irvine J, Martin BR (1985). Basic Research in the East and West: A Comparison of the Scientific Performance of High Energy Physics Accelerators. Social Studies of Science, Vol. 15:293-341. In: Science Policy Study – Hearings, Volume 13. British Science Evaluation Methods: Hearing Before the Task Force on Science Policy. Committee on Science and Technology No. 59, US Government Printing Office, Washington. pp 292.

ISIS-1 (1986). Randomised trial of intravenous atenolol among 16 027 cases of suspected acute myocardial infarction: ISIS-1. First International Study of Infarct Survival Collaborative Group. Lancet 2:57-66.

Jannot AS, Agoritsas T, Gayet-Ageron A, Perneger TV (2013). Citation bias favoring statistically significant studies was present in medical research. J Clin Epidemiol 66:296-301.

Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC (1992). Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med 327:248-54.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG (2010). CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 340:c869.

Nieminen P, Rucker G, Miettunen J, Carpenter J, Schumacher M (2007). Statistically significant papers in psychiatry were cited more often than others. J Clin Epidemiol 60:939-46.

Ravnskov U (1992). Cholesterol lowering trials in coronary heart disease: frequency of citation and outcome. BMJ 305:15-9.

Ravnskov U (1995). Quotation bias in reviews of the diet-heart idea. J Clin Epidemiol 48:713-9.

Robinson KA, Goodman SN (2011). A systematic examination of the citation of prior research in reports of randomized, controlled trials. Ann Intern Med 154:50-5.

Schmidt LM, Gøtzsche PC (2005). Of mites and men: reference bias in narrative review articles: a systematic review. J Fam Pract 54:334-8.

Urlings MJE, Duyx B, Swaen GMH, Bouter LM, Zeegers MPA (2019). Citation bias in the literature on dietary trans fatty acids and serum cholesterol. J Clin Epidemiol 106:88-97.

Urlings MJE, Duyx B, Swaen GMH, Bouter LM, Zeegers MP (2021). Citation bias and other determinants of citation in biomedical research: findings from six citation networks. J Clin Epidemiol 132:71-8.