March 21, 2022

Reforming Peer Review at NIH

By Stuart Buck

These comments are in response to the NIH Center for Scientific Review’s draft strategic plan for 2022-2027.

Introduction

As Francis Collins wrote just after being appointed NIH Director, “Although the two-level NIH peer-review process is much admired and much copied around the world, its potential tendency toward conservatism is a chronic concern and invariably worsens when funding is very tight.”[1]

Numerous commentators have echoed his concern. To take just a few examples:

  • Former NCI Director Richard Klausner: “There is no conversation that I have ever had about the grant system that doesn’t have an incredible sense of consensus that it is not working. That is a terrible wasted opportunity for the scientists, patients, the nation and the world.”[2]
  • Nobel Laureate Roger Kornberg: “In the present climate especially, the funding decisions are ultraconservative. If the work that you propose to do isn’t virtually certain of success, then it won’t be funded. And of course, the kind of work that we would most like to see take place, which is groundbreaking and innovative, lies at the other extreme.”[3]
  • Bruce Alberts and Venkatesh Narayanamurti: “How successful would Silicon Valley be if nearly 99% of all investments were awarded to scientists and engineers age 36 years or older, along with a strong bias toward funding only safe, non-risky projects?”[4]

In light of these concerns—which are supported by exhaustive reviews of the academic literature—we have two recommendations:

1) CSR should take bold steps to experiment with the peer review process, with the goal of radically improving the rate at which innovative, high-reward science is funded.

2) CSR should ensure that independent scholars are given access to data on the full set of applicants, proposals, and peer review comments/scores (including the 80+% of non-funded proposals), so that they can evaluate what works and what doesn’t.

Background

Let’s start with the basic question of whether peer review finds the best or most innovative science to fund. Thus far, there is very little evidence on that question. A Cochrane review in 2007 found that “no studies assessing the impact of peer review on the quality of funded research are presently available. Experimental studies assessing the effects of grant giving peer review on importance, relevance, usefulness, soundness of methods, soundness of ethics, completeness and accuracy of funded research are urgently needed.”[5]

The situation hadn’t improved in 2017-18, when RAND reviewed the literature on behalf of an International Peer Review Expert Panel commissioned by the Canadian government. Their conclusion: “Our most startling finding was the dearth of evidence around the effectiveness of peer review, especially given its importance in the research system. Moreover, the evidence that does exist is not reassuring.”[6]

“Not reassuring” is putting it mildly. Consistent with what Collins, Alberts, Klausner, and many others have said, RAND found that “there is evidence that peer review is vulnerable to cronyism, and that it stifles innovation. Review scores are highly variable, suggesting a lack of reliability in the process and making it difficult to judge whether such scores reflect the chances of success.”

Academic evidence sheds light on why peer review would tend to inhibit innovation: Peer reviewers often view their role (particularly in a tight funding environment) as finding reasons to kill off proposals, while other people (not wanting to appear gullible or foolish) are particularly influenced by anyone who criticizes or makes negative judgments. The peer review process can thus be heavily biased towards rejecting novel ideas that often inherently attract at least one or two naysayers.

For example, Karim Lakhani and his colleagues at Harvard and elsewhere studied two separate calls for health grants, one on computational health and one on the microbiome.[7] They recruited 369 reviewers from different medical schools to evaluate a total of 97 proposals, scoring them on a scale similar to the one used at NIH. The researchers then randomly assigned reviewers to receive feedback showing that other reviewers had given either lower or higher scores to the same proposal, after which the reviewers had a chance to change their own scores.

Lo and behold: reviewers who saw lower scores from other reviewers went back and lowered their original scores by 0.759 points on average, while reviewers shown higher scores raised their original scores by only 0.449 points on average (on a scale from 1 to 9). Put another way, “the resulting difference corresponds to a 23.5 percentage decrease between the updated scores that were treated with lower versus higher scores.”
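To make the asymmetry concrete, here is a minimal sketch (our illustration, not the study’s code; the starting score is hypothetical, and only the two average shifts come from the study) of how the reported effect sizes pull a proposal’s score in opposite directions by different amounts:

# Minimal sketch (our illustration, not the study's code): how the average shifts
# reported above pull a hypothetical proposal's score in opposite directions.
initial_score = 5.0               # hypothetical starting rating on the 1-to-9 scale
shift_when_peers_lower = -0.759   # average change after seeing lower peer scores (study)
shift_when_peers_higher = +0.449  # average change after seeing higher peer scores (study)

score_after_lower_peers = initial_score + shift_when_peers_lower    # 4.241
score_after_higher_peers = initial_score + shift_when_peers_higher  # 5.449

gap = score_after_higher_peers - score_after_lower_peers
print(f"Downward pull ({abs(shift_when_peers_lower)}) exceeds upward pull ({shift_when_peers_higher})")
print(f"Gap between the two treated groups: {gap:.3f} points on a 1-to-9 scale")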

In other words, when peer reviewers have the chance to talk amongst each other—as happens at NIH study sections—“evaluators are more likely to focus on proposal weaknesses than the strengths.” This could explain “what many see as ‘conservatism bias’ in funding novel projects, which has conjured slogans such as ‘conform and be funded’ and ‘bias against novelty.’”

One paper even shows that “a single negative peer-review” will “reduce the chances of a proposal being funded from around 55% to around 25% (even when it has otherwise been rated highly).” The study additionally found that “the peer-review scores assigned by different reviewers have only low levels of consistency (a correlation between reviewer scores of only 0.2).”[8]
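For intuition about how weak a 0.2 correlation is in practice, the following minimal simulation sketch (all parameters are our assumptions, not the cited study’s data) shows how rarely two such reviewers would agree on which proposals fall in a notional top-20% funding band:

# Minimal simulation sketch (assumed parameters, not the cited study's data):
# two reviewers whose scores correlate at ~0.2, and how often they agree on a
# notional top-20% funding band.
import numpy as np

rng = np.random.default_rng(0)
n_proposals = 10_000            # hypothetical number of proposals
rho = 0.2                       # inter-reviewer correlation reported in the study
cov = [[1.0, rho], [rho, 1.0]]
scores = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_proposals)

cutoff_a = np.quantile(scores[:, 0], 0.80)   # reviewer A's top-20% threshold
cutoff_b = np.quantile(scores[:, 1], 0.80)   # reviewer B's top-20% threshold
both_top = np.mean((scores[:, 0] >= cutoff_a) & (scores[:, 1] >= cutoff_b))

print(f"Proposals in the top 20% for both reviewers: {both_top:.1%}")
print("(20% would mean perfect agreement; 4% would mean unrelated scores)")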

In short, present peer review practices may directly inhibit innovation. With a new five-year strategic plan in development, CSR has an opportunity to improve both the quality of peer review and the quality of the evidence base.

Recommendations

Under federal law, the NIH is required to use “appropriate technical and scientific peer review of . . . applications made for biomedical and behavioral research.”[9]

That’s it. The law says nothing about how individual peer reviewers’ judgments will be scored, or how individual scores will be aggregated and weighted, or whether program officers have flexibility to override peer review in some respects. Indeed, the federal regulations governing NIH state that “recommendations by peer review groups are advisory only and not binding on the awarding official or the national advisory council or board.”[10]

In light of this legal flexibility, CSR has a wonderful opportunity to launch bold experiments with the peer review process, shedding light on a number of heretofore-unexplored questions.

One set of research questions would revolve around predictive accuracy. As Fang and Casadevall wrote, “Almost no scientific investigation has been performed to examine the predictive accuracy of study section peer review. With more than a half-century of study section assessments on record, it would be interesting to know the frequency with which major scientific discoveries were recognized and anticipated by study sections. For example, what fraction of applications scored above or below the 10th percentile has been associated with major recognized scientific discoveries during the past 50 years? Similarly, what percentage of important scientific discoveries that were initially reviewed as proposals was rejected?”[11]

A second set of research questions would revolve around actually changing the peer review process itself. CSR has done this sort of thing before,[12] and could do so again—perhaps even with a randomized experiment on its own operations. For example:

  • Limited Lotteries. Numerous scholars suggest using a limited lottery as a tie-breaker for highly qualified proposals that are essentially impossible to tell apart.[13] The Swiss National Science Foundation and the Novo Nordisk Foundation (as of 2020, the largest private foundation in the world) are trying this out.[14] NIH could do the same, and better yet, could randomize which proposals are subject to the lottery in the first place, so that outcomes under the two approaches (lottery or not) could be compared; a minimal sketch of such a design appears after this list.
  • Golden Tickets. CSR could give reviewers a “golden ticket” that allows a reviewer to guarantee funding for an application even if other reviewers disagree. There are at least two private foundations in Europe that are trying out this approach.[15]
  • Bimodal Scores. Highly novel ideas might have a few champions but some naysayers as well. When peer review scores are highly bimodal, this might be a key indicator of a high-risk but high-reward project. CSR could experiment with blinding study section members to everyone else’s comments, and then funding some projects that have both high and low ratings.
  • Program Officer Discretion. The NIH could experiment with giving program officers more discretion to bypass peer review ratings, and fund projects that they think are highly valuable. This would be a test both of peer review and of the existing program officers’ judgment.
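To illustrate what a randomized test of the limited-lottery idea could look like, here is a minimal sketch. It is entirely our illustration, not an NIH procedure: the proposal IDs, scores, award budgets, and tie-zone cutoff are all hypothetical.

# Minimal sketch (our illustration, not an NIH procedure): a limited lottery used as a
# tie-breaker among highly rated proposals, with random assignment between a lottery
# arm and a standard-ranking arm so the two approaches can be compared afterwards.
import random

random.seed(0)

# Hypothetical proposals with peer review scores (lower = better, NIH-style impact scores).
proposals = {f"P{i:03d}": round(random.uniform(10, 60), 1) for i in range(1, 101)}
awards_per_arm = 10       # hypothetical number of awards each arm can make
tie_zone_cutoff = 25.0    # hypothetical score below which proposals are judged indistinguishable

# All proposals in the "tie zone" are considered fundable on the merits.
tie_zone = [p for p, s in proposals.items() if s <= tie_zone_cutoff]

# Randomly assign tie-zone proposals to one of the two evaluation approaches.
random.shuffle(tie_zone)
lottery_arm = tie_zone[: len(tie_zone) // 2]
ranking_arm = tie_zone[len(tie_zone) // 2:]

# Lottery arm: awards drawn at random from equally qualified proposals.
lottery_awards = random.sample(lottery_arm, min(awards_per_arm, len(lottery_arm)))

# Ranking arm: awards go strictly to the best scores, as under the status quo.
ranking_awards = sorted(ranking_arm, key=lambda p: proposals[p])[:awards_per_arm]

print("Lottery-arm awards:", sorted(lottery_awards))
print("Ranking-arm awards:", sorted(ranking_awards))
# Downstream outcomes (publications, follow-on funding, etc.) of the two groups
# could then be compared to evaluate the lottery against standard ranking.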

Not only should CSR experiment with these and other ideas, but it should also make much more data available to independent scholars and evaluators. As RAND said, “Through our conversations with funders it appears that where analysis is carried [out], it is often not published, partly because of the extreme sensitivity around funding allocation procedures.”

While this is understandable, it is unacceptable in an age of increased transparency about government operations. As RAND said, “If we are to improve the allocation of funds for research, funders should strive to make such studies of and data about their processes available to support discussion and allow comparative analysis.”[16] If the IRS can work out privacy protections that allow independent scholars to study many hundreds of millions of individual tax returns,[17] the NIH could surely do the same for grant applications and peer review scores, which raise far fewer privacy concerns.

A final note: In studying the effects of peer review or anything else, CSR should take care to include a broad range of metrics and outcomes to determine whether scientific projects led to valuable findings. Under no circumstances should it focus on “citations over the first three years,” which could lead to perverse results. Studies have shown “strong evidence of delayed recognition of novel papers,” which are “less likely to be top cited when using a short time window.”[18] Moreover, “highly novel papers also tend to be published in journals with lower impact factors. . . . the more we bind ourselves to quantitative short-term measures, the less likely we are to reward research with a high potential to shift the frontier — and those who do it.”[19]

Conclusion

With a new five-year strategic plan under consideration, CSR should address problems that have been noted for many years: the overly conservative nature of peer review and the lack of data access for outside scholars. A set of bold research projects and experiments would make CSR a worldwide model for research funders.

Signed,

Stuart Buck, Executive Director, Good Science Project

Rachael Neve, Co-director, Gene Delivery Technology Core, Massachusetts General Hospital

Brian Nosek, Executive Director, Center for Open Science; Professor, University of Virginia


[1] Francis S. Collins, “Opportunities for Research and NIH,” Science 327 no. 5961 (Jan. 1, 2010): 36-37, available at https://www.science.org/doi/full/10.1126/science.1185055.

[2] Quoted in Gina Kolata, “Grant System Leads Cancer Researchers to Play It Safe,” New York Times (June 27, 2009), available at https://www.nytimes.com/2009/06/28/health/research/28cancer.html.

[3] Quoted in Ferric C. Fang and Arturo Casadevall, “NIH Peer Review Reform—Change We Need, or Lipstick on a Pig?,” Infection and Immunity (Mar. 2009), available at https://journals.asm.org/doi/full/10.1128/IAI.01567-08.

[4] Bruce Alberts and Venkatesh Narayanamurti, “Two threats to U.S. science,” Science 364 no. 6441 (2019): 613, available at https://www.science.org/doi/10.1126/science.aax9846.

[5] See Vittorio Demicheli and Carlo Di Pietrantonj, “Peer review for improving the quality of grant applications,” Cochrane Library (April 2007), available at https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.MR000003.pub2/full.

[6] Steven Wooding and Susan Guthrie, “Why We Need to Experiment with Grant Peer Review” (2017), available at https://www.rand.org/blog/2017/04/why-we-need-to-experiment-with-grant-peer-review.html. See also Susan Guthrie, Ioana Ghiga, and Steven Wooding, “What do we know about grant peer review in the health sciences? An updated review of the literature and six case studies,” RAND Europe (2018), at p. xii.

[7] Jacqueline N. Lane et al., “Conservatism Gets Funded? A Field Experiment on the Role of Negative Information in Novel Project Evaluation,” Management Science (Oct. 28, 2021), working paper available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3656495.

[8] John Jerrim and Robert de Vries, “Are peer-reviews of grant proposals reliable? An analysis of Economic and Social Research Council (ESRC) funding applications,” Social Science Journal (Mar. 6, 2020), available at https://www.tandfonline.com/doi/abs/10.1080/03623319.2020.1728506?journalCode=ussj20. These findings were based on data on around 4,000 proposals and 15,000+ reviews at the UK’s Economic and Social Research Council.

[9] 42 U.S.C. § 289a(a)(1).

[10] 42 C.F.R. § 52h.7.

[11] Ferric C. Fang and Arturo Casadevall, “NIH Peer Review Reform—Change We Need, or Lipstick on a Pig?,” Infection and Immunity (Mar. 2009), available at https://journals.asm.org/doi/full/10.1128/IAI.01567-08.

[12] CSR, “A Pilot Study of Half-Point Increments in Scoring” (2019), available at https://public.csr.nih.gov/AboutCSR/HalfPointPilotStudy.

[13] For example, see Ferric C. Fang and Arturo Casadevall, “Research Funding: the Case for a Modified Lottery,” mBio 7 no. 2 (2016), available at https://journals.asm.org/doi/10.1128/mBio.00422-16; Elise S. Brezis, “Focal randomization: an optimal mechanism for the evaluation of R&D projects,” Science and Public Policy 34 no. 10 (2007), available at https://d1wqtxts1xzle7.cloudfront.net/32427204/Brezis-Peer-Review_and_Randomization-2007-with-cover-page-v2.pdf; Kevin Gross and Carl T. Bergstrom, “Contest models highlight inherent inefficiencies of scientific funding competitions,” PLoS Biology (2019), available at https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000065.

[14] See http://www.snf.ch/en/researchinFocus/newsroom/Pages/news-210331-drawing-lots-as-a-tie-breaker.aspx; https://www.the-scientist.com/news-opinion/q-a-a-randomized-approach-to-awarding-grants-69741.

[15] Thomas Sinkjaer, “Fund ideas, not pedigree, to find fresh insight,” Nature (Mar. 6, 2018), available at https://www.nature.com/articles/d41586-018-02743-2.

[16] Susan Guthrie, Ioana Ghiga, and Steven Wooding, “What do we know about grant peer review in the health sciences? An updated review of the literature and six case studies,” RAND Europe (2018), at p. xiii.

[17] See, e.g., Jeffrey Mervis, “How Two Economists Got Direct Access to IRS Tax Records,” Science (May 22, 2014), available at https://www.science.org/content/article/how-two-economists-got-direct-access-irs-tax-records.

[18] Jian Wang, Reinhilde Veugelers, and Paula Stephan, “Bias against novelty in science: A cautionary tale for users of bibliometric indicators,” Research Policy 46 no. 8 (2017): 1416-1436.

[19] Paula Stephan, Reinhilde Veugelers, and Jian Wang, “Reviewers are blinkered by bibliometrics,” Nature 544 (2017): 411-412, available at https://www.nature.com/articles/544411a.

Stuart Buck is the executive director of the Good Science Project and a senior advisor to the Social Science Research Council.