What should science funding agencies do about this?
The easiest answer is that they should all regularly fund a “reproducibility project” on their own portfolio. Such a project could either randomly sample from the experiments and studies the agency has funded, or deliberately select projects that are viewed as highly influential. Either way, this would effectively serve as an audit of the agency and of the science that it funds.
Replication isn’t everything, of course. It’s quite possible for useless studies to replicate perfectly. But if a field is systematically dominated by exaggerated effects or false positives, it isn’t likely to be very innovative either, because it is resting on a foundation of sand.
Better yet, federal funding agencies could set up a permanent organization for research parasites or data thugs, whose job would be detecting fraud, replication problems, various forms of malpractice, etc., as well as developing better algorithms to do so automatically.
We know of quite a few alarming cases of major scientific fraud—Marc Hauser, Diederik Stapel, Michael LaCour, Erin Potts-Kant, Anil Potti, Haruko Obokata, and most recently, Dan Ariely. But this is likely just the tip of the iceberg.
For one thing, the cases of fraud that do come to light turn out to be blazingly obvious once anyone looks at the data or the statistical modeling. We’re probably not catching fraudsters who are even slightly clever about it.
For another thing, very few people who want to succeed in academia will call out fraud and malpractice—it takes time and effort that could be devoted to their own research, and it alienates prominent researchers and journal editors in their own field.
Fraud detection is thus vastly undersupplied, and various offices of “Research Integrity” usually have the same motive as do “Human Resources” offices when a corporate executive faces a harassment lawsuit (that is: defend the institution, deny the allegations, and try to make the whole thing go away).
As far as research integrity is concerned, it is as if we lived in a world without official police or prosecutors, and instead had to rely on a handful of obsessive volunteers. There would be a lot more crime in that world.
Consider the base rate here: If the top 5% of scientists produce major advances, and the bottom 5% of scientists are dishonest but desperate to succeed somehow, and both groups publish seemingly major advances at the same rate, then fully half of those findings will be fraudulent!
This is just a stylized example, but we don’t know the true extent of fraud and malpractice. We know only about the perpetrators who are both dumb enough to be obvious about it and unlucky enough to draw attention from the rare obsessive critic.
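The base-rate arithmetic in the stylized example can be written out in a few lines. This is only a sketch: the function name `fraudulent_share` is made up for illustration, and it assumes that every scientist in either group publishes the same number of seemingly major findings.

```python
from fractions import Fraction


def fraudulent_share(genuine: Fraction, fraudulent: Fraction) -> Fraction:
    """Share of seemingly major findings that are fraudulent.

    Assumes (hypothetically) that each scientist in either group
    publishes the same number of seemingly major findings.
    """
    return fraudulent / (genuine + fraudulent)


# The stylized case: 5% produce real advances, 5% are dishonest.
print(fraudulent_share(Fraction(5, 100), Fraction(5, 100)))  # 1/2

# A hypothetical variation: only 1% of scientists are dishonest.
print(fraudulent_share(Fraction(5, 100), Fraction(1, 100)))  # 1/6
```

Even in the milder variation, one in six headline findings would be fraudulent, which is why the size of the dishonest group matters far less than whether anyone is checking.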
If we lavish praise and funding on scientists who produce major advances, but never bother to check the results’ validity, we might as well put up a sign: “Go ahead and defraud us.”
Literally one hundredth of one percent of the NIH’s yearly funding would be enough to maintain a small organization devoted to checking data for signs of manipulation, fraud, or just sloppiness. That would be an influential corrective to scientific practice. Of course, it would be far more influential if we could get scientists to stop putting so much of a premium on exciting results—but that seems to be an inescapable part of human nature, and thus it is important to check up on would-be exciting results at least once in a while.
Consider Chase Bank’s fraud detection: I can go on a road trip with no problem, but if I happen to fill up my car with gas in an area of town that I don’t normally visit, I immediately get a text asking whether I actually made the purchase. Clearly, there is some behavioral and geographic algorithm that “knows” where I normally go, where I might go on a road trip, and how potential thieves behave.
Imagine a similarly sophisticated algorithm for scientific research, triggered by the most common signs of potential fraud or inaccuracy. For example, one second after an article submission, an email is generated saying, “Your article has discrepancies in the p-values and sample size,” or, “The probability distribution of your data doesn’t seem plausible,” or, “You report means that are impossible given the sample size and the ordinal scale being used.”
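The last of those messages is the easiest to automate. If every response is a whole number (as on an ordinal scale), then the mean of n responses must equal some integer total divided by n, and many reported means simply cannot arise. Below is a minimal sketch of such a granularity check, in the spirit of what is often called the GRIM test. The function name `grim_consistent` and the choice to accept either rounding direction for ties are assumptions made for illustration, not a description of any existing screening tool.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer total over n responses rounds to the mean.

    reported_mean is passed as a string (e.g. "5.19") so that the number
    of reported decimal places is preserved. Assumes non-negative,
    integer-valued responses.
    """
    target = Decimal(reported_mean)
    exponent = target.as_tuple().exponent      # e.g. -2 for "5.19"
    quantum = Decimal(1).scaleb(exponent)      # e.g. Decimal("0.01")
    centre = int(target * n)                   # total implied by the mean
    # Try the integer totals nearest to the implied total.
    for total in range(max(centre - 1, 0), centre + 3):
        exact = Decimal(total) / Decimal(n)
        # Accept either tie-breaking rule, since papers rarely say which
        # one they used.
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if exact.quantize(quantum, rounding=mode) == target:
                return True
    return False


print(grim_consistent("5.18", 28))  # True: 145 / 28 rounds to 5.18
print(grim_consistent("5.19", 28))  # False: no integer total gives 5.19
```

A check like this cannot prove fraud, since a typo or an unreported exclusion would trip it too, but it is cheap enough to run on every submission and it tells a human where to look.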