Wednesday, October 23, 2013

Replicating Research

Two recent pieces discuss endemic problems in research, one from a general perspective (from which I will quote) and one examining a specific case (warning: unnecessarily crude language). The problem is replication of results.

In scholarship, findings should, in theory, be checkable. In epigraphic disciplines, one should be able to collate the texts. In historical disciplines, one should be able to check the footnotes. In experimental disciplines, one should be able to reproduce the experiment. For years I checked Hugh Nibley's footnotes. I developed a sense of how accurate he was based on how much time I spent finding things. Then I decided to actually count the percentages. My guesses were wrong. I had spent much more time looking for problem footnotes and so had overestimated how many there were.
Analysis of a random chapter showed that of its almost seven hundred citations, Nibley was completely accurate 94 percent of the time, and in more than half of these remaining forty cases, one could explain the problem as a typographical error. (CWHN 16:xx.)
Checking footnotes is very time-consuming and can be quite expensive. It is thus rarely done. I concluded that for Nibley it was probably an unnecessary expense. My recommendations were ignored, in part because many individuals had unfairly accused Nibley of faking his footnotes. (I remember checking the footnotes of one such accuser and finding that a third of them were wrong.)

But the sciences ought to do better at this, right? Well, not necessarily:
A few years ago scientists at Amgen, an American drug company, tried to replicate 53 studies that they considered landmarks in the basic science of cancer, often co-operating closely with the original researchers to ensure that their experimental technique matched the one used first time round. According to a piece they wrote last year in Nature, a leading scientific journal, they were able to reproduce the original results in just six. Months earlier Florian Prinz and his colleagues at Bayer HealthCare, a German pharmaceutical giant, reported in Nature Reviews Drug Discovery, a sister journal, that they had successfully reproduced the published results in just a quarter of 67 seminal studies.
So the original results held up in only about one in nine (6 of 53) to one in four (roughly 17 of 67) of these landmark drug studies. Nibley, at 94 percent, was a lot better than that.

But peer review is supposed to catch errors. Unfortunately, it often does not.
The idea that there are a lot of uncorrected flaws in published studies may seem hard to square with the fact that almost all of them will have been through peer-review. This sort of scrutiny by disinterested experts—acting out of a sense of professional obligation, rather than for pay—is often said to make the scientific literature particularly reliable. In practice it is poor at detecting many types of error.

John Bohannon, a biologist at Harvard, recently submitted a pseudonymous paper on the effects of a chemical derived from lichen on cancer cells to 304 journals describing themselves as using peer review. An unusual move; but it was an unusual paper, concocted wholesale and stuffed with clangers in study design, analysis and interpretation of results. Receiving this dog’s dinner from a fictitious researcher at a made up university, 157 of the journals accepted it for publication.

Dr Bohannon’s sting was directed at the lower tier of academic journals. But in a classic 1998 study Fiona Godlee, editor of the prestigious British Medical Journal, sent an article containing eight deliberate mistakes in study design, analysis and interpretation to more than 200 of the BMJ’s regular reviewers. Not one picked out all the mistakes. On average, they reported fewer than two; some did not spot any.
And there are other problems with peer review:
As well as not spotting things they ought to spot, there is a lot that peer reviewers do not even try to check. They do not typically re-analyse the data presented from scratch, contenting themselves with a sense that the authors’ analysis is properly conceived. And they cannot be expected to spot deliberate falsifications if they are carried out with a modicum of subtlety.

Still, these things get corrected eventually, right? Perhaps not:
Academic scientists readily acknowledge that they often get things wrong. But they also hold fast to the idea that these errors get corrected over time as other scientists try to take the work further. Evidence that many more dodgy results are published than are subsequently corrected or withdrawn calls that much-vaunted capacity for self-correction into question. There are errors in a lot more of the scientific papers being published, written about and acted on than anyone would normally suppose, or like to think.
The trouble is that few scholars check their colleagues' work, especially when there is pressure to keep publishing one's own. Furthermore, pointing out colleagues' errors is a good way to make enemies (academics often have extremely thin skins), so most people will not publicly point out when their colleagues write garbage, even if they recognize it as such. Besides, one would like to think that one can trust one's colleagues. In such cases (which may be more common than one would like to think), collegiality is the enemy of academic progress.