Peer Review: the Publication Game and “the Natural Selection of Bad Science”

Editor’s Note: This is part II; part I can be found here.

Professor Brian Wansink is head of the Food and Brand Lab at Cornell University. The lab has had problems, some described in an article called “Spoiled Science” in the Chronicle of Higher Education early in 2017:

Four papers on which he is a co-author were found to contain statistical discrepancies. Not one or two, but roughly 150. That revelation led to further scrutiny of Wansink’s work and to the discovery of other eyebrow-raising results, questionable research practices, and apparent recycling of data in at least a dozen other papers. All of which has put the usually ebullient researcher and his influential lab on the defensive.

More recently, Wansink’s lab published data purporting to come from 8- to 11-year-old children that were in fact obtained from 3- to 5-year-olds.

Compared to these gaffes, the lab’s next problem looks like a very minor one:

Wansink and his fellow researchers had spent a month gathering information about the feelings and behavior of diners at an Italian buffet restaurant. Unfortunately their results didn’t support the original hypothesis. ‘This cost us a lot of time and our own money to collect,’ Wansink recalled telling the graduate student. ‘There’s got to be something here we can salvage.’ [my italics]

Four publications emerged from the “salvaged” buffet study. The topic is no doubt of interest to restaurateurs but unlikely to shed light on the nature of human feeding behavior. It’s entertaining. The study is correlational, not causal—no experiments were done. These are all characteristics typical of most of the “science” you will read about in the media: a distraction and a waste of resources, perhaps, but not too harmful.

The real problem, the probable source of all of Wansink’s other problems, is hinted at by the bit in italics. It’s pretty clear that Professor Wansink’s aim is not the advancement of understanding, but the production of publications. By this measure, his research group is exceedingly successful: 178 peer-reviewed journal articles, 10 books, and 44 book chapters in 2014 alone. Pretty good for 10 faculty, 11 postdocs, and eight graduate students.

The drive to publish is not restricted to Professor Wansink. It is universal in academic science, especially among young researchers seeking promotion and research grants. The concept of the LPU, or “least publishable unit” (the least amount of data that will get you a publication, so that your total can be as large as possible; the analogy is to physical units such as the BTU, or British Thermal Unit), has been a joke among researchers for many years. I described the new industry of “pop-up” journals that has arisen to meet this demand in Part I.

The positive feedback I described earlier (popularity allows a journal to be selective, which makes it more popular and more able to select, and so on) has nevertheless produced a handful of elite journals. The two most popular general-science journals are Nature, published in the U.K., and the U.S.-based Science.

But the emphasis in academia on publishing is misplaced. The number of publications, even publications in elite journals, is not a reliable proxy for scientific productivity. Great scientists rarely have long publication lists, and a paper in an “elite” journal isn’t necessarily a great paper. I will give just two examples. W. D. “Bill” Hamilton (1936-2000) was probably the most important evolutionary biologist since Charles Darwin. He published his first paper in 1963 and by 1983 had published a total of 22, a rate of just over one paper a year. Several of these papers were groundbreaking, his discovery of the importance of what evolutionists call inclusive fitness being perhaps the most important. But the number of papers he published is modest: compare it with Professor Wansink’s prodigious output or Brian Nosek’s promotion package below. One paper a year would now be considered inadequate in most research institutions.

My second example is personal: my first publication, which was in Science. The basic idea was that pigeons (the standard subject for operant conditioning experiments) could follow the spacing of rewards: working hard for food when it came frequently, more slowly when it came less frequently. Here is what I found. Never mind the details, just notice that the output cycles (individual in the middle, the average of three subjects at the bottom) track the input cycle at the top beautifully. But, paradoxically, the pigeons worked harder when the reward was infrequent (low points of the cycle) than when it was frequent (the high points). An older colleague pointed out a possible artifact, but I could find no evidence for his suggestion at the time.

It turned out he was in fact right; I confirmed his idea much later with a better recording technique. Pigeons do track rewards but they track in terms of something called wait time, not in terms of response rate. By the time I found that out, this area of research was no longer fashionable enough for publication in Science.

So why did Science publish what was, in fact, a flawed article? I think there were three reasons. First, the data were beautiful: very orderly, and without any need for statistics. Second, feedback theory was then very much in fashion and I was trying to apply it to behavior. And third, the results were counter-intuitive, an appealing feature for journal editors wishing to appear on the cutting edge.

Do top journals such as Nature and Science really publish the best work? Are they a reliable guide to scientific quality? Or do they just favor fashion and a scientific establishment, as the two writers in this Times Higher Ed article claim? Nobel Prize winner Randy Schekman, in a Guardian article, and the many authors whose work is described in a 2013 review article co-authored by German researcher Björn Brembs agree that fashion is a factor but point to more important problems. First, painstaking follow-up work by many researchers has failed to show that elite, high-rank journals (what Schekman calls “luxury” journals) reliably publish more important work than less-selective journals. Brembs et al. write:

In this review, we present the most recent and pertinent data on the consequences of our current scholarly communication system with respect to various measures of scientific quality…These data corroborate previous hypotheses: using journal rank as an assessment tool is bad scientific practice [my emphasis].

Acceptance criteria for elite journals do not provide, perhaps cannot provide, a perfect measure of scientific excellence. Impact factor (journal rank) is an unreliable measure of scientific quality, for reasons I described earlier. Elite journals favor big, surprising results, even though these are less likely than average to be repeatable. Neither where a scientist publishes (journal rank) nor how often he publishes (the length of his CV)—the standard yardsticks for promotion and the awarding of research grants—is a reliable measure of scientific productivity.

The top journals are in fierce competition. Newsworthiness and fashion are as important as rigor. As Schekman says:

These journals aggressively curate their brands, in ways more conducive to selling subscriptions than to stimulating the most important research. Like fashion designers who create limited edition handbags or suits, they know scarcity stokes demand, so they artificially restrict the number of papers they accept. The exclusive brands are then marketed with a gimmick called “impact factor”…. Just as Wall Street needs to break the hold of the bonus culture, which drives risk-taking that is rational for individuals but damaging to the financial system, so science must break the tyranny of the luxury journals. The result will be better research that better serves science and society.

The present system has additional costs: the peer-review process takes time and often several submissions and re-submissions may be necessary before an article can see the light of day. The powerful incentives for publication-at-any-price make for “natural selection of bad science,” in the words of one commentary.

Efforts to change the system are underway. Here is a quote from a thoughtful, if alarmingly titled, new book on the problems of science by Richard Harris, a science correspondent for National Public Radio: Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions:

Take, for instance, the fact that universities rely far too heavily on the number of journal publications to judge scientists for promotion and tenure. Brian Nosek [who is trying to reform the system] said that when he went up for promotion to full professor at the University of Virginia, the administration told him to print out all his publications and deliver them in a stack. Being ten years into his career, he’d published about a hundred papers. ‘So my response was, what are you going to do? Weigh them?’ He knew it was far too much effort for the review committee to read one hundred studies.

Clearly, change is needed. Science administrators can change right away: less emphasis on quantity and place of publication, and much more attention to what aspiring researchers’ papers actually say.

The way that science is published should certainly change as well. But exactly how is difficult to discern. The options include open publication (there are a few examples), substituting commentary for formal review, and encouraging longer, more conclusive papers, or shorter ones that are quicker to appear…

New practices will certainly take time to evolve. What they might be is a topic for another time.
