In 2011, Stuart Ritchie learned that undergraduate students have psychic powers.
A groundbreaking paper on “precognition” had just been published in a leading psychology journal. The lead author, Daryl Bem, reported a series of scenarios in which participants seemed to be able to predict the future.
In one condition, participants viewed a computer screen with two sets of curtains. Their task was to click on the set of curtains they thought had an image behind it. There was no other information. Bem reported that when the images behind the curtains were normal, everyday objects, like a chair, participants were as successful as a random guess, 50-50. But when a pornographic image was behind the curtain, participants did slightly better than chance. Evidence, Bem wrote, of some innate, evolved, psychic sexual desire, which helped them predict where the image would be.
Precognition made a splash in the media. In one instance, Bem even appeared on The Colbert Report, which dubbed Bem’s discovery “time-traveling porn.”
But Ritchie, a psychology graduate student at the time, didn’t need psychic powers to see into the future of precognition as a psychological phenomenon. He needed the scientific method and a few collaborators.
Ritchie and two other psychologists decided they would try to replicate one of Bem’s experiments. (In the paper, Bem wrote that he was open to other researchers replicating his work and would provide the software needed to do so.) They chose a word list condition, where participants were shown a set of 40 words and then given a surprise memory test. After the test, participants were shown 20 of the words again, randomly selected from the set of 40. Bem reported that participants were more likely to remember words that showed up on that final list of 20, even though it was revealed only after the memory test. This provided evidence, he claimed, for psychic intuition.
Ritchie and his collaborators each ran the experiment at their respective labs, and they all came to the same conclusion: there was no evidence for precognition.
But it wasn’t the failed replication that disappointed Ritchie; it was what happened next.
Ritchie and his collaborators wrote up the results of their replication attempt and submitted it to the same journal that published Bem’s original study. A few days after they submitted their paper, the editor of the journal rejected it on the grounds that the journal didn’t publish repeat experiments, regardless of the results.
So much for science being self-correcting. Ritchie and his collaborators ended up publishing their research in another journal, but the message was loud and clear. “The [scientific] community had demonstrated that it was content to take the dramatic claims in studies at face value, without checking how durable the results really were,” he writes in his new book, Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth.
Precognition was a science fiction, but the story proved too good to give up. It turns out that many other scientific stories—across fields including psychology, economics, biology, and medicine—have also been too good to give up. At least, until recently.
Ten years on from psychic undergraduates, a lot has changed in the scientific community. We know about these too-good-to-be-true stories because of the scientists who have turned their focus to understanding how our scientific systems have gone wrong and who have spoken up in the hopes of making them better.
In Science Fictions, Ritchie documents this story, both the failures and the efforts to correct them. Ritchie’s book is a well-written, if at times uncomfortable, account of how things like fraud, bias, negligence, and hype have pulled our scientific systems further and further away from our ideals, but also how we can use science to reclaim them.
I recently spoke with Ritchie about his new book. Our conversation has been edited for length and clarity.
Evan Nesterak: In the book, you document a number of jaw-dropping cases of how the scientific process has gone wrong—be it through fraud, bias, negligence, or hype—and the consequences that ensued. What case sticks out to you the most?
Stuart Ritchie: The one that really hits me is the Macchiarini case, the one that I started the chapter on fraud with. [Editor’s note: Paolo Macchiarini is a Swiss-born Italian doctor who rose to prominence by developing a seemingly groundbreaking technique for trachea transplants. Macchiarini claimed that he was able to successfully transplant artificial tracheas into patients, using the patient’s stem cells to coat the exterior of the synthetic trachea to reduce the chances of rejection. Seen as a rising star in the medical field, he was appointed to the Karolinska Institute, a prominent medical university in Sweden. While there, it became clear that he had falsified patient records, and as Ritchie explains below, his web of lies began to unravel.]
What really hit me about that case was that you had a guy who was essentially a con man, even in his personal life. There was a story about him having this romantic affair with a TV producer—taking her to all these places, saying that they were going to get married and that the Pope was going to officiate the wedding—while he was still married to someone else. [Editor’s note: the Vanity Fair article that describes this affair is a wild read.]
He was a clear con man.
And yet, he manages to publish all these papers in The Lancet. He got this job at the Karolinska Institute, which seemed to be because of recommendations from other top professors, some of whom were on the committee for the Nobel Prize in Physiology or Medicine.
After people started asking questions about the data, he was defended. An editorial in The Lancet said Paolo Macchiarini was not guilty of misconduct, when clearly he was, as an independent investigation later confirmed. The Karolinska Institute stonewalled and even phoned the police on the whistleblowers, saying they were violating the privacy of the patients. All the while, people had died in these operations that Macchiarini had done with the trachea transplants.
I think this is an extreme case of a scientific fraudster who’s writing papers that do not tell the truth about what he had done, and it had serious consequences. It killed people. These transplants just did not go right. They were not being reported in the way they should have been. And you also had institutional screwups where they covered up the problem for a long, long time, even after it was fairly clear that something had gone terribly wrong.
The Macchiarini case was surprising to learn about. But what do you say to someone who says that Macchiarini was an outlier? So what if some academic studies are biased or unreliable? Sure, outright fraud is bad, but on the whole this doesn’t affect me, so why should I care?
The fact that this happens means that scientists not only can’t rely on the data or the papers that are published in the literature but also can’t rely on their own colleagues. If you’re a scientist, you need to be aware that many people have been caught out by their own colleagues committing fraud, throwing all of their research into doubt.
An example I give in the book is Michael LaCour. His coauthor, Donald Green, who’s a respected political science professor, didn’t collect any of these data himself and didn’t commit the fraud; he was completely innocent of it. But LaCour served up all these amazing-looking data sets. Green was really excited and they published in Science. Then, of course, it was discovered that the data were fraudulent.
If this question was asked by a nonscientist, they should be concerned because science is supposed to be the place where this stuff does not happen. It’s all about telling the truth and being scrupulous and methodical. Yet you have cases where not just individuals but institutions are either committing seriously immoral acts or dragging their feet in trying to correct them.
It happens across all different sciences as well. You have this whole cottage industry of people who are constantly emailing universities saying, I found image duplication [a sign of made-up data] in the paper by one of your scientists, what are you going to do about it? In the universities, it disappears into the bureaucracy for a while, and sometimes something is done, sometimes it isn’t. But I get the impression that the universities are not open enough about dealing with problems like this, and that’s something everyone should be concerned about.
Let’s dive more into some of the costs of bad science. Can you take us through the ways that a bad study, whether fraudulent, ill-conceived, or just poorly executed, can set up a cascade of waste?
It starts with the waste in the initial study. Often that study is funded by grant income, whether it pays someone’s salary or is for the specific experiment. If that experiment is not reported accurately, then the money that went directly into it has been wasted.
When the paper is published, obviously other scientists want to review it. First of all, they want to include it in things like meta-analyses. If the original study was a false positive or had fraudulent data in it, it’s going to throw off the results of that meta-analysis.
There’s the case that I discuss in the book of Joachim Boldt, who published fraudulent studies on blood expanders in surgery. When they were included in a meta-analysis, it looked like blood expanders were leading to better outcomes and patients were more likely to survive surgery. But once you take his fraudulent studies out of the meta-analysis, it turned out that the blood expanders were worse and patients were dying more often.
This is a case where having those fraudulent studies in the literature may well have killed people, because everyone relies on a good meta-analysis—they’re meant to cover all the research and give us an accurate estimate. But of course, overhyped, overblown, larger-than-reality effect sizes are also going to be found in studies that are not fraudulent but biased in some way, p-hacked.
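[Editor’s note: to make that mechanism concrete, here is a minimal sketch in Python with invented numbers, not Boldt’s actual data. It pools a fixed-effect (inverse-variance weighted) meta-analysis twice, once from honest studies alone and once with two hypothetical fabricated studies added, to show how a few large, suspiciously precise effects can flip the pooled conclusion.]

```python
# Toy fixed-effect meta-analysis: each study is an (effect, variance) pair.
# All numbers below are invented for illustration only.

def pooled_effect(studies):
    """Inverse-variance weighted average of study effect sizes."""
    weights = [1.0 / var for _, var in studies]
    return sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)

# Honest studies: small negative effects (treatment slightly worse than control).
honest = [(-0.20, 0.04), (-0.10, 0.05), (-0.15, 0.06)]

# Hypothetical fabricated studies: large, implausibly precise positive effects.
fabricated = [(0.60, 0.01), (0.55, 0.01)]

print(pooled_effect(honest))               # about -0.15: treatment looks harmful
print(pooled_effect(honest + fabricated))  # about +0.40: fraud flips the conclusion
```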
Another cause of waste is that other scientists don’t just want to review a paper; they want to follow up on it with their own research. That’s going to lead to waste in terms of them trying to follow up on research where the results were never real to begin with. Or they’re following up on small, underpowered studies in the statistical power sense. Because the original study looked like it found an effect with a sample size of 30 people, they then follow up on that study with a small sample size, thinking that the effect is much bigger than it really is. So you get your meta-analysis wrong, and you get your sample size wrong.
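[Editor’s note: the sample-size trap can be made concrete with a rough power calculation. The sketch below uses a standard two-sample normal approximation and hypothetical effect sizes: a follow-up with 30 people per group looks well powered if the inflated original estimate were real, but is badly underpowered against a more realistic true effect.]

```python
# Approximate power of a two-sided, two-sample test (normal approximation).
# Effect sizes (Cohen's d) below are hypothetical, chosen for illustration.
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Power to detect standardized effect d with n_per_group participants per arm."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality for equal group sizes
    return (1 - norm.cdf(z_crit - ncp)) + norm.cdf(-z_crit - ncp)

print(power_two_sample(d=0.8, n_per_group=30))  # ~0.87 if the inflated estimate were true
print(power_two_sample(d=0.2, n_per_group=30))  # ~0.12 against a more realistic effect
```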
One of the cases I discuss in the book is the candidate gene literature. You had a whole literature, an entire edifice of studies, on this idea of candidate genes. In behavior genetics or psychiatric genetics, the thought was that there were just a handful of genes that might explain quite a lot of the variation in behavior. And that’s been completely overturned by the genome-wide association study literature, which has shown that there are many genes, sometimes tens of thousands of them, that may explain behavior, not just a handful.
But you wouldn’t have known that if you’d looked at the literature in the early 2000s, up to about maybe 2009-ish when people started to do genome-wide association studies in larger sample sizes. I remember being taught that when I was at university. I remember all this excitement about the candidate genes, where a single gene supposedly explained 10 or 20 percent of the variance in people’s memory skills, say. And that stuff was just never real.
Based on these overhyped, biased, flawed, sometimes fraudulent studies, you can end up building entire scientific fields that are based on nonsense. The scary thing I mention in the book, and that was mentioned in the blog post on Slate Star Codex that I reference, is that the genome-wide association studies came along in behavior genetics and showed that all the previous stuff was wrong. What if the equivalent style of study doesn’t come along for other fields that we might be relying on? We know what went wrong with the candidate gene studies, but we don’t know for studies in other fields of research. That’s a scary thought.
I want to shift gears a bit to two of my favorite terms (if you can call them that) that you discuss in the bias section: HARKing and outcome switching. They just feel like such human mistakes—we tell ourselves a story of why something came out the way it did, even though that story may not be true. But they’re mistakes that are nonetheless getting us farther away from the truth. Can you define what those terms are and how they work their way into the scientific record?
HARKing is hypothesizing after the results are known, and it occurs in cases where people have not properly defined their hypotheses before they collect the data, or where they change those hypotheses afterwards.
Outcome switching is similar except that term is often used in the medical trial literature. So it’s often the case that medical trials will be set up to study one thing, like maybe the effectiveness of a headache pill, but the pill ends up reducing people’s anxiety. [The researchers then publish a study as if they were expecting the pill to reduce anxiety from the start.]
They both stem from scientists not being locked into their plans before they start the study.
I think if you stopped someone in the street and said, “Do you think scientists plan every hypothesis and plan every analysis and then when they get the data they stop? Or do you think scientists vaguely have an idea of what they’re looking at and then it can easily change once they look at the data?” I think people would think that it’s the first one, and they would be outraged to learn that actually, often it’s the second.
As you say, it’s a very human mistake. We get our data sets and they don’t quite show us what we want. And so we think, “Well, you know what, I should have hypothesized that maybe it’s not X, but it’s Y, and Y is close enough to X.” I can kind of convince myself that that was what we were looking for in the first place. It’s easy enough to get yourself into thinking that’s maybe even a more important question—that Y is actually really interesting, really important.
These are ways that we can post hoc rationalize ourselves into thinking that we did the right thing. In that sense, they’re actually scarier than fraud. They’re less in your face or immoral than fraud. But they’re scarier, because they’re more insidious. They are unconscious biases that can creep into the science of people who are completely well meaning.
The book focuses primarily on the strong (and often warped) incentives in the academic setting, but there are other research settings that have their own set of strong, and potentially corruptible, incentives, like in policy or industry. Having your research come out how you hoped might mean another contract with a client, getting a policy approved, or simply feeling like all the time, energy, and money you spent wasn’t for nothing. I’m curious, what are your thoughts on how the scientific process shakes out in these nonacademic settings?
I think that’s a really interesting question. It’s something you don’t hear that much about because it’s not really seen as part of this whole scientific world where scientists are talking about the latest papers and so on.
I think there’s a parallel between working in the lab of a principal investigator who really wants results and working in a company that really wants results. You’re still feeling pressure. And often it’s implicit. It’s a kind of pressure to find results.
Take the way we talk about “failed” studies. I have this with my students, where they go and do an analysis and they come back and say, “Oh, I’m really disappointed, I didn’t get a significant result in this case.” I always have to say to them that it’s not about being disappointed. They’ve done the research correctly, and it should have been set up in such a way that we learned something, even if they didn’t find a positive, statistically significant result.
There are these implicit, low-level incentives of wanting to find something that works, wanting to find something that helps people, wanting to advance the science in some way. And that happens whether you’re in academia or outside of it. You feel like you want to make a difference.
Add on top of that the fact that if you make a difference, you’re more likely to get promoted, more likely to get a job, and, in academia, more likely to get publications, and so on.
So I think there are all these things—again, very, very human—pushing us toward finding positive results.
What advice would you give to applied behavioral scientists working outside of academia who want to adopt open science strategies? Their fields likely have their own set of warped incentives. How could they adopt open science strategies? Is it essentially on them to create and monitor themselves, or are there systems and structures you’ve seen work for research conducted outside of an academic setting?
I think it’s interesting that psychology is having this kind of massive increase in pre-registration, registered reports, and all of these ways of setting down your analysis before you touch the data. But they learned that from a field that is very much conducted by industry—clinical trials. The first registrations of studies in this sense were done in clinical trials.
Now, of course they were forced to do that by governments. You have the ClinicalTrials.gov website in the U.S., and there are various other ones for different countries, where by law you have to register your clinical trial. And if you’re going to go for publication, you have to register there, because journals won’t accept your clinical trial if it hasn’t been registered there.
This has now been adopted in a lot of psychology, although nowhere near enough, but it is becoming adopted. I think industry can then learn from that, right? They can use the same tools that are being used by psychologists.
For instance, the Open Science Framework is open to anyone. Anyone can pre-register their analysis plans there, anyone can share their code there, anyone can share their data and the materials that they use for the experiment. Anyone can make the whole process open and transparent in the same way that scientists are.
I think pre-registration really is one of the things that will kill a lot of these problems. I’m well aware that there are problems with pre-registration; people don’t always follow their pre-registrations. But it’s not enough to collect the data and then say, “I’m going to analyze the data in an open and transparent way.” That’s great, but the horse has bolted in that case.
Instead, before you’ve collected any data, you write down everything that will be done and set a kind of if-then rule: if the data go this way, then you will conclude this; if they go that way, then you will conclude that. And you get everyone to agree, before any data collection has happened, to write their name on a pre-registration that you post publicly.
There are objections to this, because people worry that they won’t be able to explore the data, that they won’t be able to dig into it and get the most value out of it. I think that’s misguided, because you can simply have a section in your write-up that identifies the stuff you didn’t pre-register. This is the stuff we thought of after we looked at the data, and I think that’s completely fine. It’s totally fine to do exploratory analysis of your data.
The problem is that, both in academia and in industry, everywhere you look, it’s always written up as if we had this idea, we then went and tested it, and lo and behold the data support our hypotheses.
The discussion of these failures of the scientific system comes with a lot of negatives. So I want to ask you, as you researched the book and as you’ve worked on open science, what has been a bright spot for you?
The really positive and hopeful thing is that all these changes, all these reforms, the highlighting of the problem in the first place, have come from within science.
One of the most hopeful and positive and optimistic things I’ve done in the past few years is go to the Society for the Improvement of Psychological Science (SIPS), which is this group of open science, open psychology researchers. It started off as a relatively small conference and it’s gotten much, much bigger.
They have this conference where they’re sharing all the latest technologies and tools that can be used to share your data and make your work more open and transparent, and also coming up with entirely new ideas, discussing them, and sharing them with each other. For instance, should you have a co-pilot scientist who runs your code to make sure it works rather than just hoping that you got it right? How should we be working together to collaborate on things like the Psychological Science Accelerator, where people are linking up labs across the world to work on registered replications?
You have to remind yourself at the end of the conference that this is a small number of people in comparison to the number of psychologists and behavioral scientists in the world. And we’ve got a huge amount of work to do to try and bring them on board and make it easy for them to use all these tools and technologies to make science more open.
The existence of SIPS and things like the ReproducibiliTea workshops in different universities, open science groups popping up everywhere, funders getting on board with open science and recognizing the problem—I do see real changes happening within the last few years that make me optimistic that a lot of these problems in science can be dealt with.
Our conversation, like the book, covered some dismaying ground. Nevertheless, Ritchie remains optimistic and committed. He concludes in Science Fictions:
“In spite of the perverse incentives, in spite of the publication system, in spite of academia and in spite of scientists, science does actually contain the tools to heal itself. It’s with more science that we can discover where our research has gone wrong and work out how to fix it. The ideals of the scientific process aren’t the problem: the problem is the betrayal of those ideals by the way we do research in practice. If we can only begin to align the practice with the values, we can regain any wavering trust—and stand back to marvel at all those wondrous discoveries with a clear conscience.”