The Study Premortem: Why Publishing Null Results Is Only the First Step

Talking about failure is increasingly in vogue. People have started posting “anti-resumes,” listing their rejections rather than achievements (for an academic example, see here). Around the world, people are encouraged to share their failures, including through videos or at public-speaking events.

This movement is laudable: by speaking up about how frequently we fail, we are shifting the narrative, empowering people to talk about situations in which they have failed, and allowing us to learn from those failures.

One way that behavioral scientists have been swept up in this movement is by publishing more and more null findings from their studies. For example, one recent large-scale field study in Guatemala found that honesty primes did not reduce cheating on tax returns, despite having been effective in other contexts. Another study showed that preselecting an environmental option did not make it more likely to be chosen, even though similar studies had found that preselecting options increases their uptake.

Publishing these studies is important. If we only publish our successes and not our failures—a tendency that has been referred to as the “file-drawer problem”—then the literature provides only a skewed picture of what’s actually going on, a distortion known as publication bias.

But I think that something is amiss in the current effort to publish null results. They all take a postmortem approach, trying to document the failure to find an effect after the study has already been conducted.

There is no “failed” study—just a failure to learn from a study.

What if we conducted premortems of our studies? That is, what if we designed studies—especially large, expensive field studies—that, should they fail, could nevertheless shed light on why they may have failed? While publishing null results is a great practice to tackle the file-drawer problem, such publications are rarely designed to shed light on these underlying causes, inviting criticism from original authors who, for example, note that important preconditions for their theory were absent or insufficiently operationalized.

A study premortem can help change that. The concept of the premortem was coined by applied psychologist Gary Klein. Before launching a project, people should imagine a scenario in which the project has failed and then backtrack from that counterfactual: How might that failure have occurred? Engaging in this exercise, writes Richard Thaler, a proponent of premortems, allows people to consider how they may have to change their project to either minimize the risk of that failure or have a backup plan in place.

Note that this contrasts with how people usually think: we have a tendency to believe that our project is likely to succeed, and then we effortlessly generate all the reasons why it will be successful. Instead, by imagining that the project will fail, and thinking about all the reasons why it might fail, premortems force us to consider things from a less biased perspective and reduce the overconfidence that accompanies most projects.

These premortems, I propose, could become a crucial step in the design stage of studies, especially large, expensive field studies. At the moment, most field studies are not set up to test why they might not find a statistically significant effect. They manipulate some variable (for instance, introducing a new persuasive message), compare it against a control group (e.g., the standard message that had been used previously), and then aim to show that the intervention “works.” When a study “doesn’t work”—when the new treatment has the same effect as the control group (and the difference in effect size is not statistically significant)—commentators often argue that a study has “failed.” I believe this perspective is unhelpful at best and can at times even harm our understanding of a phenomenon.

In contrast, if a study is well-designed—when it is able to provide valuable insights beyond its main results—then there is no such thing as a “failed” study. Just because a study did not get a significant result doesn’t mean there is nothing to learn. Whereas the goal of a policymaker might be to help, the goal of a (behavioral) scientist is to learn, and no matter the outcome, a well-designed study can offer insight not only into whether an effect has empirical support but also into what it is that makes it work (or not). Put differently, there is no “failed” study—just a failure to learn from a study.

We need to start designing studies that allow us to understand why they failed, and premortems can help us get there.

Of course, this is sometimes easier said than done. Who would have thought that requiring bike helmets in Australia would lead to an increase in the proportion of head injuries? It turns out that the helmet requirement turned a lot of people away from cycling. Head injuries did decline after the law was implemented, but by far less than one would have expected, given the number of people who were no longer cycling. One possible reason is that wearing a helmet makes people feel safer and, as a result, more willing to engage in the type of reckless cycling against which helmets offer no protection. Another is that cyclists have strength and safety in numbers; a cyclist’s biggest risk is getting hit by a car. By inadvertently reducing the number of cyclists, the helmet law made cycling less safe for those who remained.

Similarly, who would have thought that tightening border enforcement between the United States and Mexico would lead to an increase in immigration? We now know that many Mexicans who worked in the U.S. were seasonal workers who would often travel between the U.S. and Mexico; they saw no need to permanently settle in the U.S. and move their families. With tighter border restrictions, frequent travel became infeasible—and so families moved to the U.S. permanently.

Hindsight is, of course, twenty-twenty. Sometimes even premortems may not be enough, because we are not yet aware of the factors that are really driving effects in the real world. After all, uncovering those factors is a key part of the scientific endeavor.

But how could we design better studies despite this often critical limitation? We can look to Shell, the petroleum company, for inspiration. One thing Shell is known for is its “Shell Scenarios,” in which it asks its own employees, as well as renowned external stakeholders, “what if” questions about events believed to have only a very low probability of occurring. By asking these questions and playing through each scenario, the company often discovers new strategies it may want to pursue or new data it may want to collect.

I advocate for a similar strategy in study design. Next time you’re developing a large-scale field study, ask yourself: “What if we don’t find a significant result? How could this study still contribute knowledge? Why wouldn’t it have worked, and how would I be able to tell why it didn’t?” These questions could, for example, be included in preregistration protocols on platforms such as AsPredicted or the Open Science Framework, and help improve the designs of our studies before we conduct them.

I’m beyond thrilled that we can now publish null results. But we need to start designing studies that allow us to understand why they failed, and premortems can help us get there. There’s no such thing as a failed study; when a study is designed well, there is only ever a failure to learn from it.