Making Sense of the “Do Nudges Work?” Debate

The concept of nudge has proved wildly popular since Richard Thaler and Cass Sunstein published their book of the same name in 2008. Over the past 14 years, hundreds of studies have been published that could be categorized as testing nudges (although the term has been applied quite loosely). 

Back in January 2022, the journal PNAS published an attempt to combine these nudge studies into a meta-analysis. This analysis concluded that the studies had an overall effect size of d = 0.43. That’s fairly substantial—it’s about the same as the effect of interventions to increase motor skills in children or the effectiveness of web-based stress-management interventions.

The study also cautioned that there was some evidence of publication bias. This problem, which affects many scientific disciplines, arises when studies are more likely to be published if they show a particular result, typically a positive and statistically significant one. In this case, “the true effect size of interventions is likely to be smaller than estimated,” the authors write.
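To make that mechanism concrete, here is a small, hypothetical simulation in Python. The true effect, the sample sizes, and the rule that only positive, statistically significant results get published are all invented for illustration; none of these numbers come from the PNAS meta-analysis.

```python
# Hypothetical simulation of publication bias (illustrative only; the numbers
# are invented and do not come from the PNAS meta-analysis).
import numpy as np

rng = np.random.default_rng(0)

TRUE_D = 0.10      # assumed true standardized effect (Cohen's d)
N_PER_ARM = 50     # participants per arm in each simulated study
N_STUDIES = 5000   # number of simulated studies

all_estimates, published = [], []
for _ in range(N_STUDIES):
    control = rng.normal(0.0, 1.0, N_PER_ARM)
    treated = rng.normal(TRUE_D, 1.0, N_PER_ARM)
    d_hat = treated.mean() - control.mean()   # with SD close to 1, this approximates d
    se = np.sqrt(2.0 / N_PER_ARM)             # approximate standard error of d
    all_estimates.append(d_hat)
    if d_hat / se > 1.96:                     # only positive, "significant" results get published
        published.append(d_hat)

print(f"True effect:                d = {TRUE_D:.2f}")
print(f"Average of all studies:     d = {np.mean(all_estimates):.2f}")
print(f"Average of published only:  d = {np.mean(published):.2f}")
```

In this toy setup, averaging only the published estimates overstates the true effect several times over, which is exactly the kind of distortion that publication-bias corrections try to undo.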

Last week, PNAS published a response to this study that went much further. It claimed that properly correcting for publication bias actually eliminated any overall effect of nudges. The short article is titled “No evidence for nudging after adjusting for publication bias,” and some people have used this as the basis for a sweeping dismissal of the nudge idea as a whole.

The tendency to do this may be understandable, given the attention that the idea has generated. But the real situation is more complicated, interesting, and hopeful than this wild oscillation between “big effects” and “no effects” suggests.

It’s pretty clear that publication bias exists. Indeed, my initial reaction to the original meta-analysis was that the overall effect size was “implausibly large,” as another review put it. However, the story I’ve given so far is missing a crucial piece: we have strong evidence that these kinds of interventions do have real effects in real-world settings. Evidence where publication bias has been taken off the table. 

Around the same time as the original meta-analysis, the leading journal Econometrica published a study of the work conducted by two prominent behavioral science organizations: the Office of Evaluation Sciences in the U.S. federal government, and the U.S. office of the Behavioural Insights Team, where I work. The study was unique because these organizations had provided access to the full universe of their trials, not just the ones selected for publication.

Across 165 trials testing 349 interventions, reaching more than 24 million people, the analysis shows a clear, positive effect from the interventions. The projects produced an average improvement of 8.1 percent on a range of policy outcomes. The authors call this “sizable and highly statistically significant,” and point out that the studies had better statistical power than comparable academic studies.

So real-world interventions do have an effect, independent of publication bias. But I want to use this study to disrupt the “do nudges work?” debate, rather than stacking it on one side or the other. Let’s start.

An important piece of context is that the two organizations were mostly limited to low-cost, light-touch interventions in the period studied. BIT Americas, for example, was specifically tasked with running low-cost, rapid randomized trials. Cheap and scalable changes that produce these kinds of improvements are valuable for those making decisions in the public and private sectors. Hence the growth of organizations that offer them.

But behavioral science interventions can also have bigger impacts at a larger, structural level. Some of these may qualify as nudges, as when changes to defaults produce systemic effects in both the public and private sectors. Others are about applying behavioral science evidence to enhance core policy decisions around regulation, taxation, or macroeconomic policy. These examples of behavioral public policy get less airtime, partly because presenting specific experiments is neater and more compelling.

We can start to see the bigger problem here. We have a simplistic and binary “works” versus “does not work” debate. But this is based on lumping together a massive range of different things under the “nudge” label, and then attaching a single effect size to that label. The studies involved cover a bewildering range of interventions, settings, and populations. Even apparently similar interventions will have been implemented in varying ways—and we know these choices have meaningful impacts.

In other words, we have a lot of heterogeneity in attempts to influence behavior. Effects vary by context and within groups. Interestingly, both the pro- and anti-nudge articles admit this—but the headlines about the average effect size are what dominate. This fact strengthens calls for a “heterogeneity revolution,” where we accept that effects vary (because behavior is complex) and stop thinking that only the overall effect matters.   

Such a change would require us to temper the claims (on both sides), make them more specific, and get to a more sophisticated conversation. But we can only do this if we get serious about understanding why certain things work in certain places. And the good news is that we have some clear ways forward.

We can use machine learning to produce new, more reliable and precise analyses of how effects vary by context and group. We can also produce meta-analyses that are more tightly focused on specific interventions or settings (rather than a grab bag of nudges), like the effect of defaults on meat consumption. To do that, we will need better ways of categorizing studies, or we will need to adopt the more rigorous categorization schemes that already exist. And we can gather more systematic knowledge of how design and implementation choices interact with context, including how we can successfully adapt interventions to new contexts. Implementation science knows a lot about this already.
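As a toy illustration of what a heterogeneity-focused analysis looks like, here is a short Python sketch. The scenario (a reminder-letter nudge whose effect depends on whether the recipient has dealt with the agency before), the effect sizes, and the variable names are all invented; real analyses would use actual trial data and typically richer methods, such as interaction models or causal forests.

```python
# Toy sketch of heterogeneity analysis: report subgroup effects, not just a
# pooled average. All data here are simulated and purely illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 20_000

# Simulated trial: a binary "reminder letter" nudge whose effect depends on
# whether the recipient has prior contact with the agency (a context variable).
prior_contact = rng.integers(0, 2, n)        # 1 = has dealt with the agency before
treated = rng.integers(0, 2, n)              # randomized assignment to the nudge
base_rate = 0.20 + 0.05 * prior_contact      # baseline response rate
effect = 0.02 + 0.06 * prior_contact         # assumed heterogeneous treatment effect
responded = rng.random(n) < base_rate + treated * effect

df = pd.DataFrame({"prior_contact": prior_contact,
                   "treated": treated,
                   "responded": responded.astype(int)})

# Pooled estimate: the single number a "does it work?" debate argues about.
pooled = (df[df.treated == 1].responded.mean()
          - df[df.treated == 0].responded.mean())
print(f"Pooled effect: {pooled:+.3f}")

# Subgroup estimates: the quantities a heterogeneity-focused analysis reports.
for group, sub in df.groupby("prior_contact"):
    diff = (sub[sub.treated == 1].responded.mean()
            - sub[sub.treated == 0].responded.mean())
    print(f"Effect when prior_contact={group}: {diff:+.3f}")
```

In this made-up example, the pooled number hides the fact that the same intervention is several times more effective for one group than the other, which is precisely the kind of information a single "does it work?" headline throws away.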

With this perspective, we can see the latest publication in PNAS as a marker in the evolution of applied behavioral science: more like a halftime buzzer than a “death knell.” The current phase has been dominated by binary thinking that is misleading and doing everyone a disservice. Large, consistent effects versus zero effect. The “individual frame” versus “the system frame.” Nudges versus “bigger solutions.” As Jeffrey Linder puts it, “Saying ‘nudges work’ or ‘nudges don’t work’ is as meaningless as saying ‘drugs work’ or ‘drugs don’t work.’”

BIT will shortly be publishing a manifesto that tries to move us out of this impasse. One of its main proposals is that it is much more productive to see behavioral science as a lens for wide-ranging inquiry than as a specialist tool suitable only for certain jobs. Among many other proposals, we will also address how to understand the ways that effects vary by context and group. We hope it forms part of a more nuanced and informed conversation about behavioral science.


Disclosure: Michael Hallsworth is a member of the BIT, which provided financial support to Behavioral Scientist as a 2021 organizational partner. Organizational partners do not play a role in the editorial decisions of the magazine.