The Behavioral Scientist as a Cartographer

What is the elevation of Canada? What is the temperature in Australia in July? What is the average daily rainfall in the United States?

Initially, these questions might appear reasonable. But try to answer any of them and you’ll recognize that they are essentially meaningless. Take Canada—the fact that its average elevation is 1,598 feet does not really mean much. Canada has towering mountain ranges in the west, cresting at Mount Logan in Yukon with an elevation of 19,551 feet, low-lying prairies in Alberta, and coastal regions in the east that sit at sea level, all spread across more than 3,000 miles.

Focusing on averages obscures the meaningful variations that define the Canadian landscape. Knowing the average elevation doesn’t help you; it won’t tell you what kind of building you should construct on your lot, what kind of crops to grow on your farm, or whether to pack hiking boots or walking shoes when you plan a vacation. Knowing the average temperature of Australia in July or the average daily rainfall in the United States is equally unhelpful.

Now consider several questions that might be asked of an applied behavioral scientist: What is the average effect of a nudge? What is the most effective intervention to boost uptake of a new app? Can we use loss aversion to get more people to save for retirement? Should we use framing or social proof to encourage people to vote?

These questions are also essentially meaningless. Knowing whether a particular nudge (say, a reminder) works on average doesn’t help you. It won’t tell you whether to deploy the same nudge through text messaging or a physical mailing; whether the same reminder will be equally effective for all audiences; or what the cumulative impact of the nudge will be. 

The efficacy of behavioral interventions varies across populations and situations. This we know. Why then do we keep searching for the absolute or average success of behavioral interventions? And what can we do instead?

Key ideas from cartography might allow us to reframe the question. We can shift from asking, “Which intervention is best?” to asking, “Given the conditions I am operating under, which intervention holds the most promise?”

Why do we keep searching for absolute and average success?

One reason the field has focused on absolute and average success may have to do with how it started. In the early 2010s, the field rapidly gained popularity after the publication of Nudge and the early success of several prominent behavioral science units. As other scientists, policymakers, and organizational leaders took note and started their own units, it seemed like a no-brainer to adopt interventions that early pioneers had already found to be effective.

However, it wasn’t that simple. Several recent books have documented how promising interventions didn’t translate across contexts or failed at larger scales (in what John List refers to as “voltage drops”).

In hindsight, we might have been too quick to adopt blockbuster interventions elsewhere, too hesitant to adapt successful interventions to local contexts, and too hasty to reject intervention ideas that failed elsewhere but might have worked in a different context.

Much like the elevation of Canada varies with latitude and longitude, the effectiveness of any behavioral intervention varies along several dimensions of context.

Many of these dimensions first came on the radar through Richard Thaler’s work on SIFs—supposedly irrelevant factors in decision-making and behavior. A SIF is any aspect of the context in which a decision is made that should not affect the outcome of the decision but does. For example, the default choice in an online shopping portal, whether a price is listed as an aggregate amount or as a per-day expense, or the medium used to present a choice (e.g., online vs. in-person) have all been shown to affect people’s decisions.

Building on Thaler’s work, researchers have documented a number of new SIFs, with increasing nuance, such as the timing of reminders or whether calls to action are worded as statements or as questions. Recently, my colleagues and I identified key elements of context that can drive the success of behavioral interventions (see the table below for a subset). We’ve categorized these elements into features of the target population and features of the situation in which the intervention is delivered.

The question is: How can we design and adapt interventions effectively, despite the variability of context?

The importance of relief in behavioral science

Relief mapping in cartography offers a valuable analogy: it depicts the variations in elevation that characterize a landscape. In the figure below, Panel A is a high-relief landscape in which there are dramatic differences in elevation. High-relief landscapes have towering mountains and deep valleys in relative proximity. In contrast, medium-relief (Panel B) and low-relief (Panel C) landscapes are relatively flat. They could be prairies at sea level, or they could be a plateau at some elevation above sea level. In essence, the relief of a landscape captures the degree of variability in that landscape.

What if behavioral scientists could identify the relief of the behavioral landscape by working like cartographers? A simplistic interpretation of the phrase “behavioral scientist as cartographer” is a literal one—that they should test the effectiveness of interventions across geographies and cultures. Others have used the term cartography to refer to a visual catalog of existing successful nudge interventions. However, I propose using the term in a deeper, metaphorical sense. For any intervention, behavioral scientists could map the landscape by plotting the effectiveness or strength of the intervention—akin to the elevation of a geographic landscape—as a function of two (or more) elements of context. Just as a relief map captures the variability of a landscape as a function of latitude and longitude, a behavioral relief map captures the variability in the effectiveness of an intervention as a function of contextual dimensions.

If there is great variability as a function of these contextual elements, we would get a high-relief landscape like Panel A. We might refer to the behavioral phenomena in Panel A as “brittle” or “fragile”: a change in contextual location (e.g., from X to Y in Panel A) will dramatically change the effectiveness of the intervention. In contrast, we might refer to the behavioral phenomena in Panel C as robust; moving from X to Y does not change the effectiveness much. In the Panel C world, interventions translate readily from context to context, while in the Panel A world, translation challenges will be high.
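
To make the metaphor concrete, here is a minimal sketch, in Python, of what such a behavioral relief map might look like in practice. The intervention, channels, age bands, and effect sizes are all hypothetical; the point is simply that once an intervention’s effectiveness is gridded against two contextual dimensions, its relief can be summarized with simple statistics and used to judge how brittle the phenomenon is.

```python
import numpy as np

# Hypothetical effect sizes (say, percentage-point lift in uptake) for one
# reminder intervention, gridded against two contextual dimensions:
# delivery channel (rows) and age band of the target population (columns).
channels = ["text", "email", "mail"]
age_bands = ["18-34", "35-54", "55+"]
effects = np.array([
    [0.12, 0.10, 0.02],  # text
    [0.08, 0.07, 0.03],  # email
    [0.01, 0.02, 0.04],  # mail
])

# Two simple "relief" summaries: the peak-to-valley range and the overall
# dispersion of effectiveness across the mapped contexts.
relief_range = effects.max() - effects.min()
relief_sd = effects.std()
print(f"Relief (max - min effect): {relief_range:.2f}")
print(f"Dispersion across contexts: {relief_sd:.2f}")

# Moving from contextual location X to Y, as in Panel A:
x = effects[channels.index("text"), age_bands.index("18-34")]
y = effects[channels.index("mail"), age_bands.index("18-34")]
print(f"Changing the channel alone moves the effect from {x:.2f} to {y:.2f}")
```

In this toy landscape the intervention is brittle: changing the channel for the same audience cuts the effect by more than 90 percent, exactly the kind of drop a practitioner would want to anticipate before borrowing the intervention.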

The relief mapping proposed here is not radically different from what academic researchers are familiar with. In experimental research, the term moderation refers to a change in the target effect as a result of a contextual variable.
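
For readers who want to connect the metaphor to familiar statistical machinery, moderation is just a regression with an interaction term. The sketch below uses simulated data and hypothetical variable names; the interaction coefficient is the moderation, that is, the slope of the behavioral landscape along one contextual dimension.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000

# Simulated trial of a reminder nudge whose effect depends on a contextual
# moderator: whether delivery happened close to the decision point.
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),  # 1 = received the reminder
    "timely": rng.integers(0, 2, n),   # 1 = delivered near the decision
})
# Ground truth: a small main effect (+2 points) and a larger boost when
# delivery is timely (+10 points on top).
lift = 0.02 * df["treated"] + 0.10 * df["treated"] * df["timely"]
df["acted"] = (rng.random(n) < 0.20 + lift).astype(int)

# A linear probability model; "treated * timely" expands to both main
# effects plus their interaction. The treated:timely term is the moderation.
model = smf.ols("acted ~ treated * timely", data=df).fit()
print(model.params[["treated", "treated:timely"]])
```

A relief map generalizes this logic from a single moderator to a systematic grid of contextual dimensions.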

While both academics and practitioners may be sensitive to the important role of context, many simply shrug off differences in intervention results across two situations with “the context was different,” rather than taking the proactive, systematic approach that relief mapping offers.

Challenges in translation and scaling

What would relief mapping offer us? If researchers and practitioners had a sense of the relief of their landscapes, they would be better equipped to know when interventions need to be adapted rather than copied.  

For instance, an intervention that successfully increased retirement savings in the United States did not readily translate to Mexico because of cultural differences around money, such as the greater emphasis placed on a family’s financial situation relative to an individual’s. Likewise, another intervention tried to boost retirement savings by simplifying the information on quarterly pension statements. The simplified statements increased people’s attention to their fund’s performance, boosting saving for those with well-performing funds but reducing saving for those with low-performing funds.

The steep relief of the financial landscape in these examples made direct translation problematic. However, if we knew the intervention was successful in a high-relief world (where the researcher metaphorically landed on a mountain), where individual differences and situational factors play a significant role, we’d know that attempts to replicate it in a different setting (where the practitioner metaphorically lands in a valley) are likely to fail.

This raises an important question: How should we deal with the evidence we do have when the science itself predicts fragility? At one extreme, one could argue that it is impossible to use past evidence, because the likelihood that every element of context will be identical between the past and present settings is vanishingly small. However, completely ignoring the existing evidence and collecting new evidence for every intervention is both impractical and prohibitively expensive. (At the other extreme, we could defer completely to existing evidence and dismiss fragile findings as ineffective.)

So as researchers and practitioners consider the behavioral relief of their landscapes, we can ask what each can do differently to overcome the challenges they face in translating, scaling, and adopting or adapting interventions.

What can researchers do differently?

Those who produce evidence—academic and applied researchers—frequently publish papers that inform how practitioners develop interventions. To help practitioners translate interventions, researchers should not only report the effectiveness of an intervention but also more thoroughly assess its generalizability and report the precise context in which the studies were conducted. While some contextual details appear in the text of academic papers, it would be helpful to develop a standard checklist based on the key elements of context and require authors of published studies to report against it. Such a checklist might be modeled on the CONSORT guidelines widely used in reporting randomized trials.

Research also suggests that media reports of successful interventions tend to ignore the nuanced, contextual details, which can create an illusion that the research is more generalizable than the original authors claim it to be. This is particularly problematic given publication bias—the tendency to publish only the positive findings and relegate null effects to the file drawer. Additionally, researchers themselves should avoid overclaiming and resist pressures to do so. 

In essence, in addition to documenting behavioral effects and the success of interventions, producers of evidence need to better map the terrain, reporting on the relative relief (when evidence is available) or at least speculating on contextual variables that might change the results (when evidence is unavailable). This will sensitize practitioners to apply the appropriate level of caution before simply adopting rather than adapting an intervention. Practitioners are thoughtful, often entrepreneurial forces in their work, but the success of applied work is hindered by a lack of contextual detail for them to weigh. Ideally, the field should develop a measure of the (actual or judged) relief of each of its phenomena.
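
One plausible way to operationalize such a relief measure, offered here as an illustration rather than a settled method, is to borrow the heterogeneity statistics used in meta-analysis: Cochran’s Q, tau-squared, and I-squared all quantify how much context-to-context variation in effect sizes exceeds what sampling error alone would produce. A minimal sketch, with hypothetical site-level estimates:

```python
import numpy as np

# Hypothetical effect estimates for one intervention across k sites or
# contexts, with their standard errors.
effects = np.array([0.12, 0.08, 0.03, 0.10, 0.01])
se = np.array([0.03, 0.02, 0.02, 0.04, 0.03])

w = 1.0 / se**2                           # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)  # fixed-effect pooled estimate
Q = np.sum(w * (effects - pooled) ** 2)   # Cochran's Q
k = len(effects)

# DerSimonian-Laird estimate of between-context variance (tau^2), and I^2,
# the share of observed variation attributable to genuine context effects.
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
i2 = max(0.0, (Q - (k - 1)) / Q)

print(f"Pooled effect: {pooled:.3f}, tau^2: {tau2:.4f}, I^2: {i2:.0%}")
```

A phenomenon with a large between-context variance lives in a Panel A world; one with tau-squared near zero lives in Panel C.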

What can practitioners do differently?

Practitioners should adopt a Reaganian “trust but verify” approach when borrowing behavioral interventions. When relief is high, practitioners need to verify whether the existing intervention will translate and be open to making significant adaptations. Conversely, if the relief is low, then the burden of evidence needed to verify whether an intervention could translate will be lower. In the (admittedly unlikely) extreme case when the terrain is absolutely flat and there are no variations, no additional verification would be needed.

I encourage practitioners to use the elements-of-context checklists to determine how similar their context is to the one in which the original intervention succeeded. “The Elements of Context” report provides two lists of variables that can change the effectiveness of an intervention: features of the situation (e.g., the communication medium used, the physical and social environment at the time of delivery) and features of the target population (e.g., demographics, communication styles, psychological factors). These checklists were developed by combing through published research documenting differences in intervention effectiveness, as well as through expert interviews with experienced applied behavioral scientists.

I also encourage practitioners to read the original papers and reports rather than relying solely on media coverage, given that media reports tend to strip away contextual details and make findings seem more generalizable than they are.

Charting the path ahead

Just as cartographic relief mapping provides a nuanced understanding of terrain, behavioral relief mapping can help us understand and predict when an intervention is likely to succeed in a different context. Just as asking about its average elevation flattens the variability of Canada’s landscape, asking for the absolute effectiveness of an intervention flattens the variability of its context. By embracing the complexity of behavioral relief, we can ensure that interventions are both effective and adaptable across diverse settings.