In the fall issue of Public Opinion Quarterly in 1949, sociologist Paul Lazarsfeld pulled one of my favorite social science head fakes of all time.
Lazarsfeld was tasked with reviewing the first two volumes of an expansive four-volume book on the social science research conducted by the Army during WWII. The book, appropriately named The American Soldier, synthesized some 600,000 interviews and 300 studies that the Army had conducted during the war. “Never before have so many aspects of human life been studied so systematically and comprehensively,” Lazarsfeld comments in his review.
But there’s a catch, Lazarsfeld tells his readers. To fully understand and appreciate the findings in The American Soldier, one needs to know how social science research differs from that of other sciences like chemistry or physics. “That bodies fall to the ground, that things are hot or cold, that iron becomes rusty, are all immediately obvious,” he explains. Understanding why people do what they do, not so much. “The world of social events is much less ‘visible’ than the realm of nature.”
This intangibility leads to a problem. “It is hard to find a form of human behavior that has not already been observed somewhere,” he writes. “Consequently, if a study reports a prevailing regularity, many readers respond to it by thinking ‘of course that is the way things are.’”
It’s at this point that Lazarsfeld shares a few of the study’s findings. Here are three:
- “Better-educated men showed more psycho-neurotic symptoms than those with less education. (The mental instability of the intellectual as compared to the more impassive psychology of the man in the street has often been commented on.)
- Men from rural backgrounds were usually in better spirits during their Army life than soldiers from city backgrounds. (After all, they are more accustomed to hardships.)
- Southern soldiers were better able to stand the climate in the hot South Seas islands than northern soldiers. (Of course, southerners are more accustomed to hot weather.)”
Except none of this is true.
The researchers actually found the exact opposite. It was those with less education who showed more psycho-neurotic symptoms, those from cities who were in better spirits, and southern soldiers who were no better at handling the hot climate than those from the north.
Lazarsfeld’s point is that once we know a result, it’s far too easy to make up a story about why it came out that way. In other words, hindsight bias, the sense that we knew it all along, skews our judgment.
“If we had mentioned the actual results of the investigation first, the reader would have labelled these ‘obvious’ also,” Lazarsfeld writes. “Obviously something is wrong with the entire argument of ‘obviousness.’”
Although Lazarsfeld was writing 70 years ago, social and behavioral scientists have just begun to take on the problem of “obviousness” in a systematic way. Writing in Science recently, economists Stefano DellaVigna, Devin Pope, and Eva Vivalt announced the launch of a new social science prediction platform, which will systematically collect predictions of research results before a study begins.
It’s significant because, currently, there’s no systematic way to capture scientists’ views prior to an experiment or to track how those views update after new results come in.
“Informally, people routinely evaluate the novelty of scientific results with respect to what is known,” DellaVigna, Pope, and Vivalt write. “However, they typically do so ex post, once the results of the new study are known. Unfortunately, once the results are known, hindsight bias (‘I knew that already!’) makes it difficult for researchers to truthfully reveal what they thought the results would be.”
So the hindsight bias that Lazarsfeld was trying to mitigate rhetorically can now, with the prediction platform, be addressed scientifically.
The platform, currently in its beta phase, works like this: researchers post a 5- to 15-minute prediction survey on the platform, and users, or forecasters, predict the results. Most predictions will come from researchers, such as professors and graduate students. For some projects, though, researchers will want predictions from policymakers, practitioners, and the general public. The results of the prediction survey serve as a baseline before a study begins. Researchers can wait until their study’s results come in to see how the forecasts line up with the actual data. Or they can look at the predictions before running their study; if there’s a lot of uncertainty on a particular point, it may be worth making sure it gets measured.
These prestudy changes in particular, DellaVigna, Pope, and Vivalt believe, can help researchers improve their experimental designs and make better use of their limited resources. Vivalt explained that a researcher might want to “focus on doing studies where there’s the most uncertainty or the highest value of information … the highest likelihood of changing a policy decision based on the results.” That’s not something you can do without understanding people’s prior beliefs.
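As a rough illustration of that logic, here is a minimal Python sketch, with entirely hypothetical research questions and forecast values, of how collected pre-study forecasts might be aggregated to flag where forecasters disagree most. Ranking by the spread of forecasts is an assumption made for illustration, not the platform’s actual method.

```python
# A minimal sketch (not the platform's actual code) of aggregating pre-study
# forecasts to see where uncertainty is highest. The questions and predicted
# effect sizes below are hypothetical.
from statistics import mean, stdev

forecasts = {
    "effect_of_reminder_letters": [0.10, 0.12, 0.09, 0.11, 0.10],   # broad agreement
    "effect_of_cash_incentive":   [0.05, 0.40, -0.10, 0.25, 0.02],  # wide disagreement
    "effect_of_peer_comparison":  [0.20, 0.22, 0.18, 0.21, 0.19],
}

# Rank questions by forecaster disagreement (standard deviation). High
# disagreement suggests the study would carry a high value of information.
ranked = sorted(forecasts.items(), key=lambda kv: stdev(kv[1]), reverse=True)

for question, preds in ranked:
    print(f"{question:30s} mean={mean(preds):+.2f}  sd={stdev(preds):.2f}")
```

The spread of forecasts is just one possible proxy for “value of information”; the platform itself may summarize predictions differently.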
DellaVigna, Pope, and Vivalt also argue that the platform, which they developed in partnership with the Berkeley Initiative for Transparency in the Social Sciences, will help reduce publication bias by revealing when null results are expected versus when they’re actually novel.
“Right now journals are loath to publish results that are not significant,” Vivalt told me. “Whereas actually, an insignificant result could be really interesting and informative, if everybody thought it was going to be significant.” Without systematically collecting predictions, we miss out on knowing when this is the case.
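To make that idea concrete, here is a minimal sketch with made-up numbers, not the platform’s actual output or API, of how pre-study predictions could flag a surprising null result; the 70 percent threshold is an arbitrary assumption for illustration.

```python
# Hypothetical example: a null result is most informative when forecasters
# overwhelmingly expected a significant effect.
predicted_significant = [True, True, True, False, True, True, True, True]  # forecaster beliefs
study_found_significant = False                                            # eventual result

share_expecting_effect = sum(predicted_significant) / len(predicted_significant)

if not study_found_significant and share_expecting_effect > 0.7:
    print(f"{share_expecting_effect:.0%} of forecasters expected a significant effect; "
          "the null result overturns a common prior and is worth publishing.")
else:
    print("The null result was widely anticipated; it adds little new information.")
```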
The new platform joins several other initiatives that aim to use predictions to improve science.
The Defense Advanced Research Projects Agency (DARPA), a research agency within the U.S. Department of Defense, recently launched SCORE (Systematizing Confidence in Open Research and Evidence). The goal of SCORE is to understand which findings in the social and behavioral sciences are reliable, how much confidence or uncertainty there is in a particular finding, and, eventually, to automate this assessment process. (The ultimate aim is to help Department of Defense employees understand and apply social and behavioral science research in their work.)
SCORE has funded several projects to help with this process, one being Replication Markets, where forecasters bet on whether a finding will replicate. A subset of these findings is then replicated in partnership with the Center for Open Science, which developed the list of 3,000 findings put on the market. If you bet correctly, you earn real money.
Replication Markets, and prediction markets more generally, work a bit differently than the DellaVigna, Pope, and Vivalt platform described above. The primary difference is that in a market, forecasters know what others are predicting. In the new prediction platform, they’re blind to others’ predictions. Another difference is that the new platform is not focused exclusively on replication, so there’s the opportunity to get predictions on new research.
Despite their differences, both approaches aim to help scientists understand how forecasts of research can help improve the scientific process and, ultimately, how we apply scientific findings.
Brian Nosek, who heads the Center for Open Science, is optimistic about what both prediction platforms and markets can offer.
“A lot of science is about prediction, and yet we don’t do it very explicitly or very consistently. It’s easy to get caught in a trap of motivated reasoning to reconstruct outcomes as things we predicted,” he said. “Just the existence of platforms like this provides occasion for researchers to start to really wrestle with the extent to which they have predictions, what those predictions are, and recognize that we have a lot more uncertainty in our predictions than we might realize.”
Several studies have already demonstrated the value of collecting predictions, particularly for deciding whom to get predictions from. One, by DellaVigna and Pope, collected predictions for 18 experimental treatments. They found that experts were generally pretty accurate, that Ph.D. students tended to do the best, and that highly cited researchers didn’t outperform other researchers. Another, by Alain Cohn and colleagues, found that neither nonexperts nor expert economists were great at predicting when someone might return a lost wallet. Who the best predictors are for which topics remains an open question, and prediction platforms can help investigate it.
There are risks and unknowns with how the push for systematic predictions might influence scientists.
“The fact that everyone agrees on something doesn’t mean that it’s correct,” Nosek said. “So you still need to have a research environment that appreciates and validates the efforts of iconoclasts.”
He also said that reducing a prediction to a score could prove to be a problem. How should scientists compare a finding rated, say, a 92 with one rated an 85 or a 74? Weighing that kind of uncertainty is something we tend to be pretty bad at.
Predictions also take time. Will they burden the research process?
Nosek doesn’t think so. He puts it in the same camp as other recently developed open science practices, which, he says, have helped reinvigorate researchers.
“One of the things that’s been really interesting in the adoption of these behaviors, like pre-registration, is that we could understand it in one way as a big cost, right?” he said. “But many people, after having done it, will report things like, This is what I thought science was when I got into this. This is so fun and exciting.”
Not to say that it’s easy. “Putting those commitments down, making them explicit, is hard. [But] it is interesting, it is theoretically generative,” Nosek said. “We have this exciting moment when we get the results and we get to see we were totally wrong, or you were right and I was wrong. But whatever it is that happens, it adds some energy to the research process.”
On balance, systematically collecting predictions seems like it has a good chance of strengthening the scientific process. The problem of obviousness isn’t going away, so trying to find a way to solve it feels, well, obvious.