The Key to Creating Cost-Effective Experiments at Scale

On Friday, NPR’s Invisibilia aired an episode that featured Matt Salganik’s research in collaboration with the Fragile Families & Child Wellbeing Study. Researchers had been collecting enormous amounts of data for decades about families, particularly those with unmarried parents, who were at “greater risk of breaking up and living in poverty.” Salganik issued a challenge to researchers to devise an algorithm that would use these data to better predict important outcomes. The challenge grew out of Salganik’s new book, Bit By Bit: Social Research in the Digital Age. You can read more about how the challenge unfolded, and how the idea of mass collaboration initially met with some skepticism, in Salganik’s blog post that accompanied the Invisibilia episode.

The entire book—which I believe is a must-read for anyone aspiring to produce (or consume) social science research in the age of “big data”—was written using another Salganik innovation: the Open Review Toolkit, a platform for crowdsourced review of book manuscripts. Salganik convincingly argues that open review improves the quality of a book, leads to more sales, and increases access to knowledge (for more on these benefits, watch Salganik’s blog this week).

In the excerpt below, Salganik makes the case for digital experiments that allow you to collect massive amounts of data by reducing the variable cost of data to zero. That doesn’t mean that the experiments are free to get off the ground. They’re not, but once they’re launched they can be an incredible resource, as illustrated by Salganik’s groundbreaking work with Peter Dodds and Duncan Watts. - Dave Nussbaum, Managing Editor

Digital experiments can have dramatically different cost structures, and this enables researchers to run experiments that were impossible in the past. One way to think about this difference is to note that experiments generally have two types of costs: fixed costs and variable costs. Fixed costs are costs that remain unchanged regardless of the number of participants. For example, in a lab experiment, fixed costs might be the costs of renting space and buying furniture. Variable costs, on the other hand, change depending on the number of participants. For example, in a lab experiment, variable costs might come from paying staff and participants. In general, analog experiments have low fixed costs and high variable costs, while digital experiments have high fixed costs and low variable costs (figure 4.19). And digital experiments don’t just have low variable costs: you can often drive the variable cost all the way to zero, which creates a lot of exciting opportunities.
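The trade-off can be made concrete with a little arithmetic. In this sketch, total cost is fixed cost plus per-participant variable cost times the number of participants; all dollar figures are hypothetical, not drawn from any actual study:

```python
# Total cost as a function of the number of participants n:
# total = fixed + variable_per_participant * n

def total_cost(n, fixed, variable):
    return fixed + variable * n

def analog(n):
    # Hypothetical analog lab experiment: cheap to set up,
    # but staff and participant payments scale with n.
    return total_cost(n, fixed=5_000, variable=20)

def digital(n):
    # Hypothetical digital experiment: expensive to build once,
    # then each extra participant costs essentially nothing.
    return total_cost(n, fixed=50_000, variable=0)

for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: analog ${analog(n):>9,} vs digital ${digital(n):>9,}")
```

At small scales the analog design is cheaper, but past the break-even point (here, 2,250 participants) the digital design dominates, and with a variable cost of zero its total cost no longer grows with scale at all.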

There are two main elements of variable cost—payments to staff and payments to participants—and each of these can be driven to zero using different strategies. Payments to staff stem from the work that research assistants do recruiting participants, delivering treatments, and measuring outcomes. For example, the analog field experiment of Schultz and colleagues on electricity usage required research assistants to travel to each home to deliver the treatment and read the electric meter. All of this effort by research assistants meant that adding a new household to the study would have added to the cost. On the other hand, for the digital field experiment of Restivo and van de Rijt on the effect of awards on Wikipedia editors, researchers could add more participants at virtually no cost. A general strategy for reducing variable administrative costs is to replace human work (which is expensive) with computer work (which is cheap). Roughly, you can ask yourself: Can this experiment run while everyone on my research team is sleeping? If the answer is yes, you’ve done a great job of automation.

Figure 4.19: Schematic of cost structures in analog and digital experiments. In general, analog experiments have low fixed costs and high variable costs whereas digital experiments have high fixed costs and low variable costs. The different cost structures mean that digital experiments can run at a scale that is not possible with analog experiments.

The second main type of variable cost is payments to participants. Some researchers have used Amazon Mechanical Turk and other online labor markets to decrease the payments that are needed for participants. To drive variable costs all the way to zero, however, a different approach is needed. For a long time, researchers have designed experiments that are so boring they have to pay people to participate. But what if you could create an experiment that people want to be in? This may sound far-fetched, but I’ll give you an example below from my own work, and there are more examples in table 4.4. I think that participant enjoyment—what might also be called user experience—will be an increasingly important part of research design in the digital age.

Table 4.4: Examples of Experiments with Zero Variable Cost that Compensated Participants with a Valuable Service or an Enjoyable Experience.

If you want to create experiments with zero variable cost data, you’ll need to ensure that everything is fully automated and that participants don’t require any payment. In order to show how this is possible, I’ll describe my dissertation research on the success and failure of cultural products.

My dissertation was motivated by the puzzling nature of success for cultural products. Hit songs, best-selling books, and blockbuster movies are much, much more successful than average. Because of this, the markets for these products are often called “winner-take-all” markets. Yet, at the same time, which particular song, book, or movie will become successful is incredibly unpredictable. The screenwriter William Goldman elegantly summed up lots of academic research by saying that, when it comes to predicting success, “nobody knows anything.” The unpredictability of winner-take-all markets made me wonder how much of success is a result of quality and how much is just luck. Or, expressed slightly differently, if we could create parallel worlds and have them all evolve independently, would the same songs become popular in each world? And, if not, what might be a mechanism that causes these differences?


In order to answer these questions, we—Peter Dodds, Duncan Watts (my dissertation advisor), and I—ran a series of online field experiments. In particular, we built a website called MusicLab where people could discover new music, and we used it for a series of experiments. We recruited participants by running banner ads on a teen-interest website (figure 4.20) and through mentions in the media. Participants arriving at our website provided informed consent, completed a short background questionnaire, and were randomly assigned to one of two experimental conditions—independent and social influence.

In the independent condition, participants made decisions about which songs to listen to, given only the names of the bands and the songs. While listening to a song, participants were asked to rate it after which they had the opportunity (but not the obligation) to download the song. In the social influence condition, participants had the same experience, except they could also see how many times each song had been downloaded by previous participants. Furthermore, participants in the social influence condition were randomly assigned to one of eight parallel worlds, each of which evolved independently (figure 4.21). Using this design, we ran two related experiments. In the first, we presented the songs to the participants in an unsorted grid, which provided them with a weak signal of popularity. In the second experiment, we presented the songs in a ranked list, which provided a much stronger signal of popularity (figure 4.22).
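The assignment scheme just described can be sketched in a few lines of code. This is an illustrative reconstruction, not the original MusicLab implementation, and the even split between conditions is an assumption:

```python
import random

N_WORLDS = 8  # eight parallel social influence worlds

def assign(rng=random):
    """Randomly assign an arriving participant to a condition and,
    if in the social influence condition, to one parallel world."""
    condition = rng.choice(["independent", "social_influence"])
    world = rng.randrange(N_WORLDS) if condition == "social_influence" else None
    return condition, world

condition, world = assign()
```

Because the assignment is computed by the server rather than by a research assistant, each additional participant adds essentially nothing to the variable cost of the experiment.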

Figure 4.20: An example of a banner ad that my colleagues and I used to recruit participants for the MusicLab experiments (Salganik, Dodds, and Watts 2006). Reproduced by permission from Salganik (2007), figure 2.12.

Figure 4.21: Experimental design for the MusicLab experiments (Salganik, Dodds, and Watts 2006). Participants were randomly assigned to one of two conditions: independent and social influence. Participants in the independent condition made their choices without any information about what other people had done. Participants in the social influence condition were randomly assigned to one of eight parallel worlds, where they could see the popularity—as measured by downloads of previous participants—of each song in their world, but they could not see any information about, nor did they even know about the existence of, any of the other worlds. Adapted from Salganik, Dodds, and Watts (2006), figure s1.

We found that the popularity of the songs differed across the worlds, suggesting that luck played an important role in success. For example, in one world the song “Lockdown” by 52Metro came in 1st out of 48 songs, while in another world it came in 40th. This was exactly the same song competing against all the same other songs, but in one world it got lucky and in the others it did not. Further, by comparing results across the two experiments, we found that social influence increases the winner-take-all nature of these markets, which perhaps suggests the importance of skill. But, looking across the worlds (which can’t be done outside of this kind of parallel worlds experiment), we found that social influence actually increased the importance of luck. And, surprisingly, luck mattered most for the songs of highest appeal (figure 4.23).
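The “winner-take-all nature” of a market can be quantified; the published study (Salganik, Dodds, and Watts 2006) measured inequality of success with the Gini coefficient of the songs’ market shares. Here is a minimal sketch with made-up download counts, not the experiment’s actual data:

```python
def gini(values):
    """Gini coefficient: 0 means all items are equally successful;
    values near 1 mean one winner takes (almost) all."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    # Standard formula via the rank-weighted sum of sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

equal_world = [100] * 48            # every song equally popular
skewed_world = [1_000] + [10] * 47  # one runaway hit among 48 songs
```

Comparing a statistic like this between the independent and social influence conditions is what lets one say that social influence made the market more winner-take-all.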

MusicLab was able to run at essentially zero variable cost because of the way that it was designed. First, everything was fully automated, so it was able to run while I was sleeping. Second, the compensation was free music, so there was no variable participant compensation cost. The use of music as compensation also illustrates how there is sometimes a trade-off between fixed and variable costs. Using music increased the fixed costs because I had to spend time securing permission from the bands and preparing reports for them about participants’ reaction to their music. But in this case, increasing fixed costs in order to decrease variable costs was the right thing to do; that’s what enabled us to run an experiment that was about 100 times larger than a standard lab experiment.

Figure 4.22: Screenshots from the social influence conditions in the MusicLab experiments (Salganik, Dodds, and Watts 2006). In the social influence condition in experiment 1, the songs, along with the number of previous downloads, were presented to the participants arranged in a 16 × 3 rectangular grid, where the positions of the songs were randomly assigned for each participant. In experiment 2, participants in the social influence condition were shown the songs, with download counts, presented in one column in descending order of current popularity.

Figure 4.23: Results from the MusicLab experiments showing the relationship between appeal and success (Salganik, Dodds, and Watts 2006). The x-axis is the market share of the song in the independent world, which serves as a measure of the appeal of the song, and the y-axis is the market share of the same song in the eight social influence worlds, which serves as a measure of the success of the songs. We found that increasing the social influence that participants experienced—specifically, the change in layout from experiment 1 to experiment 2 (figure 4.22)—caused success to become more unpredictable, especially for the songs with the highest appeal. Adapted from Salganik, Dodds, and Watts (2006), figure 3.

Further, the MusicLab experiments show that zero variable cost does not have to be an end in itself; rather, it can be a means to running a new kind of experiment. Notice that we did not use all of our participants to run a standard social influence lab experiment 100 times. Instead, we did something different, which you could think of as switching from a psychological experiment to a sociological one. Rather than focusing on individual decision-making, we focused our experiment on popularity, a collective outcome. This switch to a collective outcome meant that we required about 700 participants to produce a single data point (there were 700 people in each of the parallel worlds). That scale was only possible because of the cost structure of the experiment. In general, if researchers want to study how collective outcomes arise from individual decisions, group experiments such as MusicLab are very exciting. In the past, they have been logistically difficult, but those difficulties are fading because of the possibility of zero variable cost data.

In addition to illustrating the benefits of zero variable cost data, the MusicLab experiments also show a challenge with this approach: high fixed costs. In my case, I was extremely lucky to be able to work with a talented web developer named Peter Hausel for about six months to construct the experiment. This was only possible because my advisor, Duncan Watts, had received a number of grants to support this kind of research. Technology has improved since we built MusicLab in 2004, so it would be much easier to build an experiment like this now. But high-fixed-cost strategies are really only possible for researchers who can somehow cover those costs.

In conclusion, digital experiments can have dramatically different cost structures than analog experiments. If you want to run really large experiments, you should try to decrease your variable cost as much as possible and ideally all the way to zero. You can do this by automating the mechanics of your experiment (e.g., replacing human time with computer time) and designing experiments that people want to be in. Researchers who can design experiments with these features will be able to run new kinds of experiments that were not possible in the past.

Excerpted from BIT BY BIT: Social Research in the Digital Age by Matthew J. Salganik. Copyright © 2018 by Matthew J. Salganik. Published by Princeton University Press. Reprinted by permission.