In graduate school, I had an engineering friend named Steve who proposed that he and I, a psychologist, team up to create a machine to predict the future. We didn’t, of course, but only because such a machine apparently already exists, at least in its beta version. You and I use it on nearly a daily basis. Without it, millions of Americans would not know how to whip up meatloaf (the #2 most searched for recipe, second only to chicken), what the Ice Bucket Challenge was (the ALS campaign that inspired 2.4 million videos), or how to get to the new restaurant on the other side of town (based on what we know to be the most popular app on the planet). It’s Google.
Perhaps, then, it is not surprising that the search engine we rely on most to figure out the little and big problems of our day has also managed to find out an incredible amount of information about us, so much so that it can even predict what we, collectively, will do next. In fact, we now know that Google search data can predict, however imperfectly, flu epidemics, unemployment rates, vacation destinations, and retail sales, to name a few. Google can tell you other things too, like how many votes Obama’s race cost him in the elections or what pregnant women really want.
In our paper, we directly pit Google search data against a more conventional measure—in this case, self-reported survey results—to predict suicide. As a mental health issue and public health problem, suicide has long stood as a challenging phenomenon to study. Not only is suicide multidimensional, but it also suffers from systemic reporting biases because of the stigmas associated with taking one’s own life. Given the difficulty of relying on conventional sources of suicidality data to tell us who is at risk, we turned to Google for help.
We found that, relative to self-reported measures of suicide risk, Google search data was better at estimating the number of suicide deaths over a two-year period. We specifically examined the search terms suicide, how to (commit) suicide, how to kill yourself, and painless suicide across the 50 states. For the self-reported suicide measure, we used data from the National Survey on Drug Use and Health, which contained answers from U.S.-based adult participants about their suicidal thoughts and behaviors. While the Google search terms that reflected suicidal intent correlated significantly with the CDC’s data on actual suicide deaths at the state level, self-reported suicidality did not.
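For readers who want to see the shape of this comparison, here is a minimal Python sketch of correlating two candidate predictors with an outcome at the state level. It is an illustration under stated assumptions, not the analysis from the paper: the function name `pearson_r`, the variable names, and all of the numbers are hypothetical placeholders, chosen only so that the example runs.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values for five hypothetical states (NOT real data).
search_index = [40, 55, 62, 70, 85]            # relative search volume
survey_rate = [3.9, 4.4, 3.7, 4.1, 4.0]        # % self-reporting suicidal thoughts
death_rate = [10.1, 12.3, 13.0, 15.2, 17.8]    # suicide deaths per 100,000

# Correlate each candidate predictor with the outcome, one value per state.
r_search = pearson_r(search_index, death_rate)
r_survey = pearson_r(survey_rate, death_rate)
print(f"search vs. deaths: r = {r_search:.2f}")
print(f"survey vs. deaths: r = {r_survey:.2f}")
```

With real data, each list would hold one value per state, and a significance test on each correlation would be the natural next step.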
The next step was to see whether Google and surveys misestimated suicide for everyone equally, or whether certain groups were more likely to be misestimated than others. We used U.S. Census data related to SES (per-capita income, home ownership, unemployment, poverty, educational attainment), race (namely, percentage of minorities), age, and quality of life (federal aid to local governments, violent crime rate) to predict two discrepancies: the gap between Google’s estimates and actual suicides, and the gap between self-report estimates and actual suicides. We found that states with lower income, more minorities, and higher crime were more likely to have their suicide rates misestimated by both Google and self-report.
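One simple way to picture this step is sketched below in Python. It is a rough illustration, not the method used in the paper: it assumes the discrepancy is measured as the absolute gap between standardized estimates and standardized outcomes, and it relates that gap to a single state characteristic with a one-predictor least-squares slope, whereas the analysis described above used several predictors. All function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize constant input")
    return [(v - m) / s for v in values]


def misestimation(estimates, actuals):
    """Per-state absolute gap between standardized estimate and outcome."""
    return [abs(e - a) for e, a in zip(z_scores(estimates), z_scores(actuals))]


def ols_slope(xs, ys):
    """Slope of the least-squares line predicting ys from a single predictor xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


# Made-up placeholder values for five hypothetical states (NOT real data).
search_index = [40, 55, 62, 70, 85]            # relative search volume
death_rate = [10.1, 12.3, 13.0, 15.2, 17.8]    # suicide deaths per 100,000
poverty_rate = [9.5, 17.0, 11.2, 14.8, 12.1]   # % of residents below poverty line

gap = misestimation(search_index, death_rate)
slope = ols_slope(poverty_rate, gap)
print("misestimation per state:", [round(g, 2) for g in gap])
print(f"change in misestimation per point of poverty: {slope:.3f}")
```

A positive slope on real data would suggest that estimates are further off in poorer states; a fuller analysis would enter all of the census variables at once.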
On the one hand, these findings highlight the potential for Google search data to emerge as a powerful public health tool for suicide surveillance. Unlike surveys, which can be costly, time-consuming, and subject to limited samples, Google data is highly accessible, efficient, and free. Thus, Google can serve as an “early detection device” by signaling regional vulnerabilities in suicide susceptibility, and it can help tailor online interventions (e.g., chat hotlines) to better respond to people who are actively googling suicide terms.
On the other hand, these results also suggest that both Google and self-reported measures of suicide are subject to the same limitations. Suicide researchers have long observed that conventional measures may be inadequate in targeting racial minorities and the socioeconomically underprivileged. Google search data, in this regard, does not solve the problem, because use of Google is also most prevalent among the socioeconomically privileged and well-educated.
Where does this leave us? Perhaps the beauty of Big Data sources like Google is that their breadth and capacity also allow us to see their limitations and pitfalls. When Google Flu Trends first emerged and appeared to promise the ability to predict future epidemics, it seemed as if Big Data was about to trump all other forms of data collection that came before. In recent years, research has shown that Google Flu Trends is a less-than-perfect crystal ball.
Taken together, these results suggest that we still need traditional measures and methods. Surveys, laboratory experiments, and clinical studies cannot be replaced by search engines. Moreover, perhaps more than predicting the future per se, what Google data may be best at is estimating the present, a practice researchers have begun to call nowcasting. In other words, rather than estimating distant futures, Google may be best for estimating near-term phenomena as they are happening, or about to happen. Ultimately, Big Data may be best consumed along with, and not in lieu of, small data.