Weapons of Mass Persuasion: Tracing the Story of Psychological Targeting on Social Media - By Sander van der Linden

It was sometime in August 2016 that I received an enthusiastic email from Alex Kogan—better known as Aleksandr Spectre at the time. He had received word that I would be arriving in Cambridge soon to take up a new faculty position, so he graciously offered to take my wife and me around town for a tour. It would be his last day in town—he was getting ready to move back to the United States. At the time, I had no idea who Alex was (aside from a new colleague), and remember thinking to myself how nice he was, taking time out of his busy schedule to welcome us in such a warm manner.

After the tour, we went our separate ways but made plans to meet up for dinner and drinks in a local pub later that evening. I was intrigued.

After a few pints at the pub, I wanted to know more about his story. He told me that he had just completed a temporary lectureship at the university and was now focusing on “big data,” in the hope that he could use machine learning algorithms to predict people’s personality based on the digital traces they leave behind on social media.

I was fascinated. I told him that I had been studying people’s beliefs about climate change for some time and that we’d been building statistical models to understand how national public opinion data could be broken down reliably at smaller geographical subunits, such as the state or county level.

After I told Alex about our climate models, he immediately switched the conversation to his new career plans. His plan, he explained, was to harvest big data and go into consultancy—in fact, he had already set up shop back home in the U.S. He wondered why I bothered with surveying people; he could get me tons of predicted climate opinion data from Facebook, based on what people put in their profiles and the pages they “liked.” Millions of datapoints. But it wasn’t going to be free.

This suggestion triggered some alarm bells for me. Why would I buy Facebook data? Who would use that, I wondered, and how did he obtain so much of it—and for what purpose? I kindly thanked him for the offer but declined. My wife and I both left the pub with a rather strange feeling about that conversation.

I never saw Alex Kogan again after our meeting, but it wouldn’t be the last time I would hear of him. A few months later he made worldwide headlines for allegedly selling Facebook data on 87 million individuals to Cambridge Analytica—a British political consulting firm. He had collected this data through a Facebook app he developed called This Is Your Digital Life.

It was this data that Cambridge Analytica used to microtarget voters online with political advertisements, providing data science services to the campaigns of Donald Trump, the Texan senator Ted Cruz, and possibly the Brexit Leave campaign. Some commentators have suggested that Cambridge Analytica influenced elections and undermined the functioning of democracy. From CBS’s 60 Minutes to British parliamentary hearings, the scandal turned Kogan’s and everyone else’s lives upside down.

Of course, the million-dollar question is whether any of this stuff actually works. Perhaps all the uproar over Cambridge Analytica and its putative influence on the 2016 U.S. election is wildly overblown? The whistleblower, Christopher Wylie, has certainly popularized a scary version of the science. Big claims have been made in the media about how so-called psychographic data on people’s personalities allows politicians to microtarget their messages to influence your vote. News headlines have blared, “Your Data and How It Is Used to Gain Your Vote,” and, “Global Manipulation Is Out of Control.”

Yet a rigorous assessment of how these models work and direct evidence on whether microtargeting can influence our voting behavior have remained elusive. Until recently.

The idea to predict personality scores based on your social media data actually came from two other researchers, David Stillwell, professor of computational social science at the University of Cambridge, and Michal Kosinski, now associate professor at Stanford University.

Their idea was radical. Instead of asking people to consent to participating in a survey or opinion poll, they figured they could simply scrape digital footprints online and use that behavioral data to predict people’s personality instead.

Stillwell ran a Facebook application called myPersonality. The app allowed users to take psychometric tests and returned a personality profile in exchange for participants consenting to share data from their Facebook profile purely for scientific purposes. At the time, it was one of the largest data sets in the history of social science with about six million people.

Kogan had put Christopher Wylie (who worked at Cambridge Analytica) in touch with Stillwell and Kosinski. But once they figured out that Cambridge Analytica was not interested in funding a grant for academic research, they backed out of the project. Kogan decided he would pursue it by himself in a private capacity. He figured he could build his own app, similar to the one Stillwell and Kosinski had been using, which he did, and that’s how This Is Your Digital Life was born.

To better understand how accurate Kogan’s model and data were in predicting people’s characteristics from their social media data, we can turn to the pioneering work that Stillwell and Kosinski published back in 2013. By looking at what Stillwell and Kosinski found in their studies, we can get a pretty good feel for what happened with Cambridge Analytica.

In 2013, Kosinski and Stillwell published a paper in the Proceedings of the National Academy of Sciences that leveraged data from the myPersonality Facebook app. Back then, Kosinski and Stillwell were the first to match people’s survey responses—including the Big Five personality test—to their Facebook data. They were able to match data on about 58,000 consenting volunteers from the United States. The digital records of online behavior that they were after were people’s Facebook profiles and their “likes.”

On average, Kosinski and Stillwell obtained about 170 likes per individual. With this information in hand, they could now compare how well a model trained purely on Facebook likes could predict people’s personal and psychological characteristics. They did this by comparing the model’s predicted answers to the actual answers, which were obtained from information that people self-declared on their Facebook profiles (such as their gender or relationship status) as well as the answers they gave during the psychometric tests that were completed as part of the survey.
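As a rough illustration of that comparison, the sketch below scores a set of predictions against the "actual" answers in two ways: a simple hit rate for a categorical attribute such as gender, and a Pearson correlation for a continuous trait score such as Openness. All of the data are invented for illustration, and the function and variable names are hypothetical; nothing here is taken from the study itself.

```python
from math import sqrt

def accuracy(predicted, actual):
    """Share of cases where the predicted label equals the self-declared one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def pearson_r(xs, ys):
    """Pearson correlation between predicted and survey-measured trait scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative data, not taken from the study.
predicted_gender = ["f", "m", "f", "f", "m", "m", "f", "m"]
declared_gender = ["f", "m", "f", "m", "m", "m", "f", "f"]

predicted_openness = [3.1, 4.0, 2.5, 3.8, 3.3, 2.9]
surveyed_openness = [2.8, 4.4, 3.0, 3.1, 3.9, 2.6]

gender_accuracy = accuracy(predicted_gender, declared_gender)
openness_r = pearson_r(predicted_openness, surveyed_openness)
print(f"gender accuracy: {gender_accuracy:.2f}")
print(f"openness correlation: {openness_r:.2f}")
```

The two measures answer different questions: the hit rate only makes sense for attributes with a small set of labels, while the correlation describes how closely predicted scores track measured ones across the whole range.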

The finding that shocked the research community (and the world) was that by using just Facebook likes, Kosinski and Stillwell were able to predict your gender with 93 percent accuracy, your politics with 85 percent accuracy, your ethnicity with 95 percent accuracy, and even your sexual orientation with 88 percent accuracy. Of course, some of these findings are fairly intuitive. For example, the fact that you’re likely to be a Democrat if you liked “Barack Obama” is not overly surprising. But Kosinski and Stillwell argue that this was not always the case. For example, very few users who identified as being gay and male liked sites with titles such as “I love being gay.”

How well did Kosinski and Stillwell’s model perform for the Big Five “OCEAN” model (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism)?

In fact, their model performed worse on more complex psychological traits such as personality. The correlation between the model’s predicted and people’s actual answers was 0.3 for Agreeableness, 0.4 for Extraversion, 0.29 for Conscientiousness, and 0.43 for Openness. This finding is actually consistent with what Kogan told me about his own models that night in the pub. (After I declined his offer to sell me data, he told me that his models were not very accurate anyway. The timing here is important: he told me this in private, long before all of this became public knowledge.)

A correlation in the range of 0.30 to 0.40 is considered relatively small in our field. Kosinski and Stillwell don’t report this in the paper themselves, but a back-of-the-envelope calculation suggests that these kinds of correlations would translate into an accuracy rate of about 67–73 percent depending on the personality trait in question. In predictive modelling, 70 percent accuracy is not particularly good but generally considered “acceptable.” David Sumpter, professor of applied mathematics at Uppsala University in Sweden, did his own replication using Kosinski and Stillwell’s data and came to a similar conclusion. One key takeaway from these results is therefore that predicting people’s personality based on their Facebook likes is not as easy as predicting other personal traits and requires many more likes: in fact, hundreds of them.
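The passage does not say which back-of-the-envelope conversion was used, so the sketch below is only a guess at the kind of calculation involved. It shows two textbook ways of turning a correlation into an accuracy-style percentage: the binomial effect size display (0.5 + r/2), and a conversion from r to Cohen's d to the probability that a randomly chosen high scorer is ranked above a randomly chosen low scorer. Both rest on simplifying assumptions (equal group sizes, normally distributed scores), and the function names are hypothetical.

```python
from math import erf, sqrt

def besd_accuracy(r):
    """Binomial effect size display: success rate implied by correlation r."""
    return 0.5 + r / 2

def auc_from_r(r):
    """Convert r to Cohen's d (equal group sizes assumed), then to the
    probability that a random high scorer outranks a random low scorer."""
    d = 2 * r / sqrt(1 - r ** 2)
    # Standard normal CDF evaluated at d / sqrt(2), written with erf.
    return 0.5 * (1 + erf(d / 2))

# Correlations quoted in the text for four of the Big Five traits.
reported = {
    "Agreeableness": 0.30,
    "Extraversion": 0.40,
    "Conscientiousness": 0.29,
    "Openness": 0.43,
}
for trait, r in reported.items():
    print(f"{trait}: BESD {besd_accuracy(r):.0%}, AUC-style {auc_from_r(r):.0%}")
```

Under these assumptions, both conversions land somewhere between the mid-60s and the mid-70s in percentage terms for the correlations quoted above, which is the same neighborhood as the figures in the text.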

But Kosinski and Stillwell say we should put these numbers in context. They compared how well people’s friends, family, colleagues, and even their spouse can predict an individual’s personality against the predictions of their model (which was once again based on Facebook likes). What they found was astonishing: the computer algorithm could outperform the accuracy of a work colleague’s prediction of someone’s personality with just ten likes. It needed just 70 likes to outperform the prediction of friends, 150 likes to outperform family, and about 300 likes to be able to outperform a spouse. The idea that a computer algorithm trained on your Facebook likes can estimate your personality with the same level of accuracy as your spouse is pretty scary.

However, the fact that Facebook likes just so happen to correlate, to a certain degree, with people’s personalities does not imply that targeting people with an ad based on their personalities actually causes them to do anything. Or does it?

Back in 2012, Facebook had already filed a patent called “Determining user personality characteristics from social networking system communications.” Part of the patent reads: “The inferred personality characteristics are stored in connection with the user’s profile, and may be used for targeting.” So clearly Facebook had been developing a similar technology of their own that could be used for targeting people with ads based on their personality.

Kosinski and Stillwell returned for the sequel. They wanted to evaluate whether such predictions are actually accurate enough to allow for real-world microtargeting.

Because you cannot target people directly based on their personality on Facebook, Sandra Matz (now associate professor of business at Columbia University) together with Kosinski and Stillwell came up with a very clever workaround for their follow-up study. It turns out that you can target people on Facebook based on their likes. So Matz, Kosinski, and Stillwell were able to draw on their previous research to figure out which likes were most strongly associated with different personality traits. They decided to focus on Extraversion and Openness for the experiment because their past models achieved the highest accuracy in predicting these traits from Facebook likes. In particular, the researchers looked at likes associated with the highest and lowest levels of Openness and Extraversion. (For example, target likes for introversion included “Computer” and “Battlestar Galactica,” while target likes for high levels of Extraversion included “Parties” and “Making People Laugh.”)

In the first experiment, Matz, Kosinski, and Stillwell selected women as their target group for a U.K.-based beauty retailer. They designed several ads. Some were meant to speak to individuals high on the scale for Extraversion—for example, one featured a young energetic female dancing and laughing with the text, “Dance like nobody is watching (but they totally are).” Conversely, the Introversion (or low Extraversion) ad portrayed a young shy female doing her makeup alone at home with the text, “Beauty doesn’t have to shout.” In a nutshell, they designed two versions of the same ad, one to appeal to people high and the other to people low in Extraversion.

The team subsequently went to the Facebook advertising platform and under “Interests” entered the relevant page likes (such as “Making People Laugh” for Extraversion, and “Serenity,” “Computer,” etc., for Introversion). Facebook then returned the number of people who liked relevant pages. For the first study, the team also selected some other targeting criteria, such as females aged 18–40 living in the U.K. The ad campaign was real and featured on people’s Facebook pages for about a week.

After a week, the ad campaign had reached over three million people, who clicked on the ads more than 10,000 times and made nearly 400 purchases from the beauty retailer’s website. But did the ads perform better when they were targeted at a user’s personality traits? They clearly did: people were much more likely to purchase a product when they viewed an ad that was consistent with their predicted personality trait. For example, Facebook users higher in Extraversion were 50 percent more likely to make a purchase if the ad was correctly targeted at extraverts versus not (117 versus 62 purchases).
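To get a feel for the scale of these numbers, the short sketch below works through the campaign funnel using the rounded figures quoted above. The exact counts are not given in the text ("over three million," "over 10,000," "nearly 400"), so the rates are approximations, and the variable names are purely illustrative.

```python
# Rounded figures from the text; the exact counts are not given there.
reach = 3_000_000
clicks = 10_000
purchases = 400

click_through_rate = clicks / reach        # share of people reached who clicked
conversion_per_click = purchases / clicks  # share of clicks that led to a purchase
conversion_per_view = purchases / reach    # purchases per person reached

# Purchase counts quoted for matched versus mismatched ads among extraverts.
matched_purchases, mismatched_purchases = 117, 62
raw_purchase_ratio = matched_purchases / mismatched_purchases

print(f"click-through rate: {click_through_rate:.2%}")
print(f"purchases per click: {conversion_per_click:.1%}")
print(f"purchases per person reached: {conversion_per_view:.4%}")
print(f"raw ratio of purchase counts: {raw_purchase_ratio:.2f}")
```

Note that a raw ratio of purchase counts equals a ratio of purchase rates only if both ad versions were shown to equally many people. The per-condition reach is not given here, so the "50 percent more likely" figure cannot be re-derived from the two counts alone.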

This is pretty convincing evidence given that for these real-world experiments the researchers only used a single like (the most extreme one) to target people on Facebook. In a simulation buried in the technical appendix, the researchers make an important observation. The average accuracy of their prediction model varies, depending on the discriminatory power of the “like” in question. By that, we mean that some likes are more predictive of, say, “Openness” than others. If you can find the likes that are most predictive of a given personality trait (and thus have high discriminatory power) you can boost the model’s accuracy substantially. (For example, the average accuracy across all traits for likes with a low discriminatory power is just 58 percent, barely above chance, but it can improve to close to 70 percent for likes with high discriminatory power.) The team estimated that for specific traits such as Openness, the model’s accuracy could be as high as 82 percent, whereas Agreeableness seems to max out at about 61 percent. This leads to a few important insights.

The first is that accuracy clearly differs for different personality traits and Matz, Kosinski, and Stillwell’s findings may be optimistic in the sense that they targeted traits in the experiment that can be predicted with the highest degree of accuracy. But it also suggests that a trade-off must be made between accuracy and reach. If you want to be more accurate, you need a larger number of “likes.” But when you select many likes on the Facebook advertising platform, you decrease the potential pool of people you can reach. (For example, although a few million people might like “Party” pages, fewer people are going to like “Party” and “I like to meet new people”; and even fewer people will have liked “Party” and “I like to meet new people” and “Lady Gaga” and so on.) Targeting people with the highest level of precision means that you cannot reach hundreds of millions of people all at once. So what level of accuracy will still give a microtargeter sufficient reach? Well, Facebook is a huge platform, so the researchers estimate that even with eight likes, you can still reach about 6 million people.
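The shrinking-audience side of that trade-off can be sketched in a few lines. The platform size and the share of people holding each like below are invented for illustration, and the calculation assumes each share is the fraction of the remaining audience that also holds the next like, so that the shares simply multiply. Real likes overlap in messier ways, so this is a toy model rather than the researchers' own estimate.

```python
from itertools import accumulate
from operator import mul

def reach_by_like_count(total_users, like_shares):
    """Estimated audience after requiring the first 1, 2, 3, ... likes together.

    Assumes each share is the fraction of the remaining audience that also
    holds the next like, so the shares simply multiply.
    """
    cumulative_shares = accumulate(like_shares, mul)
    return [round(total_users * share) for share in cumulative_shares]

# Hypothetical platform size and like shares, invented for illustration.
total_users = 100_000_000
likes = ["Party", "I like to meet new people", "Lady Gaga"]
shares = [0.05, 0.20, 0.30]

audience = reach_by_like_count(total_users, shares)
for count, (like, size) in enumerate(zip(likes, audience), start=1):
    print(f"{count} like(s), adding {like!r}: about {size:,} people")
```

Each additional required like can only keep the audience the same size or shrink it, which is why a microtargeter has to decide how much precision is worth giving up in exchange for reach.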

You might reason that downloading an app or buying a beauty product is not the same as influencing who somebody is going to vote for. Although it is true that converting votes is a lot harder than converting clicks, it is certainly not impossible.

My good Cambridge colleague Lee de Wit, associate professor of psychology and author of What’s Your Bias?, conducted a study in the context of Brexit where he first gave people a personality quiz to determine their level of, say, Conscientiousness or Openness. He specifically targeted about 400 Remain supporters and exposed them to arguments in favor of leaving the European Union, framed to appeal to either those high or those low on a particular personality trait. (An example for those scoring high in Conscientiousness would be a media article talking about how immigration is causing “disorder” and the need to “systematically regulate the influx of people.”) Results showed that convincing Remainers of Leave arguments worked better if the ad was congruent with their personality profile: they thought the arguments were more credible and they were more likely to vote for a party making those kinds of claims. But the study was not conducted on a social media platform.

In 2020, Dutch researchers tried to address this problem. They created a fake social media platform with the feel and look of Facebook. The cover story was that participants would log on to the platform to help test a new social network for the university. In the first phase of the experiment, they were asked to fill out some profile information and write a bit about themselves. Out of the 230 initial participants, the researchers were able to use a machine learning algorithm (based on what people put on their profile) to reliably identify about 75 introverts and 81 extraverts.

In the next part of the experiment, these individuals were targeted with an ad in their social media feed. The message was a progressive left-wing ad for the Dutch green party. The only change was the text around the ad, which was manipulated to be either extraverted (“bring out the hero in you”) or introverted. They then asked whether people would vote for the party on a scale ranging from very unlikely (1) to very likely (7). Here the results were unambiguous: voting intentions were substantially (about 35 percent) higher if the extraverted ad was targeted at extraverts as opposed to introverts and vice versa.

To me, this is clear causal evidence that political microtargeting works, at least in a semirealistic social media setting. What is so concerning about this result is that, as opposed to regular political ads, people cannot reasonably defend themselves against persuasion attacks when they don’t even know that they are being targeted.

In summary, although traditional campaigns may have struggled to persuade voters, social media can now help optimize the identification and microtargeting of those individuals most open to persuasion via their digital footprints. A 2021 report from the University of Oxford shows that private firms—much like Cambridge Analytica—are now offering digital propaganda services on behalf of political entities in at least 48 countries around the world.

What we need now is a vaccine—a process that can produce psychological immunization to protect people from malicious and harmful online manipulation.

Reprinted from Foolproof: Why Misinformation Infects Our Minds and How to Build Immunity by Sander van der Linden. Copyright © 2023 by Sander van der Linden. First American Edition 2023. First published in Great Britain by 4th Estate under the title Foolproof: Why we fall for misinformation and how to build immunity. Used with permission of the publisher, W. W. Norton & Company, Inc. All rights reserved.
