Too Much of a Good Thing—Overly Positive Online Ratings—Makes for Difficult Decisions

My girlfriend, Klára, considers herself something of a movie buff. For the most part this is a good thing. I have a trusted and reliable source for what to watch when it’s time to kick back.

But there is a downside. 

Whenever I suggest something to watch, she’s pretty skeptical. Her first reaction is to check the ratings online and report back that, just as she suspected, the movie can’t be any good because it’s only been rated seven-and-a-half stars. She’ll trust me with big financial decisions, for instance, but a 90-minute movie seems a step too far. For that, she’d rather believe a group of strangers on the internet.

But how reliable are those movie ratings? A series of studies (open-access), recently published in Nature Human Behaviour, suggests those ratings might not be the best way to pick your next movie. A research team, led by Matthew Rocklage, assistant professor of marketing at the University of Massachusetts–Boston, explored how online ratings (e.g., four out of five stars) of movies, restaurants, books, and commercials related to each item’s success. The team also explored how the language people use in reviews, particularly the level of emotion, related to success. What the researchers found was that the emotionality of reviews did a better job than star ratings at predicting which items performed better: more box office sales for movies, more reservations for restaurants.

This is in part because ratings tend to skew heavily positive, so much so that the ratings cease to be useful in differentiating between one movie and another, something Rocklage and his collaborators have dubbed the positivity problem. (It’s curious to think that positivity could be a problem online, when we so often focus on the issues caused by negativity in the digital sphere.)

I spoke with Rocklage over Zoom about the positivity problem, what his team learned from this series of studies, and how understanding emotion on a mass scale might help us predict mass-scale behavior. I also now have a fighting chance the next time Klára and I are picking a movie.

Our conversation has been edited for length and clarity.

Evan Nesterak: I want to begin by asking if you could describe what the positivity problem is.

Matthew Rocklage: We often think about online rhetoric being very negative. But when we started to look at online reviews, those are overwhelmingly positive. When you aggregate them for a product or a movie, 80 to 100 percent are four- or five-star ratings, almost as positive as you can possibly get. On Yelp, 50 percent of all reviews give the maximally positive rating, despite the fact that I think some of us might be a little suspicious—Did that person actually have the most positive experience they possibly could have had at that restaurant?

We were a little surprised by how positive things were, and we called this the positivity problem—How do you discern what is truly good and will become successful in the future from what’s not so good?

How did your research into online ratings and reviews originate?

It stems from something even more basic. I created a computational linguistics tool, called the Evaluative Lexicon, and we’ve been using that to understand how we can use people’s language to predict their future behavior or predict their future opinions. The facet that we really focus on is the emotion of people’s words. People can be very positive in their language—something is excellent or superb or helpful or useful. You can also use language that’s much more based on people’s feelings. Instead of excellent you might say amazing or exciting. Instead of useful, you might say pleasing or enjoyable. We can take this emotion from people’s language and we’ve used it to predict different outcomes. For instance, we find that the more feeling that someone has in their language, the longer their opinion lasts over time.

We wondered if we could predict success in an online review setting, because it seems like it would be in line with our previous research. Once we started to enter this domain, we realized maybe we could be even more useful than we thought because there’s this positivity problem. We show, for instance, that star ratings are really not a reliable predictor of the success of an item—movie, restaurant, book, or even commercial—whereas the emotion of people’s language, as measured by the Evaluative Lexicon, does predict these things more reliably.

Before COVID-19 at least, trying to find a restaurant everyone in a group could agree on was a challenge. There always seems to be that one friend in the group that says, “It doesn’t have five stars, we can’t go there.” Then I’m forced to wait 30 minutes while they search for the perfect restaurant, rather than eating at the one nearby. So take us through your research on restaurants and how ratings and reviews relate to the quality of the restaurant.

With the restaurant data, we took the first 30 reviews that a restaurant ever received. We wanted to know whether we can predict how many table reservations, as one measure of success, that restaurant would get at any point later in the future. We followed these 1,000 restaurants in Chicago for two months, and we recorded how many table reservations these restaurants got on OpenTable.com. Then we wanted to use those first 30 reviews on Yelp to predict how well that restaurant was doing at some point in time.

Unfortunately, your friend could still be a little correct in this case. Star ratings did predict, by themselves, that the restaurant would get more table reservations in the future. However, emotion also did. When we added emotion and star ratings in the same model, predicting at the same time, emotion was the stronger predictor. So your friend is not necessarily wrong, but you could come back to them and say, “That’s not all there is, and, in fact, when you think about the experience that people have, maybe emotion could be the better predictor of success.”

You also looked at movies. That’s another debate I often find myself in. Explain what you did to understand the success of movies in relation to their ratings and reviews.

Here’s where we really found evidence of the positivity problem. If you just focus on positive movies, those that are given a star rating above the midpoint of the star rating scale (in the terms of Metacritic the midpoint would be a five), we find that the higher the rating goes, the worse that movie did in terms of the revenue it made at the box office in the United States. So we saw real evidence of this problem following star ratings.

Star ratings are, in fact, a quite poor predictor of success. But does the emotion of those very same reviews—again we’re looking at the first 30 reviews—reveal differences? When we look at those same 30 reviews, greater emotion predicts that the movie would earn more revenue at the box office later.

When you, personally, are buying a product or going to a restaurant or movie, do you have a strategy for how you make a choice?

My research has definitely changed the respect that I have for different parts of choosing restaurants and movies and that sort of thing. I honestly do put less weight on star ratings than I used to, and I’ve found that that’s helped me. I say, “Okay, a star rating is one piece of information. What else is out there?”

One thing I look at is the feeling that people have. And I’ve actually given more respect to the number of people that have reviewed something. Let’s just say you’re trying to pick a restaurant and you find a three-and-a-half-star restaurant. That’s not great by star-rating standards, but there are 1,000 people that have reviewed that restaurant, whereas every other restaurant in that area only received 300 ratings. I give that more respect now than I used to, because it suggests that even though something’s not perfect about that restaurant, there’s something engaging and interesting about it that keeps people coming. Then I read the reviews and start to say, “I understand why they’re not giving this restaurant a perfect rating, but they seem to really enjoy it, they’re using a lot of feeling-based words. Maybe it’d be worth a shot, even though I might have discounted this restaurant in the past.”

What are some of the complexities or nuances that you’d want people to understand about this research?

I think you could easily take this research from a practical standpoint, and a lot of what we’ve talked about is very pragmatic. But we could also emphasize the more basic science of this, and that was really our interest from the beginning. Previous research that I’ve done shows that emotion behind people’s opinions predicts the strength of those opinions. This is very much another building block in understanding the consequences of emotion.

The basic idea is that mass-scale emotion is predictive of future mass-scale behavior. And we can, yes, use online reviews to understand that there’s a practical point to that, but really it’s something more basic—if a movie evokes emotion, or if a book evokes a feeling, or if a restaurant makes you feel something, that tells us something about emotion, and we can predict future behavior based on that emotional reaction. This research makes those more theoretical or conceptual points as well.