The pitfalls of crowdsourcing: online ratings vulnerable to bias


From where to eat dinner tonight to which doctor to visit, there’s hardly a decision we make without tapping into the wisdom of the crowd. Online reviews are a compass for navigating choice in the modern world, with “likes,’’ “+1’s,’’ and star ratings pointing us to the products, experiences, and information that suit us best.

But is the crowd always wise? Apparently not, according to a study published Thursday.

Co-authored by a researcher at the Massachusetts Institute of Technology, the research provides a cautionary tale for the social-networked age, showing just how easily the crowd can be nudged into overinflated enthusiasm and approval, with minimal manipulation. The crowd was not similarly susceptible to negative influences, however, suggesting that review sites may be biased toward overly favorable ratings.


“More and more, we have a cacophony of information about what people like,’’ said co-author Sinan Aral, associate professor of information technology and marketing at MIT’s Sloan School of Management. “We wanted to understand how this type of information can bias people’s opinions, or not.’’

The findings published in the journal Science surprised even Aral, who studies the ways in which social influences can affect decision-making and purchases. On an unidentified online news website, where comments are voted up or down based on how good readers think they are, Aral and collaborators randomly assigned 101,281 comments to be seeded with one up vote or one down vote, or left alone when they were posted. They found that this minor intervention could lead to a snowball effect; a single positive vote increased the likelihood of positive ratings by a third, and resulted in final ratings that were 25 percent higher than average. Comments seeded with an initial positive vote were 30 percent more likely to receive a very high rating. Negative votes, on the other hand, were quickly corrected by the wisdom of the crowd.

“The case where the wisdom-of-the-crowd effects work well is where each person brings their own observation and knowledge, however imperfect and idiosyncratic. It works best when it’s independent,’’ said Christopher Chabris, associate psychology professor at Union College who was not involved in the research. “What this shows is when you don’t have that independence and everyone sees the history of other people’s opinions, you can get big biases in the outcome.’’


The power of social influence to skew people’s choices has long been clear in offline environments. In one famous experiment conducted in the 1950s, people were asked to choose which of three lines was the same length as another. Participants were steered to conform to an obviously wrong choice when other people in the room publicly chose the wrong line.

The online world takes people’s opinions, aggregates them, and makes them part of our baseline knowledge. Such information appears empowering; after all, the aggregated feelings of dozens, hundreds, or thousands of people aren’t subject to the same limitations as an expert review, which reflects a single person’s experience and may be influenced by conflicts of interest and other unknown sources of bias. But the new study is part of a growing body of research showing online ratings can also be skewed in ways the public has thought little about, and could be vulnerable to manipulation by companies or others.

In a 2006 study in the journal Science, a team studied an online music website in which 14,341 participants were randomly divided into two groups: one in which the users saw how songs had been rated and downloaded, and another in which they did not. The researchers found that the ability to see what peers thought made it hard to predict which songs would do best: “the ‘best’ songs never do very badly, and the ‘worst’ songs never do extremely well, but almost any other result is possible,’’ the authors wrote.


In a paper posted online late last year, Michael Luca, an economist and assistant professor at Harvard Business School, analyzed Yelp restaurant reviews and how they evolve over time.

Luca found that simply averaging all the reviews a restaurant receives might not be the most accurate way to rate the experience of going to the restaurant. That’s partly because prior reviews appear to influence those that come after them, the same sort of peer influence that was found in the new study. In Luca’s study, reviewer biases could lead to ratings that were off by as much as a quarter of a star, in either direction.

He developed a way to weight the ratings so they better reflected the diners’ experiences.

“Every type of information source comes with a whole host of problems. For reviews, it’s not that this is a deal-breaker that the problem exists,’’ Luca said. “It’s just that this is the first time people are starting to think about the problem because the system is new.’’

It’s difficult to say how strongly the new findings would apply to other areas. Even within the news site the researchers used, certain topics were more subject to social influence than others: comments on stories about politics, culture, and business were subject to overinflated positive ratings, whereas comments on stories about news, economics, information technology, and fun were not vulnerable to the enthusiasm of the herd.

Aral said that he would expect that social influence would have an effect in lots of sectors of society, but that the degree to which it distorted ratings would vary. Now, he hopes to understand the decision-making mechanisms behind the result — one that he, himself, has experienced.

Earlier this summer, Aral was eating at a restaurant in New York and went to leave a review on Yelp. He had intended to leave a three-star rating, but saw that the last review had rated it much higher and praised the prices and salad dressing. He left four stars instead.
