Netflix two-thumbs-up feature explained – Protocol

For nearly five years, Netflix has offered simple thumbs-up and thumbs-down icons that let viewers express their preferences and help its algorithms provide better recommendations. But in surveys, people often said that this binary choice did not really do their tastes justice.

What if they were really, really in love with a show?

Tasked with coming up with a better way to express that level of adoration, the streaming service recently explored the idea of adding a heart icon to the Netflix app. The heart seemed like an obvious choice: It is a universal sign of love and is widely used in apps like Instagram and Twitter.

But Netflix would not be Netflix if the company did not put features like these through some rigorous testing; in this case, it took almost a year. During that time, the company discovered that the heart was not the best option after all, and instead opted for a new two-thumbs-up option, which will be made available to its subscribers worldwide this week.

Here’s how that heartbreak occurred.

Finding a universal symbol of love

Netflix rolled out its new two-thumbs-up feature across its mobile and smart TV apps as well as its website on Monday. Subscribers are informed that this type of feedback directly affects future recommendations. A thumbs down means that a title will not be recommended again; a thumbs up will result in Netflix recommending similar content. Two thumbs up means “we know you’re a true fan,” as the Netflix mobile app puts it.

The company started work on the feature about a year and a half ago, based on feedback it received from subscribers in surveys and research interviews. “We heard from members that ‘like’ and ‘dislike’ were not enough,” said Christine Doig-Cardet, who heads the company’s personalized UI product innovation team. “There were some shows that they really, really, really enjoyed. It was important to distinguish between what they love and what they like.”

Once the decision was made to solve this problem, Netflix started a series of design sprints to find a visual for this level of fandom. Early ideas included the heart, an applause icon, shooting stars and others. Designers also consulted with the company’s globalization team to find an icon that was truly universal. “The design team and the globalization team really [homed] in on the symbols that connote love,” said Netflix Director of Product Design Ratna Desai. “We wanted it to be very precise, very concise, because we wanted this to be a very fast interaction.”

Image: Netflix

Netflix tested a variety of reactions that could reflect a viewer’s interest in a show.

At the same time, Netflix continued to query its subscribers, who had another suggestion. “We had a lot of interviews and surveys, [and] the heart did not really resonate,” said Doig-Cardet. “The idea that came from the members was: Why don’t you just try two thumbs up?”

At that point, two frontrunners had emerged. The heart seemed like an obvious choice, but two thumbs up also seemed to work well with Netflix’s existing iconography. Plus, as anyone who has ever read a review by the late Roger Ebert knows, it has long signaled a vote of confidence in great entertainment.

Going by what its subscribers wanted seemed like a good idea, which lent credibility to the two thumbs up. But what if these subscribers were wrong?

“Some people can speak loudly,” Doig-Cardet said. “But when you look at the whole picture, talk to a lot of different members and see how they engage in the different functions, it actually does not always [match] the first loud voices.”

Proving the loudest voices wrong

Netflix has long been trying to figure out how best to collect content ratings from its members, and dealing with the loudest voices has been a challenge throughout. In its early days, Netflix offered a five-star rating system, similar to the way people rate their Uber drivers.

At the time, Netflix displayed an average of these ratings on its website to convey how well-liked a title was among subscribers. This resulted in some titles having 4.5 stars or other fractions, which made people wonder why they could not also rate in half-star intervals.

Thousands of people told the company in surveys that they wanted this degree of granularity, but Netflix employees were not sure whether these opinions reflected how people actually used the service. To make sure it did not fall for the opinions of a vocal minority, Netflix resorted to something that has become an important part of its product development toolbox over the years: an A/B test.

In the case of the half-star test, the results were obvious: The number of ratings dropped significantly when people were asked to provide feedback at that level of granularity. In other words: A/B tests showed that the loudest voices were wrong.

Netflix repeated this kind of testing when it completely replaced the five-star ratings with thumbs in 2017. A/B tests prior to that change showed rating activity increasing by 200% with thumbs-up and thumbs-down icons. Part of the reason was that these icons were simply easier to use, but a closer look at the data also revealed that they tended to be more accurate: People would aspirationally give five stars to titles they considered worthy of that status, including award-winning documentaries that would then sit unwatched in their queues for months. At the same time, they often binged reality TV shows that they themselves had given only three stars.

Moment of Truth: Hearts or Thumbs?

Now Netflix is ready to once again add a bit more complexity to these ratings. This is partly because media consumption habits and app interfaces have changed across the board. “People are using Netflix in the context of their overall lives,” Desai said. “They interact with Instagram, with different social networks, with ride-share apps.” Some of the interaction patterns in these apps and experiences were not easily applicable to Netflix, which is primarily used on TVs and has a much greater focus on lean-back entertainment than, for example, Instagram. “But there are a few levers that our members are now asking for that they did not before,” she said.

Still, there were some unresolved questions, including what would work better: Hearts or thumbs? And would either actually have a lasting impact beyond placating the loud voices in surveys and other forms of qualitative research?

“We have been in situations where we hear very strong views in a qualitative setting that go against what we find out in A/B testing,” Desai said. “That’s when the fun begins.”

Netflix began a series of A/B tests for the new rating feature last summer. Image: Netflix

Netflix began a series of A/B tests for the new rating feature last summer, testing both the heart and the two-thumbs-up option. At the same time, the company continued to survey subscribers, including those who had been enrolled in the tests, to see whether the new features actually provided value.

Testing of the feature extended into the fall, as the teams working on it wanted to make sure they got things right. “We are not in a hurry with a test,” Doig-Cardet said. “Sometimes there’s this drive to just move fast and break things and all that. It’s not [our] approach.” One reason to run A/B tests over weeks or even months is to let people get used to a feature and see whether engagement remains high, or whether people are merely drawn to a feature’s novelty and then get bored with it.
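To make the novelty question concrete, here is a minimal sketch of how one might check whether a test's lift fades over time. It is an illustration only, not Netflix's method: the function names, the weekly engagement rates and the 50% decay threshold are all hypothetical.

```python
def weekly_lift(control_rates, treatment_rates):
    """Relative lift of the treatment group over control, week by week."""
    return [(t - c) / c for c, t in zip(control_rates, treatment_rates)]


def looks_like_novelty(lifts, decay_threshold=0.5):
    """Flag a test whose last-week lift fell below a fraction of its first-week lift.

    decay_threshold is an arbitrary, hypothetical cutoff chosen for illustration.
    """
    first, last = lifts[0], lifts[-1]
    return first > 0 and last < first * decay_threshold


# Hypothetical share of members rating at least one title per week.
control = [0.10, 0.10, 0.10, 0.10]
fading = [0.15, 0.13, 0.11, 0.10]     # early spike that wears off
sustained = [0.15, 0.15, 0.14, 0.14]  # lift that mostly holds

fading_flag = looks_like_novelty(weekly_lift(control, fading))
sustained_flag = looks_like_novelty(weekly_lift(control, sustained))
```

In this made-up data, the first treatment series would be flagged as a likely novelty effect and the second would not. A real analysis would also account for noise in each week's numbers.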

In the end, the numbers were clear: Offering an additional way to give feedback worked. “We saw a very big boost in engagement because people had a new way of talking to us,” Desai said. That lift was much bigger with the two thumbs up than with the heart, which came as a surprise, since people at Netflix had expected the heart to win.
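The article does not say how Netflix compared the two variants statistically. As a rough illustration of how an engagement gap between two test groups can be checked against chance, here is a sketch of a standard two-proportion z-test. The function name and all counts are hypothetical; none of the numbers come from Netflix.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). Assumes the pooled rate is strictly between
    0 and 1, otherwise the standard error would be zero.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: members who rated a title, out of members in each group.
z, p_value = two_proportion_z(successes_a=100, n_a=1000,
                              successes_b=150, n_b=1000)
```

A small p-value suggests the gap between the groups is unlikely to be random noise, though it says nothing about whether the lift will last.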

That kind of unexpected result is what makes A/B testing so valuable, Doig-Cardet said. “If we were not surprised, we would be doing something wrong,” she said. “We would validate our own assumptions instead of letting the numbers guide what is a better experience.”

Constant testing, though it may spoil the big reveal

Netflix’s extensive use of A/B testing has been well documented over the years, including by its own computer science team. The company is constantly testing a variety of features with subgroups of its audience. Basically, if you are a Netflix subscriber, there is a decent chance that you are enrolled in some kind of test right now.

Some of these tests are for obvious interface adjustments, and some are related to under-the-hood codecs or infrastructure changes. In fact, Netflix runs so many tests that members can be enrolled in more than one test at a time, which is why the company developed an entire experimentation platform that helps its computer science team avoid test conflicts and make sense of all the data collected. (Netflix offers members a chance to opt out of testing through their account settings.)
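The article gives no details on how Netflix's platform avoids conflicts. One common general technique, sketched below purely as an illustration, is to hash each member ID into a bucket per "layer": Tests that could interfere with each other share a layer and claim non-overlapping bucket ranges, while unrelated tests use different layers. The names `bucket_for` and `assign_test`, the layer name and the ranges are all hypothetical.

```python
import hashlib


def bucket_for(member_id: str, layer: str, buckets: int = 100) -> int:
    """Deterministically map a member to a bucket within a layer."""
    digest = hashlib.sha256(f"{layer}:{member_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign_test(member_id, layer, tests):
    """Return the name of the test a member falls into in this layer, or None.

    tests is a list of (name, start_bucket, end_bucket) tuples whose
    half-open ranges [start, end) must not overlap, so a member lands
    in at most one test per layer.
    """
    bucket = bucket_for(member_id, layer)
    for name, start, end in tests:
        if start <= bucket < end:
            return name
    return None


# Hypothetical layer with two competing rating-UI tests, 10% of members each.
rating_tests = [("heart", 0, 10), ("two_thumbs", 10, 20)]
assignment = assign_test("member-123", "ratings-ui", rating_tests)
```

Because the assignment depends only on the member ID and the layer name, a member sees the same variant on every device, and a different layer name reshuffles members independently for unrelated tests.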

However, the development of the new two-thumbs-up feature also shows that A/B testing alone is not enough. Without also talking directly to subscribers, the company would have prioritized the development of the heart icon and would not have given two thumbs up a chance to prove itself in A/B tests. “We take this multi-stranded approach to looking at a lot of different inputs,” Doig-Cardet said. “We capture insights from our customer service, from surveys, from interviews we do, and use all of this to inform [what] we should invest in and test.”

Both surveys and A/B tests carry the risk of exposing future features to the public. Subscribers often write about new things they’ve seen in the app, and journalists tend to jump on those stories to shed light on the company’s roadmap. For Netflix, that’s just the cost of doing business. “We are comfortable making that trade-off of providing early visibility because we want to make sure it works for our members,” Doig-Cardet said.

“At previous places where I worked, there was this amazing unveiling of the feature, with the campaign and all that,” Desai added. Netflix instead operates a little more in the open, which includes testing new and unannounced features with tens of thousands of members.

“This is our bread and butter,” Desai said. “It’s our secret sauce for how we innovate.”
