IMO the recommendations are no good because they fundamentally take the wrong approach — rather than ask the user what they like, they try to guess what you like based on usage (which really doesn't correlate well — I watch a lot of garbage because I can’t find things I like, and I don’t have anything better to do.)
And they don’t ask because users don’t provide useful answers.
But users don’t provide useful answers, because rating things doesn’t do anyone any good.
I’m of the belief that if you can make ratings useful (catalogue all movies, including not on Netflix; give useful ways to view/update your lists; have direct relationships to recommendations), you would have dramatically better recommendations for dramatically less effort/complexity.
I don’t think you’ll ever get to “good” recommendations based on usage. The data is fundamentally garbage.
Of course, the other side is that Netflix isn’t interested in recommending things I like; their goal is to recommend things I’ll put up with. They just need 1 show worth watching and subscribing for every now and then, and N shows to keep me mildly amused to stop me from dropping it between good ones.
The recommendation system, historically (i.e., in the long-long ago of spinning disks), was insanely good. But then Netflix moved to streaming and, as a consequence, its own--and generally less good--content.
By analogy, Netflix went from being a sci-fi future of having and being able to recommend on the basis of _everything_, to having a handful of good offerings and a huge amount of b-movie-level offerings.
My gut sense is management tried to paper over this "content loss problem" by making changes:
1) to the recommendation system to push Netflix content[1]; and
2) to the UI to force users to be more reliant on the recommendation system.
I suspect these changes have, generally speaking, made user-consumption metrics look decent--in my mind the core of almost all Netflix's post-streaming decisions. But, as you suggest, it is all papering over a problem of user dissatisfaction: Netflix recommends you mediocre content, and you eventually give up and watch it--and then feel meh.
[1] I can imagine Netflix executives being unwilling to report that the content Netflix had paid mightily for scored low on Netflix's own recommendation algorithm. Philosophically, Netflix went from being, essentially, content agnostic (e.g., it just bought more of X DVD), to having incentives to see particular content (e.g., its own) rank highly.
In Marc Randolph's book, he talks about how Netflix would recommend content (DVDs at the time) to strategically fit Netflix's needs. For example, if they didn't have a copy of a movie ready to send out, Netflix wouldn't recommend it.
Nowadays, I'm certain Netflix recommends content to feature either "no cost" (owned) content or the content with the lowest licensing fee. I don't believe for a second they don't have the data to suggest the best movie. They simply don't want to suggest the best movie. As you said, their goal (now) isn't to suggest the content the user is likely to enjoy most, it's to suggest content the user will tolerate. And that's exactly why they shifted away from a 5 star rating system to a thumbs up/down approach... even if you didn't love a movie or show, you're still likely to give it a thumbs up unless it was totally awful.
If you have an Audible subscription, you may have noticed the same behaviour there.
Large numbers of books labelled as 'free with your membership', which likely only cost Amazon the price of delivering the files. Which makes sense, because once I have paid for my credit the worst outcome financially is that I use it.
Who's more likely to keep renewing their subscription? The person who uses Netflix to watch a ton of trash that they think is "just ok," or the person who merely watches 1 or 2 things per month that they actually enjoy?
I'm certain Netflix ran the numbers, and determined that a high-usage customer is the most valuable.
It's interesting how many corporations don't actually "run the numbers" on what we think are important issues. Basically, internal focus and what the rest of the world cares about are disjointed and corps are often blind to obvious aspects. This can be improved by strong internal diversity, but Netflix doesn't look like a bastion of that (yet?)
On "just ok" vs stuff that is actually enjoyable: "just ok" is fine until a better competitor for attention comes along (e.g. a new smartphone game takes over the world). If they instead land on the "actually enjoyable" end of the scale, there is a better chance that people keep their subscription, sometimes even if they end up not viewing anything that month for whatever reason.
Former Netflix employee (2010-2013) here and, if there's one thing Netflix does as well or better than anyone in the industry, it's running the numbers. In particular, in those days, we had two key metrics that were strongly correlated and we would attempt to drive up: Streaming hours and account retention. Higher usage was strongly correlated with account retention to the point that they were the core of nearly every experiment we did.
I think these are reasonable numbers to focus on, but other relevant variables could be just too hard to quantify or set as goals...
For instance, how attractive Netflix's catalog is to new users/markets can be checked in regular polls, but it would be way more difficult to track with fine granularity, far less precise, and ultimately a harder-to-handle number than just retention or number of new accounts.
This means Netflix could see decent growth on its numbers, good retention and a steady flow of new accounts created, while struggling to reach new markets where competitors are doing great.
This is an extreme example, but Blackberry typically had very good user retention and users loved their devices. Looking only at these numbers, they were doing fine for a long time (which is nothing to sneeze at)
Wouldn't this be a short term vs long term optimization thing? In the short term "just ok" wins. In the long term, users might get bored and new users are less likely to join. Or at least that's my gut feeling, I have nothing to back it up.
Not necessarily, the power of habit can be very strong.
The users who only watch a couple of things are the ones who are more likely to “get bored”, because in any given month there is a higher chance that there won't be any single thing they'd want to watch. Whereas someone who just does it regularly (say every day after work while eating dinner or w/e) is more likely to keep that habit.
Netflix doesn't run ads for anything but their own content, right? It would seem to me their best customer is one who pays their monthly sub and then never uses the service.
On the other hand, the only platform that provides me with good recommendations to watch things seems to be TikTok. They are not asking me to rate individual videos, and so on. Clearly, there is a way to do recommendations without "ratings".
I think you've hit the nail on the head here. There's no good way to express how I feel when I'm watching a show or at the end of the show. There's no "Holy shit, this is amazing" vs "This is decent" etc - where sentiment is clearly attached to the rating. A 5 star or 3 star rating scale alone isn't quite good enough.
I don’t think that’s correct. 1-5 stars is sufficient. The problem is that you need a reason to continuously update the values as your preferences change over time (what was once a 5-star is now a 4-star, because that last movie I saw was phenomenal).
What you need is sufficient reason to do so: the values need to actually be useful to you to make updating an act of sanity (unlike now, where it’s purely an act of futility). Feeding the algorithm is not itself sufficient (though necessary, and currently ineffective). The ideal recommendation system would encourage rating entry as a ritual act and, more importantly, make rating updates an act that delivers real value.
Only then will you have good data, and from good data, a dumb algorithm will suffice.
The problem is data entry for the recommendation algorithm is insufficient incentive to constantly use it (thereby providing “truthful”, or highly-correlated, user ratings). The ratings themselves must be directly beneficial to the user, so that the user provides truthful data for their own benefit, and secondarily for the recommendation algorithm.
That is, I’d like to catalog my own list of watched movies, and their relative ratings, so that I can have a useful system (or a direct relationship to recommendations — eg More Like This), from which Netflix can scrape for their algorithms.
That is, if I’m not honest to myself, the ratings themselves will not be honest, and not properly reflect my taste.
Specifically, there must be reason to provide negative ratings in addition to positive, to capture user taste.
Or, like, maybe just let me turn off the auto recommendations if I want to? It makes me actively uncomfortable to think that everything I watch on Netflix is going to change what I see in the future!
MAL was actually my source for thinking on this subject. In combination with the book Otaku: Database Animals[0] (anime fans catalogue the hell out of things, and this extends to tracking their anime and ratings) I realized you should be able to put together some very strong recommendations by scraping the MAL dataset — because the data should be fairly honest.
And then the realization that really the best recommendation isn’t to forge a new customized list altogether — it’s to simply find the most similar users and recommend items from their list. (MAL has/had a cosine similarity function for this, but no way to search because it’s basically an n^2 algorithm on 4M users; apparently they offered it at some point, and quickly found it untenable. That was what really kicked me off)
And then the realization that if I found users with similar taste, then shouldn’t they be friends? So then it becomes a MAL friendship algorithm.
Did a bunch of research on recommendation algorithms and weighting strategies, scraped most of the MAL users, stored it in a database, and then promptly procrastinated on actually implementing the algorithms. Been sitting on that for like 3 years now :|
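For anyone curious what the "find the most similar users and recommend from their list" idea looks like in practice, here is a rough sketch in Python. Everything in it is made up for illustration: the user names, the show titles, the 1-10 rating scale, and the helper names (`cosine_similarity`, `most_similar_users`, `recommend`) are my own assumptions, not MAL's actual implementation or data. It also uses plain cosine similarity on raw ratings with no mean-centering or weighting, which a real system would probably want.

```python
import math

# Hypothetical ratings: user -> {title: rating on an assumed 1-10 scale}.
ratings = {
    "alice": {"show_a": 10, "show_b": 8, "show_c": 2},
    "bob":   {"show_a": 9,  "show_b": 9, "show_c": 1, "show_d": 10},
    "carol": {"show_a": 2,  "show_b": 3, "show_c": 10, "show_e": 9},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (title -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    # Unrated titles count as 0, so only shared titles contribute to the dot product.
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def most_similar_users(target, ratings, k=2):
    """Return the k users most similar to `target` as (user, score) pairs."""
    scores = [
        (cosine_similarity(ratings[target], other), user)
        for user, other in ratings.items()
        if user != target
    ]
    scores.sort(reverse=True)
    return [(user, score) for score, user in scores[:k]]

def recommend(target, ratings, k=2, min_rating=8):
    """Suggest titles the nearest neighbours rated highly that `target` has not rated."""
    seen = set(ratings[target])
    picks = []
    for user, _ in most_similar_users(target, ratings, k):
        # Walk each neighbour's list from highest to lowest rating.
        for title, rating in sorted(ratings[user].items(), key=lambda kv: -kv[1]):
            if rating >= min_rating and title not in seen and title not in picks:
                picks.append(title)
    return picks

print(most_similar_users("alice", ratings, k=1))
print(recommend("alice", ratings, k=1))
```

Note that this compares one target user against everyone else, which is linear in the number of users per query. Precomputing similarity for every pair of users is where the n^2 cost comes from.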
You may watch garbage (revealed preferences) but that is more important to them in terms of keeping your attention than your wish list (stated preference).
That’s my point. The algorithm goal is not to find what I’d like, but rather what I’d put up with. They only need to find what I’d like every so often, to keep me on the platform.
It’s correct from Netflix’s perspective, but not from mine.