Author here - happy to answer questions about the techniques in the paper. We're super excited to finally share this work externally. Feedback about YouTube recommendations in general also welcome.
Do you study the phenomenon of information bubbles at Google? Let's say a German user happens to watch some right-wing populist video claiming that we need to stop Merkel's refugee policy. The next day the user might receive plenty of recommendations in their feed that confirm the message in the first video. They stumble upon a video of some party convention by a rising German populist party, and everything makes sense now! Video by video the user gets dragged into a right-wing ideology.
That is an information bubble. The algorithm cannot detect low quality or populism, nor can it recommend opposing standpoints, and at the end of the day it has a real effect on a country's politics and the well-being of many people.
Do you have means of quantifying such effects? What are possible countermeasures?
If you cannot talk about that, then this would be my feedback: Perhaps you could train a language model to find opposing views in video titles and tags and then diversify the video recommendations based on that.
What about the reverse scenario, though? Should someone who watches videos about refugee suffering be given anti-refugee video recommendations, lest they be dragged into a 'left wing ideology'? I don't see how that would be acceptable. Would Holocaust documentaries be 'diversified' with Holocaust denial videos?
'Information bubbles' have existed as long as people have had a choice of newspapers to buy and TV channels to watch. Calling for Youtube to artificially 'balance' videos seems like political interference.
While information bubbles have existed as long as people have had a choice, with a newspaper or TV channel you at least have to read and watch a little to understand whether it fits your liking. You have to put some effort in.
With recommendation engines, your bubble ossifies without any effort on your part.
There is no such thing as unbiased. So they have to pick how they are going to be biased. My guess is the way that makes the most ad revenue, irrespective of ideology.
I think personalized media might bring information bubbles to a whole new level because recommendation systems are more efficient, and achieving millions of YouTube views arguably has fewer intellectual hurdles than achieving similar reading rates via print or TV. The information passes through fewer filters (none), e.g. proofreaders, team discussions. If that causes an increase in misinformation, then it is on Google to fix it, i.e. to reduce the amount of misinformation at least to pre-YouTube levels.
If you think relativism is fine, then "As opposed to planting your flag in the ground that your camp is always right and the outgroup is evil?" is also fine.
See how silly and immediately self-contradictory relativism is?
"Trying to think a bit" would be on the side of having more information to think about, I'd imagine. Absolute relativism is a strawman you've brought into it, political tribalism is on the other hand a very real thing.
Why? Leftist people would also get more diverse recommendations. They'd be exposed to arguments such as the importance of maintaining western values etc. I actually did an experiment of wading through videos recommended based on anti-Merkel stuff. There are very few reasonable discussions and mostly it's horribly populist and poorly researched stuff.
What exactly are western values, other than a not-so-subtle way of asserting hostility to Muslims? The people who go on about the importance of opposing immigration in the name of maintaining liberal and tolerant values (a code word for 'gays and feminism') seem to be exactly the same people who hate gays and feminism in the first place.
> What exactly are western values, other than a not-so-subtle way of asserting hostility to Muslims?
For example, that no death threats are made when a daughter defies her father's will about whom she is supposed to spend time with. That caricaturists, satirists and atheists are safe. These are things that western cultures have established, and which could arguably be endangered by letting in refugees by the millions while prohibiting cultural criticism at the same time. I am myself not convinced of the urgency of this threat, but I think these are some of the more convincing arguments against Merkel's refugee policy. Other arguments are, for example, second-order effects or equilibrium effects, e.g. that conservatives, professionals and business folk mount a counter-reaction that is worse than letting in refugees in a more controlled way (i.e. Brexit and brain drain).
> exactly the same people who hate gays and feminism in the first place.
I have no idea about the numbers, but I am pretty certain both groups exist: those who use these arguments as a pretense, and those who are honestly concerned about the efficiency, safety and trust our culture has established (which, e.g., allow us to focus on education, art and science).
I look at the Recommended section daily, and I find it very disappointing for several reasons:
- the recommendations are very often not interesting to me because
+ they cater to the lowest common denominator (you won't believe these 10 hilarious fails, PewDiePie picks his nose, etc.)
+ I have already watched the video
+ a video has been in the Recommended section for weeks and I haven't clicked on it. What makes you think I'll change my mind after several weeks? If I don't click a video within a couple of days of it appearing in the section, it's a dud. Don't keep showing it
+ the video is from a channel I am already subscribed to. That's not a recommendation, it's trivial and not helpful
+ sometimes most or all of the videos in the section match the same keyword. I once clicked on an Amy Schumer video, and for many days every video in the Recommended section was a Schumer video. This is terrible. The same thing happened after I clicked a Craig Ferguson video.
- the feedback UI is not streamlined. I have to click through multiple menus just to say: not interested in this channel
- there should be a list of keywords I can specify such that if a video matches one of them, it is never added to the section. Conversely, there should be a list of keywords that makes the recommendation engine go out, look for videos matching them, and add some of them to the section
I love watching interesting and creative how-to videos (DiResta, Tested, etc.), but even after several years of watching them, the recommendation engine seems to not have caught on to that.
Is the deep learning approach already deployed for regular users? I have not seen a change in the quality of the recommendations.
Sorry to sound so negative, but I think this is a huge wasted opportunity. There is tons of amazing content on youtube, and it's often very hard to find.
Despite the negative tone, I wholeheartedly agree with this assessment. I usually ignore YT's Recommended videos for the same reasons you describe.
"To correct for this, we feed the age of the
training example as a feature during training"
Does this mean something different from feeding the age of the video, relative to when the training example was recorded? Feeding in the age of the video seems like a fairly obvious idea and like it should train the network to favor newer videos. If it actually means how long ago the training example was recorded that is rather strange, as I don't see how that would be needed on top of the video age. Neat graph, there.
I am often annoyed at how overly focused online recommendation systems are on my overly specific recent trends, rather than the broader interests I display over months or years of using a product (looking at you, Amazon). It seems like it should be relatively easy to learn 'this guy likes little video essays about art and science and sometimes fun talk shows', and yet YouTube has been pretty bad at recommending such video-essay style content to me. Perhaps this will improve it, although I wonder how much the recent-history features end up overwhelming the years-long data about what interests me broadly and not just yesterday.
As an aside, is it really "Deep Neural Networks for YouTube Recommendations" if you are using 5-ish layers of embedding, ReLU units, and output? A bit humorous, that.
Your intuition is correct - there are other ways to capture the non-stationary nature of this particular problem. We thought that the example age approach is neat because it is a general technique for removing bias inherent to any machine learning system. Since examples always come from the past, you often have to be careful to prevent any system from being overly biased towards historical behavior. You don't need any additional metadata about items (what's the age of a search query?) and it's more resilient to predicting in regions the model has never seen because you fix serving to the very end of the training window.
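For concreteness, here is a minimal sketch of how the example-age idea could look, assuming ages are measured in days back from the end of the training window (the timestamps, units and variable names below are illustrative, not the production setup):

    # Sketch only: "example age" as an input feature, not YouTube's actual code.
    import numpy as np

    SECONDS_PER_DAY = 86_400.0

    def example_age_days(example_timestamps, training_window_end):
        """Age of each logged example, measured back from the end of the training window."""
        ages = training_window_end - np.asarray(example_timestamps, dtype=np.float64)
        return ages / SECONDS_PER_DAY

    # Training: the age is appended to the other input features of each example.
    logged_timestamps = np.array([1_460_000_000, 1_460_500_000, 1_461_000_000])  # unix seconds, made up
    window_end = 1_461_100_000
    train_ages = example_age_days(logged_timestamps, window_end)

    # Serving: the feature is pinned at (or slightly below) zero, so the model
    # predicts as if standing at the very end of the training window.
    serving_age = 0.0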
I tend to think the focus on recent behavior is an artifact of underfitting. Research into richer temporal modeling is needed and recurrent networks seem promising.
We debated internally whether to use the "deep" moniker - AlexNet was 8 layers, so maybe the threshold is 8? The depth seems sort of irrelevant since stacking layers is trivial once the basic architecture is in place.
"In conjunction with other product areas across Google, YouTube has undergone a fundamental paradigm shift towards using deep learning as a general-purpose solution for nearly all learning problems."
Can you talk about how this works in practice? Is the deep learning group separate from other teams and then tackles problems from different areas as needed, or are there deep learning engineers in each project area that are building nets for each different area? Is the ML team also redesigning product architecture by building products around reinforcement learning?
There are many close collaborations between product and research, as well as direct exchanges between different product areas. Close collaboration is key because those working directly on the product understand best the data, serving system and fundamental constraints.
A recent article [1] revealed how engineers are trained in ML across Google.
What I've heard from Google employees is that if you work there, you are getting training on Deep Learning for sure. It doesn't matter what team you're on, Google is now essentially a deep learning company (which sells ads).
9 out of 10 recommendations for me are terrible. They're either videos I've already watched or they're garbage designed to entice 13-year-olds, like "Top 10 Boobs In Movies." I assume this is caused by watching a lot of let's play and other game videos.
The recommendation system can't seem to handle outliers but maybe that's asking too much of current technology.
This is probably off-topic, but I unfortunately have to agree with a few others here saying that my recommendations have never been that great.
The system suggests lots of clickbait and low-quality videos (with massive view counts, though). It's very rare that I get a video in my recommendations that I really end up enjoying.
My guess is that the system can't really tell whether the video itself is well made or brings good, fresh content.
Is that the case? How do you guys rate and measure intrinsic video quality?
I am very interested in this. Deep NNs are quite an interesting subject and something I'm personally quite curious about.
I also use YouTube recommendations quite a lot for some fairly specialized interests [which I'll keep unstated for now]. My current impression is that the recommendation system has only gotten worse in the last ten years and is now nearly broken (I get recommendations from third-party websites now).
As I recall things, Youtube removed most user recommendation controls 5-10 years ago and the guesses it makes still haven't made up for this loss.
But there are other things I find even harder to understand. I find that when I'm not logged in, after choosing 5-10 videos, youtube will start to recommend good stuff, indeed things that I'd like on my regular recommendation list but which I never do see there.
My impression of my regular recommendations is that they serve nothing but crude averages, videos that I just assume someone pays Google to recommend ("sports", "celebrity fails", etc.).
Which brings me to my shock that the crème de la crème of AI somehow deploys this to me. I get that convnets have made quantum leaps in image recognition competitions. AlphaGo was a clear advance. But where is the progress here? If the recommendation engine is categorizing videos, either the categorizations don't correspond to my experience or it's using the categorizations incorrectly. Broadly, my impression is that the algorithm is swayed by whether a video is broadly popular rather than whether it's in a given category. And I work hard to prune every off-topic suggested video or suggested topic, yet I get what seems like poor to worthless recommendations.
> Feedback about YouTube recommendations in general
Please make it so I can block specific YouTube users from EVER being recommended to me, or showing up in sidebars or whatever.
I feel these features might actually start being useful to me if there were a way to tell YT "Please, please stop showing me this user's videos, I absolutely never want to watch them at all".
Thanks for making your hard work available. It is very interesting from a technical point of view. I'm struck by just how huge a challenge this is given the enormous corpus size.
Are you familiar with Joe Edelman's work?[0] He specifically uses YouTube recommendations as an example of many algorithms designed to use the wrong metrics which leads to undesirable outcomes for users.
Have you ever looked into attributing reasons to users' visits? It seems likely that many users aren't looking for general recommendations that blend their entire use of YouTube together but want specialized recommendations linked to why they visited YouTube this time.
I use Youtube every day and find the recommendation engine extremely predictable. Whatever videos I've watched recently all the way through seem to dominate the recommendations. On the surface that seems logical, but sometimes I watch a video and by the end decide I didn't really like it. I end up having to do a LOT of "Not interested" -> "I don't like this channel" / "I don't like this video" to clear them out.
I wish the recommendation engine had a better idea of what I liked based on the fact that I've been using Youtube for years, and I've thumbs-upped a lot of videos, and told it a lot of channels and videos that I don't like. But maybe that's just asking too much?
Thank you for sharing this paper, it's the first attempt I've seen at using Neural Nets for the candidate generation portion, which was cool to see.
How do you decide what the N in Top N should be?
I see you guys scaled features yourselves, why not use BatchNorm?
Do you think you could have eliminated the manual feature engineering with some learnable feature engineering? I'm mostly thinking of some sort of parametric activation functions, but I'm curious if you've thought about it.
Any thoughts on the Wide & Deep paper, did you try incorporating similar ideas?
Did you experiment with LSTMs for turning watch/search histories into fixed vectors?
You guys trained a regression model, whereas the common wisdom is that neural nets aren't so hot at regression. Did you try training this as a bucketized classification problem instead?
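To make that concrete, here is a rough sketch of the kind of setup I have in mind, assuming the continuous target is something like watch time (the bucket count and target are my own illustration, not from the paper):

    # Sketch only: turn a continuous target into quantile buckets for a softmax classifier.
    import numpy as np

    def bucketize(targets, n_buckets=10):
        """Map continuous targets to integer class labels using quantile boundaries."""
        boundaries = np.quantile(targets, np.linspace(0.0, 1.0, n_buckets + 1)[1:-1])
        return np.digitize(targets, boundaries)  # labels in [0, n_buckets)

    watch_times = np.array([3.0, 45.0, 120.0, 600.0, 1800.0, 30.0, 90.0, 300.0])  # seconds, made up
    labels = bucketize(watch_times, n_buckets=4)

    # 'labels' can be fed to any softmax classifier; a point estimate can be
    # recovered at serving time, e.g. as the expectation over bucket midpoints.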
Again, thanks for the paper and taking time to answer questions :)
First question: Are your innovations being used on Youtube today? At what date did you "pull the switch" and go over to the DNN-based recommender systems?
I'd say recommendations have gotten somewhat better, if a little too clickbaity still (you saw one video with a squirrel? here are ten squirrel video compilations!).
Second question: Did humans at some point assign names to DNN-established clusters or vector elements or what they're called? Sometimes I get OK recommendations, but with a really bad label (for instance a 100% minecraft LPer recommended as an example of a "shooter").
Could you elaborate on "we learn high dimensional embeddings for each video in a fixed vocabulary and feed these embeddings into a feedforward neural network."
So, each video is mapped to a fixed-size vector of floats? A user's history is then a matrix of size [number of videos, embedding size]? What are the other parameters in the sentence "Importantly, the embeddings are learned jointly with all other model parameters through normal gradient descent back propagation updates."? And how do you concatenate all these into a "wide layer" when users have histories of different lengths?
Figure 3 illustrates that the variable-sized watch history is combined with an average operation. This is partially why the embeddings need to be so large: in order to retain information after averaging, you need lots of dimensions to spread out disparate items.
This is of course not optimal, as the network should be able to learn how best to summarize the sequence. In the paper, however, we emphasize the importance of withholding certain sequential information from the classifier.
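To make the averaging step concrete, a minimal sketch of the input layer as described above, with a toy embedding table (the real vocabulary and dimensions are much larger, and the table is a parameter trained jointly with the rest of the network, not a separate word2vec-style step):

    # Sketch only: average a variable-length watch history into one fixed-size vector.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, embed_dim = 10_000, 256          # toy sizes for illustration
    video_embeddings = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)
    # In the actual model this table is learned by backprop along with the
    # feedforward layers rather than initialized from pre-trained embeddings.

    def watch_history_vector(watched_video_ids):
        """Average the embeddings of however many videos the user watched."""
        return video_embeddings[np.asarray(watched_video_ids)].mean(axis=0)

    user_a = watch_history_vector([42, 7, 9_001])   # three watches
    user_b = watch_history_vector([5])              # a single watch
    assert user_a.shape == user_b.shape == (embed_dim,)  # fixed size either way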
Have you experimented with replacing the averaging operation on the vectors with a recurrent network such as an LSTM? That way you don't ignore the temporal nature of the feedback (I have had success improving metrics doing this on implicit streaming video feedback).
In my experience, YouTube's recommendations have been consistently near-sighted for as long as I've known them. They always recommend around the most recent topic/theme.
Would love to see some more details on how you represent videos as feature vectors. Do you only use metadata provided by the uploaders (e.g. title and tags), or do you also analyze the raw video/audio somehow to augment the metadata?
The video embeddings in the paper are learned purely based on observing what users co-watch in sessions. In this sense, they can be thought of as latent factors in more traditional collaborative filtering approaches. When we inspect them, nearby vectors have a surprising amount of semantic similarity.
Features about the videos such as titles and tags, as well as features derived from audio and video, are introduced in the ranking phase.
word2vec did inspire earlier iterations of the model, but the key insight is that embeddings are learned jointly with all other model parameters. There is no separate source of embeddings. This way, embeddings are specialized for the specific task.
In general, what could be a separate source of embeddings? Also, how do these embeddings compare against traditional CF-based latent factors? (I ask this in terms of a recommender metric and not complexity.)
Great paper! How do you guys deal with new users and new items with little or zero historical data? It seems like the model wouldn't have good latent factors for them.
Thanks! This model handles new users gracefully because it can fall back to demographic/geographic priors and gradually specialize as the user watches videos. New items are difficult because of the fixed output vocabulary and batch training. In practice, this model is best suited for the head of the distribution, and other specialized recommenders handle extremely fresh/low-viewcount items. Feature engineering is key for new content during the ranking phase.
Yes, it's a reasonable proxy. It was challenging to set up similar experiments with the old system because it was trained to approximate a different "surrogate" problem. We've also found that recommendation systems are very difficult to evaluate offline.