Thank you for sharing this paper. It's the first attempt I've seen at using neural nets for the candidate generation portion, which was cool to see.

How do you decide what the N in Top N should be?

I see you guys scaled the features yourselves. Why not use BatchNorm?

Do you think you could have eliminated the manual feature engineering with some learnable feature engineering? I'm mostly thinking of some sort of parametric activation functions, but I'm curious if you've thought about it.

Any thoughts on the Wide & Deep paper? Did you try incorporating similar ideas?

Did you experiment with LSTMs for turning watch/search histories into fixed vectors?

You guys trained a regression model, whereas the common wisdom is that neural nets aren't so hot at regression. Did you try training this as a bucketized classification problem?
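
To be concrete about what I mean by "bucketized", here is a rough sketch: pick bucket edges over the continuous target, map each target value to a class index, and train with a softmax over the buckets instead of a regression loss. The edges and the `bucketize` helper are made up for illustration and are not from the paper.

```python
import bisect

# Hypothetical bucket edges for a continuous target, e.g. watch time in seconds.
# These values are my own assumption, not something taken from the paper.
EDGES = [10, 30, 60, 180, 600]

def bucketize(value, edges=EDGES):
    """Map a continuous target to a class index in [0, len(edges)]."""
    # bisect_right puts a value equal to an edge into the bucket above that edge.
    return bisect.bisect_right(edges, value)

watch_times = [3.0, 45.0, 60.0, 1200.0]
labels = [bucketize(t) for t in watch_times]
print(labels)
```

The class labels would then replace the raw target, with `len(EDGES) + 1` output units in the final layer.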

Again, thanks for the paper and for taking the time to answer questions :)