It's tricky I think, I think it comes down to specialized tools for different systems. I'm building a tool for Japanese in this direction. But of course it doesn't generalize to everything since the content and context extraction is very objective dependent.
AI definitely helps here (and it's the direction I'm going), although to start exploring the space in a way that makes sense to me (and because it's a problem I have), I've reduced the space into these steps:
* I wanna be able to watch X in Y language
* I know these words
* Help me