Basic autoencoders are conceptually easy, but try to understand how the VAE KL loss was derived. I remember I spent two days reading https://arxiv.org/abs/1906.02691 and I still didn't quite get it.
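If it helps anyone else, the one piece I did manage to pin down is that the KL term has a closed form when you use the usual diagonal-Gaussian encoder and a standard normal prior; the full VAE loss is then just the reconstruction term plus this KL term (the negative ELBO). A rough sketch (the function name and shapes are only illustrative):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal-Gaussian encoder.

    Plugging both densities into KL = E_q[log q(z) - log p(z)] and
    integrating gives, per latent dimension,
    0.5 * (mu^2 + sigma^2 - log sigma^2 - 1), summed over dimensions.
    """
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1, axis=-1)

# Sanity check: the KL is zero when the posterior equals the prior N(0, 1).
print(gaussian_kl(np.zeros(8), np.zeros(8)))  # 0.0
```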
Thank you! Do you have any recommendations for educational resources at an even more basic level for understanding variational inference? The first paper you mentioned is "for statisticians", and it seems pretty dense to me at first glance. My current level is somewhere around a first undergrad probability and statistics course.
I really like Kevin Murphy's books. I learned a lot from them when I was getting started on this stuff. He's just released a two-part follow-up to his 2012 book. Both parts are available online.
I also really like David MacKay's book (also free online). It's...idiosyncratic. But fantastic!
Best of luck with your learning! Start simple, go slow. Write code every step of the way!! ^_^
Explaining autoencoders seems to be a rite of passage for ML bloggers. They're conceptually approachable, yet they have broad applications and are a jumping-off point into other architectures.
Educational content tends to be biased toward what is easiest to explain rather than what is most useful. But even on the latter count, autoencoders are a good place to start.
Autoencoders are insightful because they move the concept of "intermediate representations" to the foreground. This perspective is useful for understanding the compositional architectures of NNs at any scale.
NNs can generally be described as "a complicated feature engineering stack followed by a relatively simple regression". This description captures the information-processing and compression perspectives on NNs, which autoencoders realize in a laboratory-ideal way.
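To make that concrete, here's a toy sketch of the view (the data, layer sizes, and variable names are all made up): the encoder plays the role of the feature-engineering stack, the bottleneck is the intermediate representation, and the decoder is the comparatively simple regression back to the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples in 8 dimensions that really only vary along 2 factors.
factors = rng.normal(size=(100, 2))
X = factors @ rng.normal(size=(2, 8))

# Encoder ("feature engineering stack") and decoder ("simple regression"),
# here just one linear map each, with a 2-unit bottleneck in between.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

lr = 0.01
for _ in range(2000):
    Z = X @ W_enc          # intermediate representation (the bottleneck)
    X_hat = Z @ W_dec      # reconstruction of the input
    err = X_hat - X
    # Gradients of 0.5 * squared reconstruction error, averaged over samples.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print("reconstruction MSE:", np.mean((X @ W_enc @ W_dec - X) ** 2))
```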
Can you explain it now? I'll give it a shot: minimizing the KL divergence should be equivalent to maximizing the likelihood; it's just another way to express/compute the same objective.
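Here's a quick numerical sketch of that equivalence (the distributions below are made up): KL(p_data || q) = -H(p_data) - E_{p_data}[log q], and the entropy term doesn't depend on the model, so whichever model minimizes the KL also maximizes the expected log-likelihood.

```python
import numpy as np

# Empirical data distribution over 4 outcomes, and two candidate models.
p_data = np.array([0.5, 0.25, 0.15, 0.10])
models = {
    "model_a": np.array([0.4, 0.3, 0.2, 0.1]),
    "model_b": np.array([0.25, 0.25, 0.25, 0.25]),
}

entropy = -np.sum(p_data * np.log(p_data))  # constant w.r.t. the model

for name, q in models.items():
    kl = np.sum(p_data * np.log(p_data / q))       # KL(p_data || q)
    expected_ll = np.sum(p_data * np.log(q))       # E_{p_data}[log q]
    # KL(p_data || q) == -H(p_data) - E_{p_data}[log q], so the two agree.
    print(name, kl, -entropy - expected_ll)
```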