Basic autoencoders are conceptually easy, but try to understand how the VAE KL loss was derived. I remember I spent two days reading https://arxiv.org/abs/1906.02691 and I still didn't quite get it.
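If it helps anyone else, the one piece I did manage to pin down is that the KL term has a closed form when you use the usual diagonal-Gaussian encoder and a standard normal prior; the full VAE loss is then just the reconstruction term plus this KL term (the negative ELBO). A rough sketch (the function name and shapes are only illustrative):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal-Gaussian encoder.

    Plugging both densities into KL = E_q[log q(z) - log p(z)] and
    integrating gives, per latent dimension,
    0.5 * (mu^2 + sigma^2 - log sigma^2 - 1), summed over dimensions.
    """
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1, axis=-1)

# Sanity check: the KL is zero when the posterior equals the prior N(0, 1).
print(gaussian_kl(np.zeros(8), np.zeros(8)))  # 0.0
```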
Thank you! Do you have any recommendations for educational resources at an even more basic level for understanding variational inference? The first paper you mentioned is "for statisticians", and it seems pretty dense to me at first glance. My current level is somewhere around a first undergrad probability and statistics course.
I really like Kevin Murphy's books. I learned a lot from them when I was getting started on this stuff. He's just released a two-part follow-up to his 2012 book. Both parts are available online.
I also really like David MacKay's book (also free online). It's...idiosyncratic. But fantastic!
Best of luck with your learning! Start simple, go slow. Write code every step of the way!! ^_^
Explaining autoencoders seems to be a rite of passage for ML bloggers. They're conceptually approachable, yet they have broad applications and are a jumping-off point into other architectures.
Educational content tends to be biased toward what is easiest to explain rather than what is most useful. But even on the latter count, autoencoders are a good place to start.
Autoencoders are insightful because they move the concept of "intermediate representations" to the foreground. This perspective is useful for understanding the compositional architectures of NNs at any scale.
NNs can generally be described as "a complicated feature engineering stack followed by a relatively simple regression". This description captures the information-processing and compression perspectives on NNs, which autoencoders realize in a laboratory-ideal way.
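To make that concrete, here's a toy sketch of the view (the data, layer sizes, and variable names are all made up): the encoder plays the role of the feature-engineering stack, the bottleneck is the intermediate representation, and the decoder is the comparatively simple regression back to the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples in 8 dimensions that really only vary along 2 factors.
factors = rng.normal(size=(100, 2))
X = factors @ rng.normal(size=(2, 8))

# Encoder ("feature engineering stack") and decoder ("simple regression"),
# here just one linear map each, with a 2-unit bottleneck in between.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

lr = 0.01
for _ in range(2000):
    Z = X @ W_enc          # intermediate representation (the bottleneck)
    X_hat = Z @ W_dec      # reconstruction of the input
    err = X_hat - X
    # Gradients of 0.5 * squared reconstruction error, averaged over samples.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print("reconstruction MSE:", np.mean((X @ W_enc @ W_dec - X) ** 2))
```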
Can you explain it now? I'll give it a shot: minimizing the KL divergence should be equivalent to maximizing the likelihood; it's just another way to express/compute the same objective.
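Here's a quick numerical sketch of that equivalence (the distributions below are made up): KL(p_data || q) = -H(p_data) - E_{p_data}[log q], and the entropy term doesn't depend on the model, so whichever model minimizes the KL also maximizes the expected log-likelihood.

```python
import numpy as np

# Empirical data distribution over 4 outcomes, and two candidate models.
p_data = np.array([0.5, 0.25, 0.15, 0.10])
models = {
    "model_a": np.array([0.4, 0.3, 0.2, 0.1]),
    "model_b": np.array([0.25, 0.25, 0.25, 0.25]),
}

entropy = -np.sum(p_data * np.log(p_data))  # constant w.r.t. the model

for name, q in models.items():
    kl = np.sum(p_data * np.log(p_data / q))       # KL(p_data || q)
    expected_ll = np.sum(p_data * np.log(q))       # E_{p_data}[log q]
    # KL(p_data || q) == -H(p_data) - E_{p_data}[log q], so the two agree.
    print(name, kl, -entropy - expected_ll)
```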