
I always get excited when I see these "tutorials for dummies" (like me). Like "finally, I get to take an evening to understand this concept that's eluded me for years." Generally, I get let down. This time is no exception.

They always start off well, then inevitably there's a concept or key terminology that gets glossed over without sufficient explanation.

"The random variable is described by the probability density function. The probability density function is characterized by moments.

The moments of the random value are expected values of powers of the random variable. We are interested in two types of moments"

This is where my journey ended this time. OK, so is that the expected value of the exponent of the random value, or the expected value of the random value raised to some power... and what makes the power of a random value so special versus just operating on the random value itself?

It's like these authors get tired of "dumbing" things down at random points in their chain of thought and just decide to skip over stuff. Or, perhaps they don't understand the underlying concepts themselves and simply can't explain them to others.

Who knows. Just frustrating. At least on Udemy you can write to instructors and ask questions, unlike book authors, who aren't paid to respond.



One of the biggest challenges in learning, especially in learning something new, is that it’s a chain. If you don’t have something’s prerequisite knowledge, you can’t understand it (by definition). This means that a single insufficient explanation by the author blows it. As a general rule, it’s really hard to never mess up!

You make a good distinction between someone being paid to respond and not. A feedback loop helps out a lot here.

Failing that, you need to take this into your own hands. Honestly, you’re probably going to have to tell yourself a new story to get there. Maybe it’s having empathy for the difficulty of teaching. Maybe it’s finding some inner drive. I don’t know. But you need to look at that paragraph, accept that it’s insufficiently explained, and take responsibility for understanding it.

I’m not saying to read it over and over until you “get it”. (I don’t know why people try that, it’s kinda foolish). A simple strategy works most of the time. Read it until you find a word or phrase you don’t sufficiently understand. Maybe that’s “random variable”. Maybe it’s “probability density function”. Find an explanation for that (Wikipedia, ChatGPT, textbooks, videos). The fun thing is this algorithm is recursive. So you’ll likely run into something you don’t know again. That’s okay, just keep going. If it’s really tough, a lot of the value of a tutor is steering this depth first search.

Get each concept to the point where you understand it very well for the problem at hand. You don’t have to know everything about PDFs, but don’t hand-wave it either. After this process, you’ll be able to understand this paragraph and continue.

This may take a while! If something is in a new domain, it’s normal to spend more time backfilling knowledge than on the main content itself. That may make it not worth it for you, but it’s not inherently bad. And the next time, for something similar, it should be faster.


The pdf is just the pullback measure.

A random variable is a function X(ω) taking (e.g.) real values. In your probability space you already have an ambient measure space and an ambient probability measure P, which takes sets in the measure space to [0,1]. The pdf is then the set function P(X^-1(q)), where X^-1 is the set-valued inverse.

Ok, consider coin flips. Then X takes each element of the sample space to either 1 or -1. The set-valued inverse of 1 is the set of outcomes that map to 1. Then we take the ambient probability measure of that set.

You don’t really have to cope with measure theory in full to take this tiny step.
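
To make that tiny step concrete, here is a minimal Python sketch of the coin-flip example (the outcome names and the uniform measure are just my own illustration): it builds the set-valued inverse X^-1({1}) and sums the ambient probabilities over it.

    # Sample space with two outcomes; the ambient probability measure P is uniform.
    P = {"heads": 0.5, "tails": 0.5}   # ambient probability measure
    X = {"heads": 1, "tails": -1}      # random variable X(w) taking values in {1, -1}

    def preimage(value):
        # Set-valued inverse: all outcomes w with X(w) == value.
        return {w for w, x in X.items() if x == value}

    def prob_of_value(value):
        # Ambient probability of the preimage: P(X^-1({value})).
        return sum(P[w] for w in preimage(value))

    print(preimage(1))        # {'heads'}
    print(prob_of_value(1))   # 0.5
    print(prob_of_value(-1))  # 0.5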


I don’t think the set of people who couldn’t understand the quoted paragraph but could understand your comment is very large.


I'd be shocked if somebody knows what an ambient measure space is but doesn't understand the nth moment of a random variable.


1/ I think you are referring to the pushforward measure (https://en.wikipedia.org/wiki/Pushforward_measure): the random variable "pushes" the probability measure to its codomain. 2/ A pdf requires a stronger condition: the pushforward measure needs to be absolutely continuous with respect to the sigma-finite measure (usually the Lebesgue measure) on the codomain.
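
In symbols (my notation, just restating the standard definitions, with \lambda the Lebesgue measure on the codomain):

    (X_* P)(A) = P(X^{-1}(A)) \quad \text{for measurable } A, \qquad
    f = \frac{d(X_* P)}{d\lambda} \ \text{exists iff}\ X_* P \ll \lambda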


If anyone was frustrated like me on this concept of "moments" I found the following insightful and fascinating:

https://gregorygundersen.com/blog/2020/04/11/moments/

I have no idea how I got through undergrad and graduate school without internalizing this concept that seems so foundational. Whacky


I thought I had a pretty good grasp on this, but the idea that an infinite sum of higher-order moments uniquely defines a distribution, in a way analogous to a Taylor series, was new and super interesting! It gives credence to the shorthand that the lower-order moments (mean, variance, etc.) are the most important properties of a distribution to capture, and that they are how you should approximate an unknown distribution given limited parameters.
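
For reference (standard facts, not from the article): when the moment generating function exists in a neighborhood of 0, the raw moments are exactly its Taylor coefficients, which is where the analogy comes from, and in that case the moments do determine the distribution uniquely:

    M_X(t) = E\left[e^{tX}\right] = \sum_{k=0}^{\infty} E\left[X^k\right] \frac{t^k}{k!}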


The hardest part of teaching is that it's impossible to remember what it was like to not already know something. You can't un-know it to get back your previous perspective. So you forget all the things you take as known.

This is the problem I find with the 3blue1brown videos. They're pretty, but I never get any understanding from them. People who already know the material nod along and see all the concepts they're already familiar with presented in a neat way, but for some of us (like me) they don't generally produce understanding. Too many pre-reqs or something.


> This is the problem I find with the 3blue1brown videos. They're pretty, but I never get any understanding from them.

So relieved to learn that I am not the only one!


> Generally, I get let down. This time is no exception.

This is why I don't even bother to read through such tutorials. To understand the Kalman filter, one first needs to understand the basics of probability and then the importance of the Gaussian distribution (the Kalman filter's mathematical derivation assumes that all the probability distributions involved are Gaussian). Then one notices that a Gaussian distribution is uniquely defined once you know its first and second moments (yes, you cannot dance around introducing moments at some point). And then pretty nasty math follows :-) The Kalman filter is not an easy thing. Rudolf Kalman claimed in one of his interviews that without his filter the American landing on the Moon would not have been possible.
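
Concretely, "uniquely defined by its first and second moments" just means the Gaussian density has no parameters beyond the mean (first raw moment) and the variance (second central moment):

    f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
    \qquad \mu = E[X], \quad \sigma^2 = E[(X-\mu)^2]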


There are quite a few similarities between the Elo rating system and Kalman filters. I’ve always thought this would be a good way to teach them, because you can start with a simplified univariate case, then modify and generalize from there.
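
Roughly, the analogy (my own simplification, not a derivation) is that both methods keep a point estimate and nudge it toward each observation by a gain: Elo uses a fixed gain K, while a scalar Kalman filter computes its gain from the tracked variance. A minimal Python sketch:

    def elo_update(rating, opponent, score, k=32):
        # Elo: nudge the rating toward the observed result with a fixed gain k.
        expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
        return rating + k * (score - expected)

    def kalman_update(x, p, z, r):
        # Scalar Kalman measurement update: the gain comes from the variances.
        gain = p / (p + r)  # state variance p, measurement noise variance r
        return x + gain * (z - x), (1 - gain) * p

    print(elo_update(1500, 1600, 1.0))        # rating rises after beating a stronger player
    print(kalman_update(0.0, 1.0, 0.5, 1.0))  # estimate moves halfway toward the measurement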


I would love to know more about the similarities. Can you share any resources that utilize Kalman filters in rating systems?


I feel this frustration in my bones, and I felt the same way up until the proliferation of LLMs. Just paste your comment to Claude (or ChatGPT maybe?) and it will explain everything you want to know in realtime.


Sure, if you don't care if it's actually correct or not.


There are so many explanations of moments in the training set, it should be correct.


Ha, no joke. I definitely find myself doing that more and more. Good reminder.


> They always start off well, then inevitably there's a concept or key terminology that gets glossed over without sufficient explanation.

Obligatory meme: how to draw an owl: https://www.memedroid.com/memes/detail/265779.



Yep. When I write tutorials I try very, very hard to avoid having a single moment like this. Takes more effort, but I find that it makes for a better explanation.


> and what makes the power of a random value so special versus just operating on the random value itself?

Literally four sentences later you would have found:

* The first raw moment E(X) – the mean of the sequence of measurements.

* The second central moment E((X − μ_X)^2) – the variance of the sequence of measurements.

And if you had gotten as far as the part you quoted, you would have seen an extended example of why one is interested in means and variances.


I did, of course, read that next section, and beyond, but I respectfully disagree that the expressions provided need no further explanation. For example, the author neglects to define 'X' as the set of support values for some probability distribution. That's left to the reader to figure out for some reason. Further, nowhere is 'E(X)' defined as the integral of x*f(x) dx, nor is it explained that the exponent 'k' applies only to the first 'x' term in that expression (i.e. if k=3, then E(X^3) is the integral of (x^3)*f(x) dx). How is the reader supposed to know all that?

That was left up to me to hunt down... which is fine I guess, but I certainly wouldn't say this is "from the ground up". At the very least, link to some external content that provides the necessary definitions.
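
For anyone following along, the definitions being described here, for a continuous random variable with density f, are (in the usual notation):

    E[X] = \int x\, f(x)\, dx, \qquad
    E[X^k] = \int x^k\, f(x)\, dx, \qquad
    E\left[(X - \mu_X)^k\right] = \int (x - \mu_X)^k\, f(x)\, dx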





