A couple of years ago I blogged about how it was developed for the Apollo program [1], and then I wrote an implementation of different variations of it in Go [2].
He walks you through everything from the rationale, to a simple algebraic example, to the full-blown linear-algebra matrix solution. It's really worth watching even just the first couple of videos to get a deeper understanding of this very useful technique.
It's not 55 videos (dunno if he forgot to upload them or misnumbered them...). Also, most of it (I'd say 70%) is him doing matrix multiplications on a whiteboard, which is excellent if you need a refresher in linear algebra (or just suck at it ;)), but otherwise you can skip a lot of the content. That aside, he does an excellent job explaining it.
The best exercise I had in our sensing & estimation class was to derive the Kalman filter from 'scratch'. It's actually just a series of steps that minimize a convex cost function, where the cost function is the negative log-likelihood of the measurements given the state.
Someone once joked that all of sensing and estimation (and path planning and optimal control) is just applied convex minimization.
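Roughly, and in standard textbook notation rather than anything from the article: the update step can be seen as minimizing the convex quadratic cost

    J(x) = (x - \hat{x}^-)^\top (P^-)^{-1} (x - \hat{x}^-) + (z - Hx)^\top R^{-1} (z - Hx)

where \hat{x}^- and P^- are the predicted state and covariance, z is the measurement, H the measurement model, and R the measurement noise covariance. Setting the gradient to zero recovers the usual update \hat{x}^+ = \hat{x}^- + K(z - H\hat{x}^-) with K = P^- H^\top (H P^- H^\top + R)^{-1}.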
Where KFs finally clicked for me was in deriving the recursive form of good old-fashioned linear regression. You can do linear regression one data point at a time, or a batch at a time, and while I can't remember if it works out to exactly the same thing (because of the recursive state), it works out to essentially the same. Dynamic Bayesian networks are a superclass of KFs, and they're a great thing to have in your toolkit if you work with data.
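For the curious, a rough sketch of the one-data-point-at-a-time version in plain numpy (the model and all the names below are my own toy example, just to show the shape of the recursion):

    import numpy as np

    def rls_update(w, P, x, y, lam=1.0):
        # one recursive least-squares step for a new sample (x, y);
        # w = current coefficients, P = current uncertainty matrix,
        # lam = forgetting factor (1.0 gives ordinary least squares)
        Px = P @ x
        k = Px / (lam + x @ Px)              # gain, same role as the Kalman gain
        w = w + k * (y - x @ w)              # correct the estimate with the innovation
        P = (P - np.outer(k, Px)) / lam
        return w, P

    # toy usage: fit y = 2*x0 - 3*x1, one sample at a time
    rng = np.random.default_rng(0)
    w, P = np.zeros(2), np.eye(2) * 1e3      # large prior uncertainty
    for _ in range(500):
        x = rng.normal(size=2)
        y = 2 * x[0] - 3 * x[1] + 0.1 * rng.normal()
        w, P = rls_update(w, P, x, y)
    print(w)                                 # ends up close to [2, -3]

The Kalman filter adds a predict step on top of this, so the "coefficients" (the state) are allowed to drift between samples instead of being assumed constant.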
Seconded! I worked through this partway and appreciate some of the intuitions they build up.
It seems to me that there is no shortcut to grokking the KF other than working through simpler filters/Bayesian update algorithms first.
Another resource I personally enjoy is Student Dave's lectures on YouTube. He's really entertaining and shows both pen-and-paper examples and Matlab simulations.
Excellent, thank you! The article did a good job telling me what a Kalman filter looks like in math, but I really wanted one that told me what it looks like in code.
The most thorough explanation of the Kalman filter as recursive least squares coupled with linear state-space models is
"Linear Estimation" by Thomas Kailath, Ali H. Sayed, Babak Hassibi.
It covers robust numerical algorithms such as square-root Kalman filter.
Unfortunately, it is a big book, and reading it is going to be a big time investment, but I believe it will be a worthwhile one. It develops various linear algebra concepts in depth and demonstrates their profound application to linear least squares. This book is what people mean when they say learn from the masters.
It is also the only technical book I have read that aptly quotes Shakespeare:
"Age cannot whither her,
nor custom stale
her infinite variety."
I appreciate how this post gets resubmitted every couple of years. It's been a go-to bookmark for ages now and really does help give some visual intuition for what's happening.
The way I've always seen a Kalman filter is as a recursive least squares fitter, but instead of being run on the same input until convergence, it is run one step on every new sample of ever-changing inputs, in effect leading to a smoothing fitter.
For some reason I rarely see it being described as such.
That is an astute connection. The "recursive" part is that you do not need to keep all the history of observations and actions (often denoted y and u in control theory). You can summarize the past with sufficient statistics (the mean and the variance for linear quadratic Gaussian (LQG) problems).
You could imagine keeping all of the data and fitting a model to that. For LQG problems, you can use dynamic programming [1] to solve the problem faster.
You could alternatively imagine keeping a finite window of data and fitting a model to that. That filter would have a finite impulse response.
We also use it for fitting charged particle tracks in high energy physics experiments.
Here, we’re iteratively incorporating measurements along the trajectory, finally arriving at basically a least squares fit result, after a backward smoothing pass.
A moving average takes a bunch of samples to converge, while ASAP does it much faster, which seems to be a key property of these Kalman filters as well?
Although not a Kalman filter, the "three-cornered hat" thing sailors do, taking three distinct points of measurement against some reference and then using the triangle that forms as a bound on the uncertainty, is not dissimilar in intent and outcome.
Perhaps the main quality a Kalman filter brings is the repeated application. If you imagine 20 sailors all doing the three-cornered hat, another 20 doing a log line to assess speed, and a third group doing some kind of dead reckoning... a Kalman filter is what you apply over all of them to reduce it to your best, most certain position, speed, and heading?
I do wonder if false positives can drive things in reverse, toward higher levels of uncertainty rather than refinement. A spurious echo on radar, say, might contradict two other inputs.
The two overlapping coloured Gaussian clouds and the "aha" moment in the common region was good. It reinforces the idea of how two different measurements with their own uncertainty can combine to give a better certainty of the region of interest.
Were these not also used in analogue circuits to do feedback for missile tracking from radar/telemetry?
I struggle with algebra and formulae. The graphics help impart some understanding.
Unfortunately for me, the intuitive part of Kalman filters is easy, but I don't have nearly the kind of grasp of Linear algebra I would need to implement one (I think).
Are there drop-in, batteries-included, ready-to-go Kalman filter implementations/frameworks for common microcontrollers like the Arduino and the Raspberry Pi RP2040? Or is it infeasible to implement them in such limited setups?
The computational requirements are very modest. The magic is in the math. I'm not sure if it counts as "batteries included", but I wrote a Kalman filter implementation in "no-std" (no standard library) Rust called adskalman [1]. This means it can run on very modest bare-metal targets with no operating system. Of course, it can also run on targets where the standard library is available, and the examples [2] make use of this to do nice things like print the results, which can be piped to a file and plotted. The core runs fine on embedded targets, and we use this with ARM microcontrollers, but it should work on just about anything. Feedback is welcome.
I believe ROS (Robot Operating System) has good implementations of state estimation algorithms. If you are worried about memory footprint, then Durbin and Koopman ("Time Series Analysis by State Space Methods") has a scalar version of the square-root Kalman filter (it ingests one number at a time rather than a whole vector at a time). You may have to implement it yourself, though.
Totally feasible on a microcontroller for small models and modest update rates, yes. I have worked with robots that had them implemented on slow 8-bit microcontrollers. (I did not write the KF, though.)
> Unfortunately for me, the intuitive part of Kalman filters is easy, but I don't have nearly the kind of grasp of Linear algebra I would need to implement one (I think).
This is me, anyone got a good linear algebra resource to recommend? Ideally one that addresses 'I took linear algebra ages ago in uni but it didn't really stick'.
I really like "Linear Algebra and Its Applications" by Gilbert Strang. It is not a textbook, the style is conversational, and he really tries to help you learn. It is one of rare math books that includes reason, context, application, and history without sacrificing rigor. The book also focuses on numerical algorithms aspects more than some popular textbooks, which may be helpful to understanding Kalman filter.
I think you still need a good grasp of linear algebra and difference equations to identify the state variables and correctly set up the "model" or "plant" matrix; this is specific to the system, so it can't be provided by the framework. If you can do this, the rest of the Kalman filter is straightforward and can easily be done in numpy etc.
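To make "straightforward in numpy" concrete, here is roughly what the predict/update pair looks like; the constant-velocity model and the noise values below are a made-up toy, not anything from the article:

    import numpy as np

    def kf_predict(x, P, F, Q):
        # propagate the state estimate and its covariance through the model
        return F @ x, F @ P @ F.T + Q

    def kf_update(x, P, z, H, R):
        # correct the prediction with a measurement z
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P

    # toy 1D constant-velocity model: state = [position, velocity]
    dt = 0.1
    F = np.array([[1, dt], [0, 1]])         # the "plant" matrix you have to get right
    H = np.array([[1.0, 0.0]])              # we only measure position
    Q = np.eye(2) * 1e-3                    # process noise, hand-tuned here
    R = np.array([[0.5]])                   # measurement noise

    x, P = np.zeros(2), np.eye(2)
    for z in [0.1, 0.35, 0.55, 0.84, 1.02]: # fake noisy position readings
        x, P = kf_predict(x, P, F, Q)
        x, P = kf_update(x, P, np.array([z]), H, R)
    print(x)                                # filtered position and inferred velocity

As the parent says, the hard part is choosing the state variables and F, H, Q, R for your actual system, not these few lines.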
If you're willing to just believe that there's a derivation showing it works, actually implementing a Kalman filter is pretty straightforward. The proof is a pain, but you don't have to be able to derive it to use it.
Welford's online algorithm for computing weighted mean is essentially just Kalman filtering: Weights are the inverse of measurement covariance, i.e. the information, the initial state is the first measurement, state transition and observation model are 1x1 identity matrices.
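A toy sketch of that equivalence (my own example, not Welford's formulation): with identity dynamics, no process noise, and a constant measurement variance, the scalar Kalman gain works out to 1/n, so the estimate is exactly the running mean:

    def scalar_kf(measurements, r=1.0):
        # scalar Kalman filter for a constant state; r is the measurement variance
        x, p = measurements[0], r          # initialize from the first measurement
        for z in measurements[1:]:
            k = p / (p + r)                # Kalman gain; equals 1/n here
            x = x + k * (z - x)            # same form as mean += (z - mean) / n
            p = (1 - k) * p
        return x

    data = [2.0, 4.0, 6.0, 8.0]
    print(scalar_kf(data), sum(data) / len(data))   # both print 5.0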
I've been considering writing up something for extended kalman filters like this. Kalman filters are great but in my experience extremely limited without the non-linear extensions which make them useful in a much wider variety of problems.
Awesome article. Maths education needs way more visualisation in it like this.
I first found out about Kalman Filters when looking up how large ships use software to remain stationary in the same geographical location even whilst surrounded by crazy waves.
A related and much more powerful predictor in a noisy environment is the Particle Filter, which uses a mechanism that is particularly appealing to me as a programmer…
Author here. The particle filter has its own strengths and drawbacks. It makes more sense to use a particle filter in situations where the state search space is highly nonlocal and/or nonlinear, for example locating a drone on a map by matching radar features to topography.
If the process is linear and the estimation error is Gaussian (or approximately so in practice), the Kalman filter is known to be the optimal algorithm, and the particle filter would not only perform worse, but be more expensive to implement.
Theoretically this is precisely true, but in practice the difference is negligible for many models, given the incredible compute power of modern hardware. A particle filter with a million points is trivial to run on even a mobile phone GPU at milliwatt levels of power draw.
Even "simple" cases like blending GPS data with other sensors could potentially benefit from a particle filter model. For example, an advanced particle filter could model the GPS signal echoing off buildings, which might result in multiple local maxima in the location probability space.
Most (all?) current systems simply throw away the other peaks in the GPS signal and feed in only the one with the "highest likelihood".
Similarly, wheel tick counters typically use one wheel as an input. A particle filter could take all four wheels and be able to model disagreements in a robust way.
Particle filters are also better where the probabilities are not Gaussian. E.g.: measurements that are strictly positive often don't have Gaussian PDFs.
> but in practice the difference is negligible [...] A particle filter with a million points is trivial
This will be O(millions * state_size) flops per frame. A Kalman filter of the same state size will have the expense of a matrix inverse, which is O(state_size^3). So for a state size of, say, 12 floats, the Kalman filter will be about O(2000)-ish flops. A particle filter with "millions of points" modeling the same system will be O(state_size * millions * flops_per_state_update), which could be literally billions of flops: a six-orders-of-magnitude difference.
While there are absolutely applications where the particle filter is a more appropriate choice, it's just false to say that the performance difference is negligible. The difference is quite large.
You could even update the filter 1000x per second to track high-frequency motion and still only use 10% of the GPU power!
Obviously, some models need more ops per particle, more or fewer particles, or different update frequencies. However, consider that current-gen phones like the iPhone 13 are already at multiple TFLOPS, and this number will just keep going up over time.
There might be some really interesting algorithms unlocked once we're at the point where we can run particle filters "per pixel" on a video feed.
You're completely free to decide you're unbothered by it in your own particular application, but it would be unhelpfully misleading to other readers to say that a 10^6 performance difference is "negligible", or that they should expect the particle filter to give better results.
Particle filter is very cool, and it's great that someone mentioned it because it's quite useful in the right circumstances. But it is not a panacea (nor is the Kalman filter), and we don't need to misrepresent it in order to advocate for it in cases where it's applicable.
I didn't see you admit that a 6 order of magnitude difference isn't "negligible", as you originally claimed. Instead you're making the argument that we can throw hardware at the problem, which is fine, but a very different statement.
It's really hard to read past this. I've seen this tone occasionally from junior academic lecturers who somehow think that talking to people like they are nervous children will help put them at ease. I don't think the author's doing this malociouly, but it comes across as "I'm so smart, but don't be afraid, I'm dumbing it down to a cutesy level you can understand".
Avoid writing like this if you can; it's clear from the comments that the article is otherwise very good.
"Please don't pick the most provocative thing in an article and rush to the thread to complain about it. Find something interesting to comment about instead."
it's just the pop-sci writing style. You see this with lots of articles and content directed at a general audience - I think this author's just reused the same style to talk to a technical audience. It's not inherently condescending, it just comes across that way if you're writing for an audience of technical peers.
this kind of breathless style makes more sense when you don't expect the audience to be technically literate and the content is there to make them say "wow, I don't understand anything you're saying but there certainly are technical words in here!" - see, for instance, popular media around anything related to quantum mechanics.
Of course I disagree. Your comment, which advises people not to write like the article, starts with a phrase that's a waste of time to read. Then it misquotes the article; I was, naturally, expecting to see that quote in the article. It repeats a phrase, apparently for effect. "Malociouly"? You seem to see signs of humanity in the writer only as condescension and vanity. Your "Avoid writing like this" applies infinitely better to your own comment, I think.
You seem to assume all writing should be dry academic writing, without vivid language or friendliness to the reader.
I agree. From personal experience, this sort of phrasing comes during a temporary high that you get after mastering a topic. Conquering a piece of technical material elevates you, and in a sense you want to be condescending towards your ignorant past self.
Thanks for your contextualization. The above comment really did hurt my desire to learn more about Kalman filters. I know just being negative or contrary to a thing has an unreasonably high return on investment in making a commenter look smart or authoritative, but it sure does harm.
To add onto this, it’s used in multiple places in GPS and other positioning problems (as in I was on the CoreLocation team at Apple years ago and Kalman filters were common). I’m not really sure where the commenter is sourcing their claim but my experience directly contradicts it.
My take on Kalman filters is that they are, with a diagonal regression matrix and precomputed parameters, just a convoluted notation for guesswork. It is a bit like drawing Nyquist diagrams for system stability analysis: mostly an academic exercise. And stuff like that plagues control theory. I would rather that students learned to keep it simple.
Kalman filters are literally everywhere in the industry. If there is a radar or data fusion involved, you can be pretty sure there are Kalman filters. I know a researcher whose most quoted article is just him applying fancy new methods to actual industrial datasets and showing they perform worse than a Kalman filter.
What you wrote is akin to someone explaining to students doing signal processing that they should stay away from Fourier transforms.
You could say that for all of estimation, by definition. But some estimates are better than others, and the KF is the best estimate under certain conditions...
...and one of those conditions is that you have a good estimate of the dynamics and measurement noise parameters. Rather than throw our hands up, we should just articulate this, and proceed to discuss methods for getting a good estimate of noise parameters, and discuss what happens if our estimates are wrong.
The actual power of the Kalman filter lies in its ability to estimate states/variables that can't be measured directly, not just in filtering time-series data.
With a Kalman filter, you not only get your output filtered, you also get the state of the system at each step. No amount of low-pass filtering can achieve that.
Going down the rabbit hole indeed... Kalman filters are Linear Quadratic Estimators (LQE) in control theory. Very rich field with very powerful techniques.
> No amount of low pass filtering can achieve that.
Can't a simple IIR low-pass filter do that for certain systems once they reach a steady state? I don't remember exactly when the KF becomes an IIR filter, but intuitively, for certain systems, I'd think the constants would stabilize after a while, and if you then just use those you get to optimality, assuming the system stabilizes.
But you cannot estimate the values of missing (unmeasured) states with just a state-space representation, even under conditions of observability. You need a state estimator for that, and the Kalman filter is such an estimator.
Also, while it’s true the covariance matrix is initialized with diagonals (we often don’t have good priors), the values are being dynamically re-estimated from measurements at every iteration. The initial parameters are just that — initial.
If certain assumptions hold, i.e. Gaussian noise and an approximately correct LTI model, the covariance matrix converges to a stable covariance matrix that reflects the current reality. Some of those assumptions are relaxed with EKFs and UKFs, but they're essentially built on the Kalman framework (the competing one being MHE).
The Kalman filter is a state estimator, not just a filter. And it is used in industrial applications with advanced control systems (MPC).
> Also, while it’s true the covariance matrix is initialized with diagonals (we often don’t have good priors), the values are being dynamically re-estimated from measurements at every iteration. The initial parameters are just that — initial.
The P covariance matrix does not depend on online input though, right? So the initial values of the covariance matrices Q and R, together with the model H and F, decide where P will end up. I mean, there is no online parameter estimation (dependent on the input). Or am I getting it wrong?
I sure agree to that Kalman filter could be useful. But I don't like how they are presented.
Like the author of the blog writes:
"At times its ability to extract accurate information seems almost magical"
You’re correct. The Kalman gain in the linear case can be computed offline since it’s only a function of model parameters. The P matrix is updated at every iteration, but it is also only a function of Q and R (which is determined offline) and the previous P, which at time 0 is initialized with diagonals. P does represent the covariance of (xpred - xact) but it doesn’t take in inputs at every iteration. I appreciate that call out.
Like most things, in practical implementations there are a bunch of extra things you have to do that go beyond the basic theory, like re-estimating the initial P every few iterations as well as re-estimating Q and R with online data.
It's not magic, and yes, it is a form of recursive linear regression, but it does work in real life.
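To make the "computed offline" point concrete, here is a small sketch (toy matrices of my own) showing that for a linear time-invariant model the covariance/gain recursion never touches a measurement value:

    import numpy as np

    def offline_gains(F, H, Q, R, P0, steps):
        # iterate the covariance recursion with no data involved at all;
        # P and the gain K depend only on F, H, Q, R and the initial P
        P, gains = P0, []
        for _ in range(steps):
            P = F @ P @ F.T + Q                     # predict covariance
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)          # gain for this step
            P = (np.eye(P.shape[0]) - K @ H) @ P    # update covariance
            gains.append(K)
        return gains

    F = np.array([[1.0, 0.1], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    Q, R = np.eye(2) * 1e-3, np.array([[0.5]])
    print(offline_gains(F, H, Q, R, np.eye(2), 200)[-1])   # K settles to a steady-state value

With an EKF (or when Q and R are re-estimated online, as mentioned above), this no longer holds, since the linearization depends on the state.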
Yes the P covariance matrix does depend on online input, anyone who has ever used them would know so perhaps you have no idea what you are talking about.
If, instead of making such posts, you would like to learn more:
The P matrix is changed both through the model and through the Kalman gain, and in both of these steps it depends on online input (the model can depend on the state, and the Kalman gain depends on sensor measurements).
No, this is a common misunderstanding. If you have two noisy measurements of the same quantity and you combine them, you take a weighted average of the two measurements to get a mean, but the variance of your estimate depends only on the variances of the measurements.
You would intuitively expect that if your two measurements are very far apart, you should be less confident in the estimate. A Kalman filter does not do that.
What you can calculate is the likelihood of your data given the estimate, and that is going to be small if your two measurements are far.
The way you make it sound is that a KF would trust a measurement more or less depending on its value. That is simply not true. It does not, e.g., reject outliers.
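A tiny worked example of that point (numbers made up): fusing two independent Gaussian measurements gives a fused variance that depends only on the two input variances, no matter how far apart the means are:

    def fuse(m1, v1, m2, v2):
        # fuse two independent Gaussian measurements (mean, variance)
        k = v1 / (v1 + v2)                  # weight, same role as the Kalman gain
        return m1 + k * (m2 - m1), v1 * v2 / (v1 + v2)

    print(fuse(10.0, 1.0, 10.2, 1.0))   # measurements agree:    (10.1, 0.5)
    print(fuse(10.0, 1.0, 50.0, 1.0))   # measurements disagree: (30.0, 0.5), same variance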
I didn't mention anything about your "common misunderstanding" which I agree with and if you would want to do outlier detection or anything like that you would need to add it separately to your codebase (which is not that difficult to do)
My point was, and still is, that thinking about Kalman filters just as a measurement-filtering system misses the point. An important value of measurements (especially in the more interesting formulations of the KF such as the UKF) is updating the covariance matrices over time, which affects the prior distributions of the state over time and thus affects your state estimation.
But it still sounds like you're saying the Kalman filter updates the state covariance matrix based on the measurement, which I'm taking to mean the nominal value of a measurement.
Okay, but it's not very useful, is it? Sure, you can propagate the state vectors directly, but even a tiny amount of noise may be amplified and can make the results useless. Also, read up on why the Euler method is bad for solving ODEs numerically.
Low pass filters can have significant phase delay, whereas Kalman filters can be made to have essentially no lag (phase delay). Additionally, Kalman filters can stabilize the estimate of a state based off of other correlated measurements much better than simply low pass filtering the measurement.
For example, in an IMU, the accelerometers may be noisy due to vibration from the aircraft or vehicle, and the magnetometer measurements which are not affected by vibration, can be used to stabilize the inclination estimates.
Kalman filters are extremely useful and enable applications not possible with just low pass filtering.
Also, you don't need to get the covariance matrices "exactly right"; in practice they are tuned on actual measurements and can be used to speed up or slow down the state estimation or the weighting of different measurements.
I was referring to Q and R by "diagonal covariance matrix". Forgot P was also a covariance matrix.
But yeah, control theory was over my head in university, so... I might just be more frustrated with how simple and nice the PI controllers and first-order IIR filters used in industry are, while we were butting our heads bloody against the wall in university.
Industry uses a lot of things. Including Kalman filters, they are actually pretty common.
It's true that a lot of industry problems can be solved by linear regression, simple filters, Gaussian distributions, etc. Which can make you wonder why you bothered about all that stuff in your degree - but it's also true that there are a lot of interesting problems in industry that just aren't amenable to the most basic approaches.
It's also true that for the most part, bashing your head on this stuff in university improved your ability to think about the problem spaces.
[1] https://cybernetist.com/2019/01/13/apollo-kalman-filter-and-...
[2] https://github.com/milosgajdos/go-estimate