A couple of years ago I blogged about how it was developed for the Apollo program [1], and then I wrote an implementation of different variations of it in Go [2].
He walks you through everything from the rationale, to a simple algebraic example, to the full-blown linear-algebra matrix solution. It's really worth watching even just the first couple of videos to get a deeper understanding of this very useful technique.
It's not 55 videos (dunno if he forgot to upload them or misnumbered them...). Also, most of it (I'd say 70%) is him doing matrix multiplications on a whiteboard, which is excellent if you need a refresher in linear algebra (or just suck at it ;)), but otherwise you can skip a lot of the content. That aside, he does an excellent job explaining it.
The best exercise I had in our sensing & estimation class was to derive the Kalman filter from 'scratch'. It's actually just a series of steps that minimize a convex cost function, where the cost function is the negative log-likelihood of the measurements given the state.
Someone once joked that all of sensing and estimation (and path planning and optimal control) is just applied convex minimization.
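Roughly, and in standard textbook notation rather than anything from the article: the update step can be seen as minimizing the convex quadratic cost

    J(x) = (x - \hat{x}^-)^\top (P^-)^{-1} (x - \hat{x}^-) + (z - Hx)^\top R^{-1} (z - Hx)

where \hat{x}^- and P^- are the predicted state and covariance, z is the measurement, H the measurement model, and R the measurement noise covariance. Setting the gradient to zero recovers the usual update \hat{x}^+ = \hat{x}^- + K(z - H\hat{x}^-) with K = P^- H^\top (H P^- H^\top + R)^{-1}.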
Where KFs finally clicked for me was in deriving the recursive form of good old-fashioned linear regression. You can do linear regression one data point at a time, or a batch at a time, and while I can't remember if it works out to exactly the same thing (because of the recursive state), it works out to essentially the same. Dynamic Bayesian networks are a superclass of KFs, and they're a great thing to have in your toolkit if you work with data.
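For the curious, a rough sketch of the one-data-point-at-a-time version in plain numpy (the model and all the names below are my own toy example, just to show the shape of the recursion):

    import numpy as np

    def rls_update(w, P, x, y, lam=1.0):
        # one recursive least-squares step for a new sample (x, y);
        # w = current coefficients, P = current uncertainty matrix,
        # lam = forgetting factor (1.0 gives ordinary least squares)
        Px = P @ x
        k = Px / (lam + x @ Px)              # gain, same role as the Kalman gain
        w = w + k * (y - x @ w)              # correct the estimate with the innovation
        P = (P - np.outer(k, Px)) / lam
        return w, P

    # toy usage: fit y = 2*x0 - 3*x1, one sample at a time
    rng = np.random.default_rng(0)
    w, P = np.zeros(2), np.eye(2) * 1e3      # large prior uncertainty
    for _ in range(500):
        x = rng.normal(size=2)
        y = 2 * x[0] - 3 * x[1] + 0.1 * rng.normal()
        w, P = rls_update(w, P, x, y)
    print(w)                                 # ends up close to [2, -3]

The Kalman filter adds a predict step on top of this, so the "coefficients" (the state) are allowed to drift between samples instead of being assumed constant.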
Seconded! I worked through this partway and appreciate some of the intuitions they build up.
It seems to me that there is no shortcut to grokking the KF other than working through simpler filters/Bayesian update algorithms first.
Another resource I personally enjoy is Student Dave's lectures on YouTube. He's really entertaining and shows both pen-and-paper examples and Matlab simulations.
Excellent, thank you! The article did a good job telling me what a Kalman filter looks like in math, but I really wanted one that told me what it looks like in code.
The most thorough explanation of the Kalman filter as recursive least squares coupled with linear state-space models is
"Linear Estimation" by Thomas Kailath, Ali H. Sayed, Babak Hassibi.
It covers robust numerical algorithms such as square-root Kalman filter.
Unfortunately, it is a big book, and reading it is going to be a big time investment, but I believe it will be a worthwhile one. It develops various linear algebra concepts in depth and demonstrates their profound application to linear least squares. This book is what people mean when they say learn from the masters.
It is also the only technical book I have read that aptly quotes Shakespeare:
"Age cannot whither her,
nor custom stale
her infinite variety."
I appreciate how this post gets resubmitted every couple of years. It's been a go-to bookmark for ages now and really does help give some visual intuition for what's happening.
The way I've always seen a Kalman filter is as a recursive least squares fitter, but instead of being run on the same input until convergence, it is run one step on every new sample of ever-changing inputs, in effect leading to a smoothing fitter.
For some reason I rarely see it being described as such.
That is an astute connection. The "recursive" part is that you do not need to keep all the history of observations and actions (often denoted y and u in control theory). You can summarize the past with sufficient statistics (the mean and the variance for linear quadratic Gaussian (LQG) problems).
You could imagine keeping all of the data and fitting a model to that. For LQG problems, you can use dynamic programming [1] to solve the problem faster.
You could alternatively imagine keeping a finite window of data and fitting a model to that. That filter would have a finite impulse response.
We also use it for fitting charged particle tracks in high energy physics experiments.
Here, we’re iteratively incorporating measurements along the trajectory, finally arriving at basically a least squares fit result, after a backward smoothing pass.
A moving average takes a bunch of samples to converge, while ASAP does it much faster, which seems to be a key property of these Kalman filters as well?
Although not a Kalman filter, the "three-cornered hat" thing sailors do, taking three distinct points of measurement against some reference and then using the triangle that forms as a bound on the uncertainty, is not dissimilar in intent and outcome.
Perhaps the main quality a Kalman filter brings is the repeated application. If you imagine 20 sailors all doing the three-cornered hat, another 20 doing a log line to assess speed, and a third group doing some kind of dead reckoning... a Kalman filter is what you apply over all of them to reduce it to your best, most certain position, speed, and heading?
I do wonder if false positives can drive things in reverse, toward higher levels of uncertainty rather than refinement. A spurious echo on radar, say, might contradict two other inputs.
The two overlapping coloured Gaussian clouds and the "aha" moment in the common region was good. It reinforces the idea of how two different measurements with their own uncertainty can combine to give a better certainty of the region of interest.
Were these not also used in analogue circuits to do feedback for missile tracking from radar/telemetry?
I struggle with algebra and formulae. The graphics help impart some understanding.
Unfortunately for me, the intuitive part of Kalman filters is easy, but I don't have nearly the kind of grasp of Linear algebra I would need to implement one (I think).
Are there drop-in, batteries-included, ready-to-go Kalman filter implementations/frameworks for common microcontrollers like the Arduino and the Raspberry Pi RP2040? Or is it infeasible to implement them in such limited setups?
The computational requirements are very modest. The magic is in the math. I'm not sure if it counts as "batteries included", but I wrote a Kalman filter implementation in "no-std" (no standard library) Rust called adskalman [1]. This means it can run on very modest bare-metal targets with no operating system. Of course, it can also run on targets where the standard library is available, and the examples [2] make use of this to do nice things like print the results, which can be piped to a file and plotted. The core runs fine on embedded targets, and we use this with ARM microcontrollers, but it should work on just about anything. Feedback is welcome.
I believe ROS (Robot Operating System) has good implementations of state estimation algorithms. If you are worried about memory footprint, then Durbin and Koopman ("Time Series Analysis by State Space Methods") has a scalar version of the square-root Kalman filter (it ingests one number at a time rather than a whole vector at a time). You may have to implement it yourself, though.
Totally feasible on a microcontroller for small models and modest update rates, yes. I have worked with robots that had them implemented on slow 8-bit microcontrollers. (I did not write the KF, though.)
> Unfortunately for me, the intuitive part of Kalman filters is easy, but I don't have nearly the kind of grasp of Linear algebra I would need to implement one (I think).
This is me, anyone got a good linear algebra resource to recommend? Ideally one that addresses 'I took linear algebra ages ago in uni but it didn't really stick'.
I really like "Linear Algebra and Its Applications" by Gilbert Strang. It is not a textbook, the style is conversational, and he really tries to help you learn. It is one of rare math books that includes reason, context, application, and history without sacrificing rigor. The book also focuses on numerical algorithms aspects more than some popular textbooks, which may be helpful to understanding Kalman filter.
I think you still need a good grasp of linear algebra and difference equations to identify the state variables and correctly set up the "model" or "plant" matrix; this is specific to the system, so it can't be provided by the framework. If you can do this, the rest of the Kalman filter is straightforward and can easily be done in numpy etc.
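To make "straightforward in numpy" concrete, here is roughly what the predict/update pair looks like; the constant-velocity model and the noise values below are a made-up toy, not anything from the article:

    import numpy as np

    def kf_predict(x, P, F, Q):
        # propagate the state estimate and its covariance through the model
        return F @ x, F @ P @ F.T + Q

    def kf_update(x, P, z, H, R):
        # correct the prediction with a measurement z
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P

    # toy 1D constant-velocity model: state = [position, velocity]
    dt = 0.1
    F = np.array([[1, dt], [0, 1]])         # the "plant" matrix you have to get right
    H = np.array([[1.0, 0.0]])              # we only measure position
    Q = np.eye(2) * 1e-3                    # process noise, hand-tuned here
    R = np.array([[0.5]])                   # measurement noise

    x, P = np.zeros(2), np.eye(2)
    for z in [0.1, 0.35, 0.55, 0.84, 1.02]: # fake noisy position readings
        x, P = kf_predict(x, P, F, Q)
        x, P = kf_update(x, P, np.array([z]), H, R)
    print(x)                                # filtered position and inferred velocity

As the parent says, the hard part is choosing the state variables and F, H, Q, R for your actual system, not these few lines.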
If you're willing to just believe that there's a derivation showing it works, actually implementing a Kalman filter is pretty straightforward. The proof is a pain, but you don't have to be able to derive it to use it.
Welford's online algorithm for computing weighted mean is essentially just Kalman filtering: Weights are the inverse of measurement covariance, i.e. the information, the initial state is the first measurement, state transition and observation model are 1x1 identity matrices.
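A toy sketch of that equivalence (my own example, not Welford's formulation): with identity dynamics, no process noise, and a constant measurement variance, the scalar Kalman gain works out to 1/n, so the estimate is exactly the running mean:

    def scalar_kf(measurements, r=1.0):
        # scalar Kalman filter for a constant state; r is the measurement variance
        x, p = measurements[0], r          # initialize from the first measurement
        for z in measurements[1:]:
            k = p / (p + r)                # Kalman gain; equals 1/n here
            x = x + k * (z - x)            # same form as mean += (z - mean) / n
            p = (1 - k) * p
        return x

    data = [2.0, 4.0, 6.0, 8.0]
    print(scalar_kf(data), sum(data) / len(data))   # both print 5.0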
I've been considering writing up something for extended kalman filters like this. Kalman filters are great but in my experience extremely limited without the non-linear extensions which make them useful in a much wider variety of problems.
Awesome article. Maths education needs way more visualisation in it like this.
I first found out about Kalman Filters when looking up how large ships use software to remain stationary in the same geographical location even whilst surrounded by crazy waves.
A related and much more powerful predictor in a noisy environment is the Particle Filter, which uses a mechanism that is particularly appealing to me as a programmer…
Author here. The particle filter has its own strengths and drawbacks. It makes more sense to use a particle filter in situations where the state search space is highly nonlocal and/or nonlinear, for example locating a drone on a map by matching radar features to topography.
If the process is linear and the estimation error is Gaussian (or approximately so in practice), the Kalman filter is known to be the optimal algorithm, and the particle filter would not only perform worse, but be more expensive to implement.
Theoretically this is precisely true, but in practice the difference is negligible for many models, given the incredible compute power of modern hardware. A particle filter with a million points is trivial to run on even a mobile phone GPU at milliwatt levels of power draw.
Even "simple" cases like blending GPS data with other sensors could potentially benefit from a particle filter model. For example, an advanced particle filter could model the GPS signal echoing off buildings, which might result in multiple local maxima in the location probability space.
Most (all?) current systems simply throw away the other peaks in the GPS signal and feed in only the one with the "highest likelihood".
Similarly, wheel tick counters typically use one wheel as an input. A particle filter could take all four wheels and be able to model disagreements in a robust way.
Particle filters are also better where the probabilities are not Gaussian. E.g.: measurements that are strictly positive often don't have Gaussian PDFs.
> but in practice the difference is negligible [...] A particle filter with a million points is trivial
This will be O(millions * state_size) flops per frame. A Kalman filter of the same state size will have the expense of a matrix inverse, which is O(state_size^3). So for a state size of, say, 12 floats, the Kalman filter will be about O(2000)-ish flops. A particle filter with "millions of points" modeling the same system will be O(state_size * millions * flops_per_state_update), which could be literally billions of flops: a six-orders-of-magnitude difference.
While there are absolutely applications where the particle filter is a more appropriate choice, it's just false to say that the performance difference is negligible. The difference is quite large.
You could even update the filter 1000x per second to track high-frequency motion and still only use 10% of the GPU power!
Obviously, some models need more ops per particle, more or fewer particles, or different update frequencies. However, consider that current-gen phones like the iPhone 13 are already at multiple TFLOPS, and this number will just keep going up over time.
There might be some really interesting algorithms unlocked once we're at the point where we can run particle filters "per pixel" on a video feed.
You're completely free to decide you're unbothered by it in your own particular application, but it would be unhelpfully misleading to other readers to say that a 10^6 performance difference is "negligible", or that they should expect the particle filter to give better results.
Particle filter is very cool, and it's great that someone mentioned it because it's quite useful in the right circumstances. But it is not a panacea (nor is the Kalman filter), and we don't need to misrepresent it in order to advocate for it in cases where it's applicable.
I didn't see you admit that a 6 order of magnitude difference isn't "negligible", as you originally claimed. Instead you're making the argument that we can throw hardware at the problem, which is fine, but a very different statement.
It's really hard to read past this. I've seen this tone occasionally from junior academic lecturers who somehow think that talking to people like they are nervous children will help put them at ease. I don't think the author's doing this malociouly, but it comes across as "I'm so smart, but don't be afraid, I'm dumbing it down to a cutesy level you can understand".
Avoid writing like this if you can; it's clear from the comments that the article is otherwise very good.
"Please don't pick the most provocative thing in an article and rush to the thread to complain about it. Find something interesting to comment about instead."
it's just the pop-sci writing style. You see this with lots of articles and content directed at a general audience - I think this author's just reused the same style to talk to a technical audience. It's not inherently condescending, it just comes across that way if you're writing for an audience of technical peers.
this kind of breathless style makes more sense when you don't expect the audience to be technically literate and the content is there to make them say "wow, I don't understand anything you're saying but there certainly are technical words in here!" - see, for instance, popular media around anything related to quantum mechanics.
Of course I disagree. Your comment, which advises people not to write like the article, starts with a phrase that's a waste of time to read. Then it misquotes the article; I was, naturally, expecting to see that quote in the article. It repeats a phrase, apparently for effect. "Malociouly"? You seem to see signs of humanity in the writer only as condescension and vanity. Your "Avoid writing like this" applies infinitely better to your own comment, I think.
You seem to assume all writing should be dry academic writing, without vivid language or friendliness to the reader.
I agree. From personal experience, this sort of phrasing comes during a temporary high that you get after mastering a topic. Conquering a piece of technical material elevates you, and in a sense you want to be condescending towards your ignorant past self.
Thanks for your contextualization. The above comment really did hurt my desire to learn more about Kalman filters. I know just being negative or contrary to a thing has an unreasonably high return on investment in making a commenter look smart or authoritative, but it sure does harm.
To add onto this, it’s used in multiple places in GPS and other positioning problems (as in I was on the CoreLocation team at Apple years ago and Kalman filters were common). I’m not really sure where the commenter is sourcing their claim but my experience directly contradicts it.
My take on Kalman filters is that they are, with a diagonal regression matrix and precomputed parameters, just a convoluted notation for guesswork. It is a bit like drawing Nyquist diagrams for system stability analysis: mostly an academic exercise. And stuff like that plagues control theory. I would rather that students learned to keep it simple.
Kalman filters are literally everywhere in the industry. If there is a radar or data fusion involved, you can be pretty sure there are Kalman filters. I know a researcher whose most quoted article is just him applying fancy new methods to actual industrial datasets and showing they perform worse than a Kalman filter.
What you wrote is akin to someone explaining to students doing signal processing that they should stay away from Fourier transforms.
You could say that for all of estimation, by definition. But some estimates are better than others, and the KF is the best estimate under certain conditions...
...and one of those conditions is that you have a good estimate of the dynamics and measurement noise parameters. Rather than throw our hands up, we should just articulate this, and proceed to discuss methods for getting a good estimate of noise parameters, and discuss what happens if our estimates are wrong.
The actual power of the Kalman filter lies in its ability to estimate states/variables that can't be measured directly, not just in filtering time-series data.
With a Kalman filter, you not only get your output filtered, you also get the state of the system at each step. No amount of low-pass filtering can achieve that.
Going down the rabbit hole indeed... Kalman filters are Linear Quadratic Estimators (LQE) in control theory. Very rich field with very powerful techniques.
> No amount of low pass filtering can achieve that.
Can't a simple IIR low-pass filter do that for certain systems once they reach a steady state? I don't remember exactly when the KF becomes an IIR filter, but intuitively, for certain systems, I'd think the constants would stabilize after a while, and if you then just use those you get to optimality, assuming the system stabilizes.
But you cannot estimate the values of missing (unmeasured) states with just a state-space representation, even under conditions of observability. You need a state estimator for that, and the Kalman filter is such an estimator.
Also, while it’s true the covariance matrix is initialized with diagonals (we often don’t have good priors), the values are being dynamically re-estimated from measurements at every iteration. The initial parameters are just that — initial.
If certain assumptions hold, i.e. Gaussian noise and an approximately correct LTI model, the covariance matrix converges to a stable covariance matrix that reflects the current reality. Some of those assumptions are relaxed with EKFs and UKFs, but they're essentially built on the Kalman framework (the competing one being MHE).
The Kalman filter is a state estimator, not just a filter. And it is used in industrial applications with advanced control systems (MPC).
> Also, while it’s true the covariance matrix is initialized with diagonals (we often don’t have good priors), the values are being dynamically re-estimated from measurements at every iteration. The initial parameters are just that — initial.
The P covariance matrix does not depend on online input though, right? So the initial values of the covariance matrices Q and R, together with the model H and F, decide where P will end up. I mean, there is no online parameter estimation (dependent on the input). Or am I getting it wrong?
I sure agree to that Kalman filter could be useful. But I don't like how they are presented.
Like the author of the blog writes:
"At times its ability to extract accurate information seems almost magical"
You’re correct. The Kalman gain in the linear case can be computed offline since it’s only a function of model parameters. The P matrix is updated at every iteration, but it is also only a function of Q and R (which is determined offline) and the previous P, which at time 0 is initialized with diagonals. P does represent the covariance of (xpred - xact) but it doesn’t take in inputs at every iteration. I appreciate that call out.
Like most things, in practical implementations there are a bunch of extra things you have to do that go beyond the basic theory, like re-estimating the initial P every few iterations as well as re-estimating Q and R with online data.
It's not magic, and yes, it is a form of recursive linear regression, but it does work in real life.
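To make the "computed offline" point concrete, here is a small sketch (toy matrices of my own) showing that for a linear time-invariant model the covariance/gain recursion never touches a measurement value:

    import numpy as np

    def offline_gains(F, H, Q, R, P0, steps):
        # iterate the covariance recursion with no data involved at all;
        # P and the gain K depend only on F, H, Q, R and the initial P
        P, gains = P0, []
        for _ in range(steps):
            P = F @ P @ F.T + Q                     # predict covariance
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)          # gain for this step
            P = (np.eye(P.shape[0]) - K @ H) @ P    # update covariance
            gains.append(K)
        return gains

    F = np.array([[1.0, 0.1], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    Q, R = np.eye(2) * 1e-3, np.array([[0.5]])
    print(offline_gains(F, H, Q, R, np.eye(2), 200)[-1])   # K settles to a steady-state value

With an EKF (or when Q and R are re-estimated online, as mentioned above), this no longer holds, since the linearization depends on the state.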
Yes the P covariance matrix does depend on online input, anyone who has ever used them would know so perhaps you have no idea what you are talking about.
If, instead of making such posts, you would like to learn more:
The P matrix is changed both through the model and through the Kalman gain, and in both of these steps it depends on online input (the model can depend on the state, and the Kalman gain depends on sensor measurements).
No, this is a common misunderstanding. If you have two noisy measurements of the same quantity and you combine them, you take a weighted average of the two measurements to get a mean, but the variance of your estimate depends only on the variances of the measurements.
You would intuitively expect that if your two measurements are very far apart, you should be less confident in the estimate. A Kalman filter does not do that.
What you can calculate is the likelihood of your data given the estimate, and that is going to be small if your two measurements are far.
The way you make it sound is that a KF would trust a measurement more or less depending on its value. That is simply not true. It does not, e.g., reject outliers.
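A tiny worked example of that point (numbers made up): fusing two independent Gaussian measurements gives a fused variance that depends only on the two input variances, no matter how far apart the means are:

    def fuse(m1, v1, m2, v2):
        # fuse two independent Gaussian measurements (mean, variance)
        k = v1 / (v1 + v2)                  # weight, same role as the Kalman gain
        return m1 + k * (m2 - m1), v1 * v2 / (v1 + v2)

    print(fuse(10.0, 1.0, 10.2, 1.0))   # measurements agree:    (10.1, 0.5)
    print(fuse(10.0, 1.0, 50.0, 1.0))   # measurements disagree: (30.0, 0.5), same variance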
I didn't mention anything about your "common misunderstanding" which I agree with and if you would want to do outlier detection or anything like that you would need to add it separately to your codebase (which is not that difficult to do)
My point was, and still is, that thinking about Kalman filters just as a measurement-filtering system misses the point. An important value of measurements (especially in the more interesting formulations of the KF such as the UKF) is updating the covariance matrices over time, which affects the prior distributions of the state over time and thus affects your state estimation.
But it still sounds like you're saying the Kalman filter updates the state covariance matrix based on the measurement, which I'm taking to mean the nominal value of a measurement.
Okay, but it's not very useful, is it? Sure, you can propagate the state vectors directly, but even a tiny amount of noise may be amplified and can make the results useless. Also, read up on why the Euler method is bad for solving ODEs numerically.
Low pass filters can have significant phase delay, whereas Kalman filters can be made to have essentially no lag (phase delay). Additionally, Kalman filters can stabilize the estimate of a state based off of other correlated measurements much better than simply low pass filtering the measurement.
For example, in an IMU, the accelerometers may be noisy due to vibration from the aircraft or vehicle, and the magnetometer measurements which are not affected by vibration, can be used to stabilize the inclination estimates.
Kalman filters are extremely useful and enable applications not possible with just low pass filtering.
Also, you don't need to get the covariance matrices "exactly right"; in practice they are tuned on actual measurements and can be used to speed up or slow down the state estimation or the weighting of different measurements.
I was referring to Q and R by "diagonal covariance matrix". Forgot P was also a covariance matrix.
But yeah, control theory was over my head in university, so... I might just be more frustrated with how simple and nice the PI controllers and first-order IIR filters used in industry are, while we were butting our heads bloody against the wall in university.
Industry uses a lot of things. Including Kalman filters, they are actually pretty common.
It's true that a lot of industry problems can be solved by linear regression, simple filters, Gaussian distributions, etc. Which can make you wonder why you bothered about all that stuff in your degree - but it's also true that there are a lot of interesting problems in industry that just aren't amenable to the most basic approaches.
It's also true that for the most part, bashing your head on this stuff in university improved your ability to think about the problem spaces.
[1] https://cybernetist.com/2019/01/13/apollo-kalman-filter-and-...
[2] https://github.com/milosgajdos/go-estimate