I recently updated the homepage of my Kalman Filter tutorial with a new example based on a simple radar tracking problem. The goal was to make the Kalman Filter understandable to anyone with basic knowledge of statistics and linear algebra, without requiring advanced mathematics.
The example starts with a radar measuring the distance to a moving object and gradually builds intuition around noisy measurements, prediction using a motion model, and how the Kalman Filter combines both. I also tried to keep the math minimal while still showing where the equations come from.
I would really appreciate feedback on clarity. Which parts are intuitive? Which parts are confusing? Is the math level appropriate?
If you have used Kalman Filters in practice, I would also be interested to hear whether this explanation aligns with your intuition.
I just glossed through for now so might have missed it, but it seemed you pulled the process noise matrix Q out of a hat. I guess it's explained properly in the book, but it would be nice to have some justification for why the entries are what they are.
To keep the example focused and reasonably short, I treated the Q matrix as given and concentrated on building intuition around prediction and update. But you're right that it can feel like it appears out of nowhere.
The derivation of the Q matrix is a separate topic and requires additional assumptions about the motion model and noise characteristics, which would have made the example significantly longer. I cover this topic in detail in the book.
I'll consider adding a brief explanation or reference to make that step clearer. Thanks for pointing this out.
Yeah, I understand. I do think a brief explanation would help a lot, though. As it sits, it's not even entirely clear whether the presented matrix is general or highly specific. I can easily see someone just using that as their Q matrix because that's what the Q matrix is; says so right there.
I think that this was a great intro to Kalman filtering.
The one important point that I think warrants a small paragraph near the end is that the example you gave is a way of doing forecasting (estimating the future state) and nowcasting (estimating the current state), but Kalman filters can also be used retrospectively to do retrocasting (using the present data to get a better estimate of the past).
Nowcasting and retrocasting are concepts that a lot of people have trouble with. That trouble is the crux of the Kalman filter ... combining (noisy) measurements with (noisy) dead reckoning gives us (better) knowledge. For complete symmetry, it is important to point out that we can't just use old measurements to describe the past any more than we should only use current and past measurements to define our estimate of the present.
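Retrocasting is what the estimation literature calls smoothing: run the filter forward, then sweep backward so every past estimate also benefits from later measurements. A scalar sketch of the forward pass plus a Rauch-Tung-Striebel backward pass, with made-up numbers (identity dynamics, arbitrary noise values):

```python
# Forward Kalman pass, then a backward (Rauch-Tung-Striebel) pass: each past
# estimate gets revised using measurements that arrived after it.
f, q, r = 1.0, 0.1, 1.0                # made-up dynamics and noise variances
zs = [1.2, 0.8, 1.1, 2.4, 2.6]         # made-up measurements

x, p = 0.0, 10.0
preds, posts = [], []                  # priors and posteriors per step
for z in zs:
    xp, pp = f * x, f * p * f + q      # predict
    k = pp / (pp + r)                  # update
    x, p = xp + k * (z - xp), (1 - k) * pp
    preds.append((xp, pp))
    posts.append((x, p))

smooth = [None] * len(zs)              # backward sweep
smooth[-1] = posts[-1]
for i in range(len(zs) - 2, -1, -1):
    (xf, pf), (xp1, pp1) = posts[i], preds[i + 1]
    c = pf * f / pp1                   # smoother gain
    smooth[i] = (xf + c * (smooth[i + 1][0] - xp1),
                 pf + c * c * (smooth[i + 1][1] - pp1))
# smoothed variances never exceed the filtered ones
```

The last comment is the symmetry point above in numerical form: the retrocast estimate of each past step is at least as certain as the estimate the filter had at the time.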
Firstly, I think the clarity in general is good. The one piece I think you could do with explaining early on is which pieces of what you are describing are the model of the system and which pieces are the Kalman filter. I was following along as you built the Markov model of the state matrix etc., and then you called those equations the Kalman filter, but I didn't think we had built a Kalman filter yet.
Your early explanation of the filter (as a method for estimating the state of a system under uncertainty) was great but (unless I missed it) when you introduced the equations I wasn't clear that was the filter. I hope that makes sense.
You’re pointing out a real conceptual issue: where the system model ends and where the Kalman filter begins.
In Kalman filter theory there are two different components:
- The system model
- The Kalman filter (the algorithm)
The state transition and measurement equations belong to the system model. They describe the physics of the system and can vary from one application to another.
The Kalman filter is the algorithm that uses this model to estimate the current state and predict the future state.
I'll consider making that distinction more explicit when introducing the equations. Thanks for pointing this out.
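That separation is easy to show in code: the model is a handful of problem-specific numbers, and the filter is a generic recursion that only consumes them. A scalar sketch with made-up values:

```python
# The system model: problem-specific quantities (these numbers are made up).
model = {
    "f": 1.0,    # state transition: x_k = f * x_{k-1}
    "h": 1.0,    # measurement: z = h * x + noise
    "q": 0.01,   # process-noise variance
    "r": 0.25,   # measurement-noise variance
}

# The Kalman filter: a generic algorithm that only consumes the model above.
def kalman_step(x, p, z, m):
    # predict with the model's dynamics
    x, p = m["f"] * x, m["f"] * p * m["f"] + m["q"]
    # update with the model's measurement equation
    s = m["h"] * p * m["h"] + m["r"]   # innovation variance
    k = p * m["h"] / s                 # Kalman gain
    return x + k * (z - m["h"] * x), (1 - k * m["h"]) * p

x, p = 0.0, 100.0
for z in [0.9, 1.1, 1.0]:
    x, p = kalman_step(x, p, z, model)
```

Swapping in a different application means editing `model` only; `kalman_step` stays the same.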
The tutorial actually predates ChatGPT by quite a few years (first published in 2017). Today, I do sometimes use ChatGPT to fix grammar, but I am responsible for the content and it is always mine.
You lead with "Moreover, it is an optimal algorithm that minimizes state estimation uncertainty." By the end of the tutorial I understood what this meant, but "optimal algorithm" is a vague term I am unfamiliar with (despite using Kalman Filters in my work). It might help to expand on the term briefly before diving into the math, since IIUC it's the key characteristic of the method.
That's a good point. "Optimal" in this context means that, under the standard assumptions (linear system, Gaussian noise, correct model), the Kalman Filter minimizes the estimation error covariance. In other words, it provides the minimum-variance estimate among all linear unbiased estimators.
You're right that the term can feel vague without that context. I’ll consider adding a short clarification earlier in the introduction to make this clearer before diving into the math. Thanks for the suggestion.
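The minimum-variance claim can be sanity-checked numerically for the scalar case (toy numbers): write the posterior variance as a function of an arbitrary gain and confirm that the Kalman gain minimizes it.

```python
# For a scalar update x_new = x + g * (z - x), the posterior variance is
#   var(g) = (1 - g)**2 * p + g**2 * r
# The Kalman gain k = p / (p + r) is exactly the g that minimizes var(g).
p, r = 4.0, 1.0                        # made-up prior and measurement variances
k = p / (p + r)

def post_var(g):
    return (1 - g) ** 2 * p + g ** 2 * r

best = min((i / 100 for i in range(101)), key=post_var)
# best coincides with k, and post_var(k) equals p * r / (p + r)
```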
I recently (~6 mo ago) made it a goal to understand and implement a useful Kalman filter, but I realized that they are very tightly coupled to their domain and application. I got about half as far as I wanted, and took a pause. I expect your work here will get me to the finish line, so I am psyched! Thank you!
I read and enjoyed your book a few months ago when a friend recommended it to me. I've been interested in control theory for a few years, but I'm still definitely a beginner when it comes to designing good control systems and have never done it professionally.
I've been in the process of writing a tutorial on how PID filters work for a much younger audience. As a result, I've been looking back at the original tutorials that made stuff click for me. I had several engineers try to explain PID control to me over the course of about a year, but I don't think I really got it until I ended up watching Terry Davis (yeah, the TempleOS guy) show off how to use PID control in SimStructure using a hovering rocket as an example.
The way he built the concept up was to take each component and build on the control system until he had something that worked. He started off with a simple proportional controller that ended up having a steady state error with the rocket hovering beneath the target height. Once he had that and pointed out the steady state error, he implemented the integral term and showed off how it resulted in overshoot. Once that was working, he implemented the derivative control to back the overshoot off until he had something that settled pretty quickly.
I'm not sure how you could do something similar for a Kalman Filter, but I did find it genuinely constructive to see the thought process behind adding each component of the equation.
1. understand weighted least squares and how you can update an initial estimate (prior mean and variance) with a new measurement and its uncertainty (i.e. inverse variance weighted least squares)
2. this works because the true mean hasn't changed between measurements. What if it did?
3. KF uses a model of how the mean changes to predict what it should be now based on the past, including an inflation factor on the uncertainty since predictions aren't perfect
4. after the prediction, it becomes the same problem as (1) except you use the predicted values as the initial estimate
There are some details about the measurement matrix (when your measurement is a linear combination of the true value -- the state) and the Kalman gain, but these all come from the least squares formulation.
Least squares is the key and you can prove it's optimal under certain assumptions (e.g. Bayesian MMSE).
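The four steps above can be sketched as a scalar filter (all numbers made up; the point is only the shape of the recursion):

```python
# The least-squares view of the Kalman filter, in scalar form.

def predict(x, p, velocity, dt, q):
    # steps 2-3: move the mean with a motion model, inflate the variance by q
    return x + velocity * dt, p + q

def update(x, p, z, r):
    # step 1 (and 4): inverse-variance-weighted fusion of prior and measurement
    k = p / (p + r)                    # Kalman gain
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 100.0                      # vague initial estimate
for z in [1.1, 2.0, 2.9, 4.2]:         # noisy positions, ~1 unit/step motion
    x, p = predict(x, p, velocity=1.0, dt=1.0, q=0.01)
    x, p = update(x, p, z, r=0.5)      # step 4: fused result is the next prior
```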
Kalman filters are very cool, but when applying them you've got to know that they're not magic. I struggled to apply Kalman Filters for a toy project about ten years ago, because the thing I didn't internalize was that Kalman filters excel at offsetting low-quality data by sampling at a higher rate. You can "retroactively" apply a Kalman filter to a dataset and see some improvement, but you'll only get amazing results if you sample your very-noisy data at a much higher rate than if you were sampling at a "good enough" rate. The higher your sample rate, the better your results will be. In that way, a Kalman filter is something you want to design around, not a "fix all" for data you already have.
After spending a few weeks trying to understand the Kalman filter, I figured out that I needed to understand all of the following:
1. Model of system
2. Internal state
3. How optimal estimation is defined
4. Covariance (statistics)
The Kalman filter is the optimal estimate of a system's internal state and covariance based on the measurements so far.
The Kalman process/filter is the mathematical solution to this problem as the system evolves based on inputs and observable measurements. It turns out that an internal state that includes both the estimated value and the covariance is all that is needed to fully capture the state of such a model.
It is important to understand that choosing a different model of what is optimal, of the uncertainty, or of the system, compared to what Rudolf Kalman presented, just gives a different mathematical solution to this problem. Examples of different optimal solutions for different estimation models are the nonlinear Kalman filters and the Wiener filter.
---
I think the book on this topic by Alex Becker is great and possibly the best introduction to it. It has lots of examples and builds the required intuition really well. All I was missing was a little more emphasis on mathematical rigor and a chapter on the LQG regulator, but you can find both of these in the original paper by Rudolf Kalman.
When learning the Kalman filter, it clicks into place much faster when there are two or more inputs with different noise profiles. That's why it exists, and that was its original use case.
Yet virtually all tutorials stick to single-input examples, which is really an edge case. This site is no exception.
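A two-input version is only a few lines longer than the usual single-input one: fuse one reading from each sensor per step, and the gain automatically weights each by its own measurement variance. A scalar sketch with made-up noise levels:

```python
import random

# Two sensors watching the same drifting value, with very different noise
# profiles (all numbers made up). The Kalman gain weights each reading by
# its own measurement variance r, so the good sensor dominates.
random.seed(1)
r_good, r_bad = 0.1, 4.0
q = 0.02                               # random-walk process noise

def update(x, p, z, r):
    k = p / (p + r)
    return x + k * (z - x), (1 - k) * p

true, x, p = 0.0, 0.0, 10.0
for _ in range(200):
    true += random.gauss(0.0, q ** 0.5)        # the state drifts
    x, p = x, p + q                            # predict (identity dynamics)
    for r in (r_good, r_bad):                  # fuse one reading per sensor
        z = true + random.gauss(0.0, r ** 0.5)
        x, p = update(x, p, z, r)
# p settles below even the good sensor's variance r_good
```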
Kalman filters are great! For people interested in one used in practice: Sendspin uses one to keep speakers in sync, and it even works in browsers on phones over 5G.
I never really understood Kalman filters, but there was a time I knew how to design non-optimal state (Luenberger?) observers, which are a lot easier to design and implement. I wonder if discussing those first would make things easier for the audience.
I recently built a drone from the ground up: learned how to build PCBs with the ESP32, wrote all of the flight firmware, etc., and built a controller iOS app.
Extended Kalman Filters are even more interesting because they let you do sensor fusion and such
I have worked with Kalman Filters for years, and gave this quick read. I saw the comments on Process Noise, so I focus there for now. I might get back to other sections tomorrow.
My simple head space (as I was taught and re-learned thru experience, and have passed on)
1. Kalman Gain close to 1 or 0 is a warning sign that careful consideration is needed.
This fact can be brought up immediately in example #5 and continued
2a. K close to 1.0 can be bad because..., however for some applications (dynamic models) it can be acceptable since...
2b. K close to 0.0 can be bad because... however for some applications (dynamic models) it can be acceptable since...
3. To solve the problem from step 2, as a first step for those applications where K close to zero or one is bad... a fudge-factor term (called Q for reasons discussed later) can be added to the Kalman Gain computation.
3a. Choosing the correct fudge factor for the application is often very difficult and may require lots of simulation runs (a parameter study) with different measurement sequences (including some expected off-nominals) and various values for the process noise.
Remember we are designing a filter, likely for a new application (or a non-trivial extension of an existing one)... so all the elements of an engineering design are needed: make solution hypotheses, test them, refine them, test them some more with greater realism and eventually real-world data, and continue to refine the solution.
4. For the easy case of a simple application and only a few unknown states, the process noise can be guesstimated from experience. For more complex applications (perhaps there are dozens of unknown states to estimate), a more rigorous approach to selecting the correct mathematical description of the Process Noise is needed.
-- End of Fudge Factor discussion --
{I think you covered this section well} Then you can introduce the notion that the state dynamics cannot model everything and that the unmodeled part can be approximated by Process Noise. For example, an unmodeled constant acceleration gives a process noise with dt^4 terms.
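To make that example concrete (this is the standard discrete white-noise-acceleration model, not anything specific to the tutorial): for a [position, velocity] state, an unmodeled acceleration a enters position as 0.5*a*dt^2 and velocity as a*dt, and the outer product of those two terms is where the dt powers in Q come from.

```python
# Q for a [position, velocity] state when the unmodeled disturbance is a
# white-noise acceleration with variance sigma_a2.
def q_constant_velocity(dt, sigma_a2):
    g = [0.5 * dt * dt, dt]            # how an acceleration enters each state
    return [[gi * gj * sigma_a2 for gj in g] for gi in g]

Q = q_constant_velocity(dt=1.0, sigma_a2=0.5)
# Q[0][0] scales as dt**4, off-diagonals as dt**3, Q[1][1] as dt**2
```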
Here are some sentences I think are wrong or misleading
"As you can see, the Kalman Gain gradually decreases; therefore, the KF converges." However, the Kalman Filter may converge to garbage. This garbage could be a "lag", or just plain wrong.
"The process noise produces estimation errors."
A well-chosen process noise is important to reduce estimation errors over an ensemble of conditions, by accommodating a range of unmodeled state dynamics. A poorly chosen process noise may not improve anything.
> Don't post generated comments or AI-edited comments. HN is for conversation between humans.
Open the Sendspin live demo in your browser: https://www.sendspin-audio.com/#live-demo
Some more info on Kalman implementation here https://github.com/Sendspin/time-filter/blob/main/docs%2Fthe...
See for example: https://rlabbe.github.io/Kalman-and-Bayesian-Filters-in-Pyth...
Is there something in this particular resource that makes it worth buying?