[Math] How to estimate variances for Kalman filter from real sensor measurements without underestimating process noise.

bayesian-network · error-propagation · estimation-theory · kalman-filter · statistics

As the title says, I want to estimate the variances needed for a Kalman filter from real sensor measurements only. As an example we can take a temperature sensor, but the solution should be as general as possible. Assume we cannot accurately control the sensor input (i.e. we cannot generate a precisely fixed temperature), or do not want to.

I need to estimate two variances for the Kalman filter:

  • The measurement noise variance, $\sigma_m^2$, which is the inaccuracy introduced by the sensor itself (e.g. fluctuations in the power supply or limited measurement resolution). It models the difference between the sensor's measured value and the real temperature being measured.
  • The process noise variance, $\sigma_p^2$, which captures the error in our system model. It depends on the model, i.e. on how exactly the model predicts future values from the current state of the Kalman filter. For the temperature sensor, we could model the temperature as static and predict the next temperature to be the same as our latest estimate. However, as the real temperature slowly changes over time, the filter's prediction is incorrect. Basically, the process noise means: if we have an estimate $e_t\sim N(\mu_e,\sigma_e^2)$ of the real value at time $t$ and we apply our prediction model $f(x)$, which introduces Gaussian process noise with variance $\sigma_p^2$, the new estimate will be $e_{t+1}\sim N(f(\mu_e), \sigma_e^2+\sigma_p^2)$.
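For context, this is where the two variances enter a scalar Kalman filter; a minimal sketch in Python, with the static model $f(x)=x$ and all numbers made up for illustration (not a definitive implementation):

```python
# Minimal scalar Kalman filter with a static model f(x) = x.
# var_m plays the role of sigma_m^2, var_p the role of sigma_p^2.
# All numeric values below are made up for illustration.

def kalman_step(mu, var, z, var_m, var_p):
    # Predict: the static model keeps the mean; process noise widens the
    # variance, matching e_{t+1} ~ N(f(mu_e), sigma_e^2 + sigma_p^2).
    mu_pred = mu
    var_pred = var + var_p

    # Update: blend prediction and measurement, weighted by their variances.
    gain = var_pred / (var_pred + var_m)
    mu_new = mu_pred + gain * (z - mu_pred)
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new

mu, var = 20.0, 1.0                     # initial temperature estimate and variance
for z in [20.3, 20.1, 20.4, 20.2]:      # made-up measurements
    mu, var = kalman_step(mu, var, z, var_m=0.09, var_p=0.01)
print(mu, var)                          # mean near the data, variance well below 1.0
```

If `var_p` is chosen too small, `var` shrinks too far and the filter becomes overconfident, which is exactly the failure mode this question is about.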

How you can help…

I have already done some work on this and arrived at a practical solution. However, there is still an issue that can lead to underestimation of the process noise, while it would be advisable to feed the Kalman filter only overestimated values: in that case the variance in the filter state still gives reliable information about the accuracy of the current estimate.

  1. If you have an idea how to resolve the underestimation problem (see explanations below), please let me know.
  2. I didn't study math. So please check my math below for any stupid mistakes 😉

Here is what I came up with so far…

Estimation of the measurement noise

  1. I model my sensor reading as $m_t = x_t + e_t$ with $e_t \sim N(0,\sigma_m^2)$, where $m_t$ is the measured value, $x_t$ is the real value in the absence of noise, $e_t$ is the measurement noise, and $t$ refers to time. The noise is assumed to be Gaussian with zero mean (no bias).
  2. I assume that my signal is continuous, such that $\lim_{\delta\to0}(x_{t+\delta} - x_t) = 0$. This means that two real values that are very close in time are equal.
  3. Therefore,
    $\lim_{\delta\to0} (m_{t+\delta}-m_t)$
    $= \lim_{\delta\to0} ((x_{t+\delta} + e_{t+\delta}) - (x_t + e_t))$
    $= \lim_{\delta\to0} ((x_{t} + e_{t+\delta}) - (x_t + e_t))$
    $= \lim_{\delta\to0} (e_{t+\delta} - e_t)$
    $= e_{t+\delta} - e_t$.
  4. Thus, $\lim_{\delta\to0} Var(m_{t+\delta}-m_t)$
    $= Var(e_{t+\delta}-e_t)$
    $= Var(e_{t+\delta}) + Var(e_t)$,
    assuming the noise terms $e_{t+\delta}$ and $e_t$ are independent.
  5. We know that $e_t\sim N(0,\sigma_m^2)$ and $e_{t+\delta}\sim N(0,\sigma_m^2)$ therefore $Var(e_{t+\delta}) + Var(e_t) = \sigma_m^2 + \sigma_m^2 = 2\sigma_m^2$.
  6. Resulting in $\hat{\sigma_m^2} = \frac{1}{2}\lim_{\delta\to0} Var(m_{t+\delta}-m_t)$.

Hence, assuming we can take measurements very quickly (i.e. much faster than the real signal changes by an amount comparable to the measurement noise), we can estimate the measurement noise by $\hat{\sigma_m^2} = \frac{1}{2}\lim_{\delta\to0} Var(m_{t+\delta}-m_t)$. Note that since $\delta$ will never be zero in a real scenario, the estimator tends to overestimate the measurement noise, which will be important in the next step.
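This estimator can be sketched as follows; a toy simulation (drift rate, noise level, and seed are all made up) in which the true $\sigma_m^2$ is known, so the estimate from step 6 can be checked:

```python
import random

random.seed(0)

# Toy data: a slowly drifting "true" temperature sampled very fast,
# plus Gaussian measurement noise with known variance (values made up).
true_var_m = 0.01                                   # true sigma_m^2
n = 10000
x = [20.0 + 1e-4 * t for t in range(n)]             # drift ~ 0 between samples
m = [xt + random.gauss(0.0, true_var_m ** 0.5) for xt in x]

# Step 6: sigma_m^2 ~= 0.5 * Var(m_{t+delta} - m_t) over consecutive samples.
diffs = [m[t + 1] - m[t] for t in range(n - 1)]
mean_d = sum(diffs) / len(diffs)
var_d = sum((d - mean_d) ** 2 for d in diffs) / (len(diffs) - 1)
var_m_hat = 0.5 * var_d
print(var_m_hat)                                    # close to true_var_m
```

In practice `var_d` also picks up whatever the real signal does between the two samples, which is why the estimator tends toward overestimation rather than underestimation.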

Estimation of the process noise

While the estimation of the measurement noise seems straightforward, I have some trouble with the process noise. As already mentioned, it depends on the prediction model.

The approach I came up with starts from the formula above that describes what the process noise is:

$e_{t+1}\sim N( f(\mu_e), \sigma_e^2+\sigma_p^2)$

We just take a single measurement $m_t$ as the estimate for time step $t$, i.e. $\mu_e = m_t$ and $\sigma_e^2 = \sigma_m^2$, where $\sigma_m^2$ is the measurement noise estimated as described above. Then the prediction is:

$e_{t+1}\sim N( f(m_t), \sigma_m^2+\sigma_p^2)$

We could then take a measurement $m_{t+1}$ at time $t+1$. Note that this new measurement carries its own noise term $e_{m,t+1}\sim N(0,\sigma_m^2)$ on top of the prediction error, so it should fulfill these equations:

$m_{t+1} = f(m_t) + e_{t+1} + e_{m,t+1} \iff$
$e_{t+1} + e_{m,t+1} = m_{t+1} - f(m_t) \Rightarrow$
$Var(m_{t+1} - f(m_t)) = \sigma_m^2 + \sigma_p^2 + \sigma_m^2 \iff$
$\hat{\sigma_p^2} = Var(m_{t+1} - f(m_t)) - 2\sigma_m^2$

But if we use the overestimating estimator $\hat{\sigma_m^2}$ from the last step in place of $\sigma_m^2$, we will have an underestimating estimator for $\sigma_p^2$:

$\hat{\sigma_p^2} = Var(m_{t+1} - f(m_t)) - 2\hat{\sigma_m^2}$
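This estimator can be checked numerically: the fresh measurement at $t+1$ carries its own noise, so for the static model $f(x)=x$ the consecutive difference $m_{t+1}-m_t$ has variance $\sigma_p^2 + 2\sigma_m^2$. A toy sketch (made-up random-walk data, with $\hat{\sigma_m^2}$ assumed already estimated):

```python
import random

random.seed(1)

# Toy random-walk "truth" plus measurement noise; variances known for checking.
true_var_p = 0.04                                   # process noise variance per step
true_var_m = 0.01                                   # measurement noise variance
n = 20000

x, m = 20.0, []
for _ in range(n):
    x += random.gauss(0.0, true_var_p ** 0.5)       # f(x) = x plus process noise
    m.append(x + random.gauss(0.0, true_var_m ** 0.5))

def sample_var(vs):
    mu = sum(vs) / len(vs)
    return sum((v - mu) ** 2 for v in vs) / (len(vs) - 1)

# m_{t+1} - m_t = (process step) + e_{t+1} - e_t, so its variance is
# var_p + 2 * var_m; subtract twice the measurement noise estimate.
var_m_hat = true_var_m                              # assume estimated beforehand
var_p_hat = sample_var([m[t + 1] - m[t] for t in range(n - 1)]) - 2 * var_m_hat
print(var_p_hat)                                    # close to true_var_p
```

Plugging in an overestimated `var_m_hat` here immediately drives `var_p_hat` below the true value, which is exactly the underestimation problem in question.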

The Problem: underestimated process noise

Meaning, whenever the estimate of the measurement noise $\hat{\sigma_m^2}$ is too high, the estimate $\hat{\sigma_p^2}$ of the process noise will be too low.
In other words: while the filter's estimated mean might be very accurate, the filter's estimate of its own accuracy will be too optimistic. So if the result were "I know the temperature is 23.122… °C with a variance of 0.03232 K²", you could not rely on it, as the stated variance would probably be too low (the true uncertainty is higher).

Is there any way to easily resolve this issue? How?

PS: Let me know if you noticed any incorrect formulas and/or assumptions.

Best Answer

I would suggest looking at "On the Identification of Variances and Adaptive Kalman Filtering" by R. K. Mehra (IEEE Transactions on Automatic Control, 1970) as a starting point. This paper is highly cited and the methods are not difficult to implement.