Kalman filter: Understanding the derivation of the Covariance Matrix update

control theory, kalman filter, optimal control

I am looking at some tutorials on deriving the Kalman Filter. The ideas make sense, except for one thing I am unclear about: the update to the covariance matrix. I was hoping someone could validate my intuition about the first term in the covariance matrix update.

Let me set up the system. I will use the same notation as the tutorial, even though it is not the choice of symbols that I would pick. We are given an initial state $X_0, P_0$, which represent the initial state vector and covariance matrix. The matrix $A$ is the state transition matrix and $B$ is the control-input matrix. $w_t$ is a zero-mean process noise term whose covariance is $Q_t$. The update or prediction step expresses the new $X_t, P_t$ as a combination of the previous state of the system and the control input.

$$
X_t = AX_{t-1} + Bu_t + w_t \\
P_t = AP_{t-1}A^T + Q_t
$$
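For concreteness, here is how I picture the prediction step in code (a minimal sketch; the 2-state constant-velocity model and all numerical values below are just made-up illustrative choices, not from the tutorial):

```python
import numpy as np

# Hypothetical 2-state example (position, velocity) with a scalar control input.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])          # state transition matrix A
B = np.array([[0.5 * dt**2],
              [dt]])                 # control-input matrix B
Q = 0.01 * np.eye(2)                 # process-noise covariance Q_t

X = np.array([[0.0],
              [1.0]])                # initial state X_0
P = np.eye(2)                        # initial covariance P_0
u = np.array([[2.0]])                # control input u_t

# Prediction step from the question:
# X_t = A X_{t-1} + B u_t   (w_t is zero-mean, so it drops out of the mean)
# P_t = A P_{t-1} A^T + Q_t
X = A @ X + B @ u
P = A @ P @ A.T + Q

print(X)
print(P)
```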

What is a little confusing is the term $AP_{t-1}A^T$, which looks like what Gil Strang would call a stiffness matrix. My real question is why the update to the covariance matrix is formulated as this kind of stiffness matrix. I always think of $A$ as describing a first-order system of differential equations, so it is not clear to me why it generates the covariance matrix.

My intuition, and please correct me if I am wrong, is that if I just had $A^TA$, then that would give me the covariance matrix for the system states. I had never really thought of a covariance matrix like this before; I come from statistics, where we usually think of covariance as correlations between predictors rather than between system states. So having $P_{t-1}$ in the middle must in some way adjust $A^TA$ by the previous covariance of the system, but I am not quite clear on how.

Of course, the other strange thing is that in the stiffness matrix structure $A^TCA$, the inner matrix $C$ is usually diagonal, right?

Best Answer

You can work it out from the definition, i.e.

$$\begin{align} P_t &= E[(x_t-\hat{x}_t)(x_t-\hat{x}_t)^T] \\ &= E[(A x_{t-1} + B u_t + w_t - A \hat{x}_{t-1} - B u_t)(A x_{t-1} + B u_t + w_t - A \hat{x}_{t-1} - B u_t)^T] \\ &= E[(A (x_{t-1} - \hat{x}_{t-1}) + w_t)(A (x_{t-1} - \hat{x}_{t-1}) + w_t)^T] \\ &= A E[(x_{t-1} - \hat{x}_{t-1}) (x_{t-1} - \hat{x}_{t-1})^T] A^T + E[w_t w_t^T] \\ &= A P_{t-1} A^T + Q_t \end{align}$$

This of course uses the assumption that the estimation error $x_{t-1} - \hat{x}_{t-1}$ and the noise $w_t$ are independent and that $w_t$ has zero mean, so the cross terms $A\,E[(x_{t-1} - \hat{x}_{t-1})w_t^T]$ and its transpose vanish.
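If you want to convince yourself numerically, here is a quick Monte Carlo sanity check of that identity (a minimal sketch; the matrices $A$, $P_{t-1}$, and $Q_t$ below are arbitrary made-up values): sample estimation errors with covariance $P_{t-1}$ and noise with covariance $Q_t$, propagate them through the state equation, and compare the empirical covariance of the result with $AP_{t-1}A^T + Q_t$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative matrices (not from the original post).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
P_prev = np.array([[2.0, 0.3],
                   [0.3, 1.0]])      # covariance of the previous estimation error
Q = np.array([[0.05, 0.0],
              [0.0, 0.02]])          # process-noise covariance

n = 200_000
# Sample errors e_{t-1} ~ N(0, P_{t-1}) and noise w_t ~ N(0, Q_t).
e_prev = rng.multivariate_normal(np.zeros(2), P_prev, size=n)
w = rng.multivariate_normal(np.zeros(2), Q, size=n)

# Propagate: e_t = A e_{t-1} + w_t  (the B u_t terms cancel in the error).
e_next = e_prev @ A.T + w

empirical = np.cov(e_next, rowvar=False)
predicted = A @ P_prev @ A.T + Q
print(np.round(empirical, 3))   # should closely match the line below
print(np.round(predicted, 3))
```

Note that $P_{t-1}$ in the middle is not an adjustment to $A^TA$; it is the previous error covariance being pushed through the linear map $A$, which is exactly what the congruence form $AP_{t-1}A^T$ expresses.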