Derivation of Kalman Gain for the Unscented Kalman Filter (UKF)


I recently went through the mathematical derivations of the Kalman filter (KF), the extended Kalman filter (EKF) and the Unscented Kalman filter (UKF). My question is concerned with some detail concerning the derivation of the UKF.

While there is a lot of literature available on the unscented transform (necessary for estimating the mean and the covariance matrix in both the prediction step and the update step), I have not found anything about the derivation of the empirical Kalman gain. This gain is given by $K_t = \Sigma^{x,z}_t \Sigma_t^{-1}$, where $x$ represents a state and $z$ represents an observation. The UKF uses a certain set of states, called sigma points, which are predicted (for the next time step) using a non-linear system model $g(x)$. Observations can be generated from this transformed sigma point set by applying a non-linear observation model $h(x)$. The term $\Sigma^{x,z}_t$ is an empirical cross-covariance matrix between the predicted sigma points and the observations generated from them. The term $\Sigma_t^{-1}$ is the inverse of the empirical covariance matrix of the generated observations.
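To make the quantities concrete, here is a minimal NumPy sketch (not from any specific UKF paper) of how the empirical cross-covariance and gain might be assembled from an already-computed sigma-point set; the function name, the weight handling, and the additive measurement-noise covariance `R` are my own assumptions for illustration:

```python
import numpy as np

def empirical_kalman_gain(X_pred, w, h, R):
    """Sketch of K_t = Sigma^{x,z} Sigma^{-1} from sigma points.

    X_pred: (N, n) predicted sigma points, one per row
    w:      (N,)   weights summing to 1
    h:      observation model, maps an n-vector to an m-vector
    R:      (m, m) additive measurement noise covariance (assumed)
    """
    Z = np.array([h(x) for x in X_pred])      # (N, m) predicted observations
    x_bar = w @ X_pred                        # weighted state mean
    z_bar = w @ Z                             # weighted observation mean
    dX = X_pred - x_bar
    dZ = Z - z_bar
    Sxz = dX.T @ (w[:, None] * dZ)            # empirical Sigma^{x,z}
    S = dZ.T @ (w[:, None] * dZ) + R          # empirical Sigma (innovation cov.)
    return Sxz @ np.linalg.inv(S)             # K = Sigma^{x,z} Sigma^{-1}
```

With a *linear* observation model $h(x) = Hx$ this reduces exactly to the familiar $P H^T (HPH^T + R)^{-1}$, which is the equivalence the answer below establishes.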

I have no idea how to derive the formula for the empirical Kalman gain $K_t$ described above. Is it possible to show that this Kalman gain is an empirical version of the Kalman gain used in the EKF or the KF? How does this work? Is there a paper or tutorial available that derives the empirical Kalman gain? Is there perhaps at least an intuitive explanation of why the cross-covariance matrix carries significant information with respect to the Kalman gain?

Best Answer

Following the symbols on Wikipedia, the Kalman gain is $K_k = P_{k|k-1}H^T_k S_k^{-1}$.

$S_k^{-1}$ is equivalent to what you have called $\Sigma_t^{-1}$ in your question, so it suffices to show the equivalence of $P_{k|k-1}H_k^T$ and $\Sigma^{x,z}_t$.

The cross covariance between the predicted state $\hat{x}_{k|k-1}$ and the predicted measurement $H_k \hat{x}_{k|k-1} + v_k$ (which you have called $\Sigma^{x,z}_t$) is defined as $E[(\hat{x}_{k|k-1} - E[\hat{x}_{k|k-1}])(H_k\hat{x}_{k|k-1} + v_k - E[H_k\hat{x}_{k|k-1}+v_k])^T]$.

$$E[(\hat{x}_{k|k-1} - E[\hat{x}_{k|k-1}])(H_k\hat{x}_{k|k-1} + v_k - E[H_k\hat{x}_{k|k-1}+v_k])^T]$$ Now use linearity of expectation to factor out $H_k$. $$= E[(\hat{x}_{k|k-1} - E[\hat{x}_{k|k-1}])(H_k(\hat{x}_{k|k-1}- E[\hat{x}_{k|k-1}])+v_k - E[v_k])^T]. $$

Define $\Delta \hat{x}_{k|k-1} \equiv \hat{x}_{k|k-1} - E[\hat{x}_{k|k-1}]$ and $\Delta v_k \equiv v_k - E[v_k]$. $$= E[\Delta \hat{x}_{k|k-1}(H_k\Delta \hat{x}_{k|k-1}+\Delta v_k)^T]. $$ Distribute the transpose. $$= E[\Delta \hat{x}_{k|k-1}(\Delta \hat{x}_{k|k-1}^T H_k^T+\Delta v_k^T)]. $$

Distribute. $$= E[\Delta \hat{x}_{k|k-1}\Delta \hat{x}_{k|k-1}^T] H_k^T + E[\Delta \hat{x}_{k|k-1} \Delta v_k^T]. $$

Now use the independence of the noise $v_k$ from the state to factor the expectation of the product. $$= E[\Delta \hat{x}_{k|k-1}\Delta \hat{x}_{k|k-1}^T] H_k^T + E[\Delta \hat{x}_{k|k-1}]E[ \Delta v_k^T] $$ $$= E[\Delta \hat{x}_{k|k-1}\Delta \hat{x}_{k|k-1}^T] H_k^T + 0 $$ $$= P_{k|k-1} H_k^T $$

So in the end, the equivalence follows from the standard Kalman filter assumptions. This simplification is not available in the UKF case because there is no explicit matrix $H_k$, which is why the UKF computes the cross covariance empirically from the sigma points instead.
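The identity derived above is easy to check numerically. The following Monte-Carlo sketch (all matrices invented for illustration) samples $x \sim \mathcal{N}(\bar{x}, P)$ and $z = Hx + v$ with $v$ independent of $x$, and confirms that the sample cross-covariance approaches $P H^T$:

```python
import numpy as np

# Check: for x ~ N(0, P) and z = H x + v with v independent of x,
# Cov(x, z) = P H^T. P, H, and the noise scale are made-up values.
rng = np.random.default_rng(0)
P = np.array([[2.0, 0.3], [0.3, 1.0]])
H = np.array([[1.0, -1.0]])
n = 200_000
x = rng.multivariate_normal([0.0, 0.0], P, size=n)   # predicted-state samples
v = rng.normal(0.0, 0.7, size=(n, 1))                # independent measurement noise
z = x @ H.T + v
# Sample cross-covariance of x and z
Sxz = (x - x.mean(0)).T @ (z - z.mean(0)) / n
# Sxz should be close to P @ H.T = [[1.7], [-0.7]] up to sampling error
```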

You can, however, get a completely mechanical intuition about the information content of the cross covariance. It answers the following question: "If I wiggle state variable $i$ inside $\Delta \hat{x}_{k|k-1}$ upwards, how likely is it that observation variable $j$ of $(H_k \hat{x}_{k|k-1} + v_k)$ will also go up?" This is what entry $(i,j)$ of the cross covariance means, and it makes sense that this information is relevant to the operation of the UKF: it tells the filter how to translate a surprise in observation $j$ back into a correction of state $i$.
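That "wiggle" reading can be seen in a toy example (entirely made-up numbers): below, the observation is driven positively by state component 0 and negatively by component 1, and the signs of the cross-covariance entries reflect exactly that.

```python
import numpy as np

# Toy demo of the sign pattern in the cross covariance: z goes up when
# x_0 goes up and down when x_1 goes up, so entry (0,0) of the cross
# covariance is positive and entry (1,0) is negative.
rng = np.random.default_rng(1)
x = rng.standard_normal((100_000, 2))                  # two state components
z = (x[:, 0] - x[:, 1])[:, None] \
    + 0.1 * rng.standard_normal((100_000, 1))          # one noisy observation
Sxz = x.T @ z / len(x)                                 # means are ~0 here
# Sxz[0, 0] is close to +1, Sxz[1, 0] close to -1
```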
