Solved – Noise covariance matrix in Kalman filter

Tags: covariance, kalman-filter, variance

In developing a Kalman filter, I hit a roadblock when trying to estimate the noise covariance matrices of both the state process and the measurement process.

In this post, the author mentions "tuning" these covariances. The Wikipedia page also says: "In most real-time applications, the covariance matrices that are used in designing the Kalman filter are different from the actual (true) noise covariance matrices."

Consider the simplest case, where no control input is added to the state process:

State process $x_{t+1} = x_t + w_t$, where $w_t \sim \mathcal N (0, Q_t)$

Measurement process $z_t = H_t x_t + v_t$, where $v_t \sim \mathcal N (0, R_t)$
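For concreteness, this setup is easy to simulate; the dimensions, horizon, and the "true" $Q$, $R$, and $H$ values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200                  # number of time steps (assumed)
Q = np.array([[0.1]])    # true state-noise covariance (assumed)
R = np.array([[0.5]])    # true measurement-noise covariance (assumed)
H = np.array([[1.0]])    # measurement matrix, taken constant here

x = np.zeros((T, 1))
z = np.zeros((T, 1))
x[0] = rng.multivariate_normal(np.zeros(1), Q)
z[0] = H @ x[0] + rng.multivariate_normal(np.zeros(1), R)
for t in range(1, T):
    # x_{t+1} = x_t + w_t,  w_t ~ N(0, Q)
    x[t] = x[t - 1] + rng.multivariate_normal(np.zeros(1), Q)
    # z_t = H x_t + v_t,  v_t ~ N(0, R)
    z[t] = H @ x[t] + rng.multivariate_normal(np.zeros(1), R)
```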

I wonder what the best practice is for determining these noise covariance matrices $Q_t$ and $R_t$, especially when the direct method does not produce robust statistics.

Edit: Given some sample data, the 'direct method' I mentioned refers to fitting a linear regression of the measured value $z$ on the state variable $x$, so that $z = Hx + \epsilon$, then computing the sample covariance of $\epsilon$, and periodically re-running this regression to update the estimate.
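A rough sketch of that direct method, assuming the states $x$ are directly observable in the sample data (the function name and array shapes are my own):

```python
import numpy as np

def direct_R_estimate(x_samples, z_samples):
    """Estimate H by least squares (z ~ H x), then take the sample
    covariance of the residuals epsilon = z - H x as an estimate of R.

    x_samples: (n, k) array of state observations
    z_samples: (n, m) array of corresponding measurements
    """
    # Least-squares fit: solve x_samples @ B = z_samples, then H = B.T
    B, *_ = np.linalg.lstsq(x_samples, z_samples, rcond=None)
    H_hat = B.T                                  # shape (m, k)
    residuals = z_samples - x_samples @ H_hat.T  # epsilon, shape (n, m)
    R_hat = np.cov(residuals, rowvar=False)      # sample covariance of epsilon
    return H_hat, np.atleast_2d(R_hat)
```

Re-running this on a rolling window of recent data gives the "periodic update" described above; the weakness is that it needs the true states, and with few samples the residual covariance is noisy.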

The 'Filter' in Kalman Filter suggests its primary use is to give you recursive formulas for the state's time $t$ distribution conditional on all information up to that point in time: $p(x_t|y_{1:t})$. In this case, at each time you have a mean vector and covariance matrix. The formulas are recursive in the sense that each time's mean and covariance can be computed from the previous time's mean and covariance. But all of this, as you pointed out, assumes that your model parameters are known.
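One recursion of that filter, specialized to the model above (the transition matrix is the identity, since $x_{t+1} = x_t + w_t$), can be sketched as follows; the function name is my own:

```python
import numpy as np

def kalman_step(m_prev, P_prev, z_t, H, Q, R):
    """Advance p(x_{t-1} | z_{1:t-1}) = N(m_prev, P_prev) one step,
    returning the mean and covariance of p(x_t | z_{1:t})."""
    # Predict: p(x_t | z_{1:t-1}); transition matrix is I for this model
    m_pred = m_prev
    P_pred = P_prev + Q
    # Update with the new measurement z_t
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    m_new = m_pred + K @ (z_t - H @ m_pred)
    P_new = P_pred - K @ H @ P_pred
    return m_new, P_new
```

Iterating `kalman_step` over the measurements yields the whole sequence of conditional means and covariances, but only for given $Q$ and $R$.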
The Kalman filter gives you conditional likelihoods, too. These can be combined into an overall likelihood function. After you filter, you can use more matrix multiplication to compute the Gaussian conditional likelihood: $$p(y_{t+1}\mid y_{1:t}) = \int\!\!\int p(y_{t+1}\mid x_{t+1})\,p(x_{t+1}\mid x_t)\,p(x_t\mid y_{1:t})\,dx_t\,dx_{t+1}.$$
Why is this useful? Because estimating the unknown state and observational covariance matrices, in your application, boils down to standard Maximum Likelihood Estimation, assuming you're a frequentist. The log-likelihood to be maximized is $$\log \left[p(y_1)\prod_{t=1}^{T-1}p(y_{t+1}\mid y_{1:t})\right].$$
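A minimal sketch of that MLE approach for the scalar version of the model, parameterizing $Q$ and $R$ by their logs (to keep them positive) and minimizing the negative prediction-error log-likelihood with `scipy.optimize.minimize`; the function name, initial diffuse prior, and optimizer settings are my own choices:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(params, z, H=1.0):
    """Negative log-likelihood of the scalar model
    x_{t+1} = x_t + w_t, z_t = H x_t + v_t, accumulated from the
    one-step prediction densities p(z_{t+1} | z_{1:t})."""
    q, r = np.exp(params)        # Q, R > 0 via log parameterization
    m, P = 0.0, 1e6              # diffuse prior on the initial state (assumed)
    ll = 0.0
    for zt in z:
        P_pred = P + q                        # predict (transition is 1)
        S = H * P_pred * H + r                # innovation variance
        ll += norm.logpdf(zt, loc=H * m, scale=np.sqrt(S))
        K = P_pred * H / S                    # Kalman gain, then update
        m = m + K * (zt - H * m)
        P = P_pred - K * H * P_pred
    return -ll

# Usage: given a measurement array z, estimate Q and R jointly
# res = minimize(neg_log_likelihood, x0=np.log([1.0, 1.0]),
#                args=(z,), method="Nelder-Mead")
# Q_hat, R_hat = np.exp(res.x)
```

Note that this treats $Q$ and $R$ as constant over time; estimating fully time-varying $Q_t$ and $R_t$ is not identifiable without further structure.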