In the development of Kalman filter, I hit roadblocks when trying to estimate the noise covariance matrix of both the state process and the measurement process.

In this post, the author mentioned "tuning" these covariance. Also, on the wikipedia page, it says *In most real-time applications, the covariance matrices that are used in designing the Kalman filter are different from the actual (true) noise covariances matrices*

In the simplest case where no control is added to the state process

State process $x_{t+1} = x_t + w_t$, where $w_t \sim \mathcal N (0, Q_t)$

Measurement process $z_t = H_t x_t + v_t$, where $v_t \sim \mathcal N (0, R_t)$

I wonder what is the best practice in determining these noise covariance matrices $Q_t$ and $R_t$, especially when the direct method doesn't generate robust statistics.

**Edit**: Given some sample data, the 'direct method' that I mentioned was referring to using linear regression to fit measured value $z$ and state variable $x$ so that $z=Hx+\epsilon$, and then calculate the sample covariance of $\epsilon$, and periodically update this regression.

## Best Answer

The 'Filter' in Kalman Filter suggests its primary use is to give you recursive formulas for the state's time $t$ distribution conditional on all information up to that point in time: $p(x_t|y_{1:t})$. In this case, at each time you have a mean vector and covariance matrix. The formulas are recursive in the sense that each time's mean and covariance can be figured from the previous time's mean and covariance. But all of this, as you pointed out, is

assuming that your model parameters are known.The Kalman filter gives you conditional likelihoods, though, too. These can be combined into an overall likelihood function. After you filter, you can use more matrix multiplication to figure the Gaussian conditional likelihood: $$ p(y_{t+1}|y_{1:t}) = \int p(y_{t+1}|x_{t+1})p(x_{t+1}|x_t)p(x_t|y_{1:t}). $$

Why is this useful? Because estimating the unknown state and observational covariance matrices, in your application, boils down to standard Maximum Likelihood Estimation, assuming you're a frequentist. The log-likelihood to be maximized is $$ \log \left[\prod_{t=1}p(y_{t+1}|y_{1:t})p(y_1)\right]. $$

Last time I implemented this I wrote a function that filters through the data and calculates the above quantity given a certain parameter set (your unknown covariance matrices), then I pass this function into an optimization function and made it do Newton-Raphson which iteratively revises the parameters. I remember having trouble if there were too many parameters, or if I had a really bad starting point. The likelihood is generally non-convex. And under certain regularity conditions, your estimates are asymptotically normal, minimum variance and unbiased.

You can also use the EM algorithm. You can probably find some code for this online as well. But the main point of this answer is that I am not sure what you mean by 'direct method'-- but you shouldn't worry. This is plain hill-climbing stuff.