Solved – Maximum Likelihood estimation and the Kalman filter

kalman-filter, likelihood, maximum-likelihood, state-space-model, time-series

I know the Kalman filter recursions and can derive them, but what I don't really get is how to estimate the hyperparameters by maximum likelihood.

I understand that running the Kalman filter yields the prediction errors and their variances, which can be used to construct the likelihood function.
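For concreteness, this is the prediction error decomposition: writing $v_t$ for the innovation (one-step prediction error) and $F_t$ for its variance, both produced by the filter recursions, the Gaussian log-likelihood of an $n$-dimensional observation series of length $T$ is

$$\ell(\theta) = -\frac{nT}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{T}\left(\log|F_t| + v_t' F_t^{-1} v_t\right),$$

where every $v_t$ and $F_t$ depends on the hyperparameters $\theta$ through the filter, so evaluating $\ell(\theta)$ requires one full pass of the Kalman filter.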

What I don't really get is the order in which these steps are done. Am I supposed to:

Method 1

1) Run the Kalman filter given arbitrary starting values and obtain the likelihood function.

2) Maximize the likelihood function with respect to the hyperparameters of the model.

OR

Method 2

1) Estimate the hyperparameters of the state space model using maximum likelihood.

2) Run the Kalman filter with the hyperparameters set at these estimates.

I found this question, which answers what I need: LogLikelihood Parameter Estimation for Linear Gaussian Kalman Filter. There the hyperparameters are estimated from the likelihood function, which matches the algorithm at the top of p. 8 of these lecture notes. However, the same notes state (p. 10, middle): "Given a set of optimal parameter values, $\theta_{ML}$, it is now worth to explore the paths of unobserved components…".

Is the correct way to conduct the analysis the following?

Method 3

1) Run the Kalman filter given arbitrary starting values and obtain the likelihood function.

2) Maximize the likelihood function with respect to the hyperparameters of the model.

3) Run the Kalman filter again using the ML estimates obtained in step 2), and use the resulting state estimates in the subsequent analysis.

Best Answer

I could be wrong, but what makes sense to me is this:

  1. Define a function that runs the Kalman filtering and prediction recursions and outputs the log-likelihood, built from the innovations $v_t$ and their covariance matrices $F_t$. The log-likelihood in this case is the one described in the Stack Exchange post you refer to. Make sure $Q$, $R$, $\mu_0$ and $A$ are free parameters of that function.
  2. Optimize the function with respect to those parameters by maximizing the log-likelihood.

Essentially yes: the optimization routine starts from some initial parameter values and iterates from there toward parameters that fit the observations. I don't see how you could estimate these parameters first and then run the Kalman filter, since the likelihood you maximize is itself computed by running the filter.
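The two steps above can be sketched as follows. This is a minimal illustration, not the notes' exact algorithm: it assumes a univariate local level model $y_t = \mu_t + \varepsilon_t$, $\mu_t = \mu_{t-1} + \eta_t$, so the only hyperparameters are the two variances (the function name and all variable names are my own choices):

```python
import numpy as np
from scipy.optimize import minimize

def kalman_neg_loglik(log_params, y):
    """Run the Kalman filter for a local level model and return the
    negative log-likelihood via the prediction error decomposition."""
    sigma2_eps, sigma2_eta = np.exp(log_params)  # exp() keeps variances positive
    a, P = y[0], 1e6  # roughly diffuse initialisation of the state
    loglik = 0.0
    for t in range(1, len(y)):
        P = P + sigma2_eta          # prediction step
        v = y[t] - a                # innovation
        F = P + sigma2_eps          # innovation variance
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(F) + v**2 / F)
        K = P / F                   # Kalman gain
        a = a + K * v               # update step
        P = P * (1 - K)
    return -loglik

# Simulate data with known variances, then recover them by ML.
rng = np.random.default_rng(0)
T = 500
mu = np.cumsum(rng.normal(0.0, 1.0, T))   # true sigma2_eta = 1
y = mu + rng.normal(0.0, 2.0, T)          # true sigma2_eps = 4
res = minimize(kalman_neg_loglik, x0=np.log([1.0, 1.0]), args=(y,))
sigma2_eps_hat, sigma2_eta_hat = np.exp(res.x)
```

Once the optimizer has found $\theta_{ML}$, you run the filter (and, typically, a smoother) one more time at `np.exp(res.x)` to obtain the state paths used in the subsequent analysis, which is exactly Method 3 from the question.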

Source: https://faculty.washington.edu/eeholmes/Files/Intro_to_kalman.pdf
