Maximum Likelihood Estimation of an Ornstein-Uhlenbeck process

multivariable-calculus, probability-theory, statistics, stochastic-processes

I am wondering whether analytical expressions for the maximum likelihood estimates of an Ornstein-Uhlenbeck process are available. The setup is the following: consider a one-dimensional Ornstein-Uhlenbeck process $(X_t)_{t\geq 0}$ with $X_0=x$ for some $x\in\mathbb{R}$, i.e. $(X_t)_{t\geq 0}$ solves the SDE
$$
\mathrm{d} X_t=\theta(\mu-X_t)\,\mathrm{d} t + \eta\,\mathrm{d} W_t,\quad X_0=x
$$
where $(W_t)_{t\geq 0}$ is a standard Wiener process and $\eta,\theta>0$, $\mu\in\mathbb{R}$. If $\lambda=(\eta,\theta,\mu)$ is the vector of parameters, then the transition densities are known: if $p_{\lambda}(t,x,\cdot)$ denotes the density of $X_t$ (recall that $X_0=x$) with respect to the Lebesgue measure, then
$$
p_{\lambda}(t,x,y)=(2\pi\beta)^{-1/2}\exp\left(-\frac{(y-\alpha)^2}{2\beta}\right),\quad y\in\mathbb{R},
$$
where $\alpha=\mu+(x-\mu)e^{-\theta t}$ and $\beta=\frac{\eta^2}{2\theta}(1-e^{-2\theta t})$.
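
For concreteness, this density is straightforward to evaluate numerically; a minimal sketch in Python (the function name and signature are my own):

```python
import numpy as np

def ou_transition_density(t, x, y, theta, mu, eta):
    """Density p_lambda(t, x, y) of X_t given X_0 = x for the OU process."""
    alpha = mu + (x - mu) * np.exp(-theta * t)                        # conditional mean
    beta = eta**2 / (2.0 * theta) * (1.0 - np.exp(-2.0 * theta * t))  # conditional variance
    return np.exp(-(y - alpha) ** 2 / (2.0 * beta)) / np.sqrt(2.0 * np.pi * beta)
```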

Suppose we have observed an Ornstein-Uhlenbeck process at equidistant time instances (where the parameter $\lambda$ is unknown), i.e. the vector of observations is given by
$$
\mathbf{x}=(x_0,x_{\Delta},\ldots,x_{N\Delta}),
$$
where $x_0=x$, $\Delta>0$ is the sampling interval, and $N+1$ is the number of observations. Then, by the Markov property of $(X_t)_{t\geq 0}$, the log-likelihood function is given by
$$
l(\lambda)=l(\theta,\eta,\mu;\mathbf{x})=\sum_{i=1}^N \log\left(p_{\lambda} (\Delta,x_{(i-1)\Delta},x_{i\Delta})\right).
$$
Now I am asking whether it is possible to maximize this expression with respect to $\lambda=(\eta,\theta,\mu)$ simultaneously, and if so, how one would go about doing this. If anyone can point me in the direction of a paper/book where this is shown, it would be much appreciated. Thanks in advance!
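
(Numerically, one can of course always hand $-l$ to a generic optimizer over all three parameters at once; a minimal sketch, assuming NumPy/SciPy, with starting values and variable names of my own choosing. What I would like to know is whether this can be avoided in favor of closed-form expressions.)

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, x_obs, dt):
    """Negative log-likelihood -l(theta, mu, eta; x) for equidistant observations."""
    theta, mu, eta = params
    a = np.exp(-theta * dt)
    alpha = mu + (x_obs[:-1] - mu) * a            # conditional means given the previous point
    beta = eta**2 / (2.0 * theta) * (1.0 - a**2)  # conditional variance (constant spacing)
    return 0.5 * np.sum(np.log(2.0 * np.pi * beta) + (x_obs[1:] - alpha) ** 2 / beta)

# Hypothetical usage; x_obs holds x_0, x_Delta, ..., x_{N*Delta}, dt is Delta:
# res = minimize(neg_log_likelihood, x0=[1.0, 0.0, 1.0], args=(x_obs, dt),
#                method="L-BFGS-B", bounds=[(1e-8, None), (None, None), (1e-8, None)])
```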

Best Answer

In the paper "Parameter estimation and bias correction for diffusion processes" by Tang and Chen, explicit formulas for the MLE are given. Their formulas ignore $X_0$, but this makes little difference if the number of observations is reasonably large. I am puzzled how they managed to come up with these formulas, though; solving $l'(\lambda)=0$ directly seems to be a difficult task.
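
For what it's worth, one reparametrization that does produce closed-form estimates (whether or not it is the route Tang and Chen took): sampled at spacing $\Delta$, the process is a Gaussian AR(1) chain, $x_{i\Delta}=c+a\,x_{(i-1)\Delta}+\varepsilon_i$ with $a=e^{-\theta\Delta}$, $c=\mu(1-a)$ and $\varepsilon_i\sim N(0,\beta)$ i.i.d., so maximizing $l$ in $(a,c,\beta)$ is an ordinary Gaussian linear regression with an explicit least-squares solution; back-substitution then yields $(\theta,\mu,\eta)$. A sketch of this (function name mine, again conditioning on $x_0$):

```python
import numpy as np

def ou_mle_closed_form(x_obs, dt):
    """Closed-form MLE of (theta, mu, eta) via the Gaussian AR(1) reduction."""
    x_prev, x_next = x_obs[:-1], x_obs[1:]
    # OLS regression of x_next on x_prev gives the MLE of (a, c, beta)
    a_hat = np.cov(x_prev, x_next, bias=True)[0, 1] / np.var(x_prev)
    c_hat = np.mean(x_next) - a_hat * np.mean(x_prev)
    resid = x_next - c_hat - a_hat * x_prev
    beta_hat = np.mean(resid**2)        # MLE of the conditional variance
    # Invert a = exp(-theta*dt), c = mu*(1-a), beta = eta^2*(1-a^2)/(2*theta);
    # this requires 0 < a_hat < 1, which holds for typical mean-reverting data.
    theta_hat = -np.log(a_hat) / dt
    mu_hat = c_hat / (1.0 - a_hat)
    eta_hat = np.sqrt(2.0 * theta_hat * beta_hat / (1.0 - a_hat**2))
    return theta_hat, mu_hat, eta_hat
```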