Solved – MLE of a multivariate Hawkes process

likelihoodmaximum likelihoodstochastic-processes

I'm struggling with implementing the maximum likelihood estimator for a multivariate Hawkes process (HP). Specifically, while the analytical expression for a log-likelihood function of a univariate HP can be found easily online (e.g. Ozaki, 1979), there seem to be different (inconsistent or equivalent?) versions of the log-likelihood function of a multivariate HP out there. I also tried to derive the estimator myself below and I get yet another result (I'm very new to this subject though). Could somebody clear this up for me? Thanks!

This is my own go at a derivation (I follow the notation used in Laub et al., 2015). Consider a collection of $m$ counting processes $N=(N_{1},..,N_{m})$ with $t_{i,j}$ the observed arrival times for each counting process ($i=1,..,m$ and $j$ a natural number). Define a multivariate HP with exponentially decaying exictation functions such that the intensities are $\lambda^{*}_{i}(t)=\lambda_{i}+\sum\limits_{j=1}^{m}\sum\limits_{t_{j,k}<t}\alpha_{i,j}e^{-\beta_{i,j}(t-t_{j,k})}$. For this m-variate HP the log-likelihood $\ln L(t)$ is equal to the sum of the individual log-likelihoods, i.e.: $\ln L(t)=\sum\limits_{j=1}^{m} \ln L^{j}(t)$, with each individual component $\ln L^{j}(t)=-\int\limits_{0}^{T} \lambda^{*}_{j}(u)\mathrm{d}u+\int\limits_{0}^{T} \ln\lambda^{*}_{j}(u)\mathrm{d}N_{j}(u) $.

Let us first focus on the first part, which we call the compensator $\varLambda$.

enter image description here

Combining this with the results for the other parts of the log-likelihood should result in: $\ln L^{1}(t_i)= -\lambda_{1}T –
\frac{\alpha_{1,1}}{\beta_{1,1}}\sum\limits_{f=1}^{F}[e^{-\beta_{1,1} (t_{1,F} – t_{1,f})}-1] –
\frac{\alpha_{1,2}}{\beta_{1,2}}\sum\limits_{g=1}^{G}[e^{-\beta_{1,2} (t_{2,G} – t_{2,g})}-1] +
\sum\limits_{f=1}^{F} \ln [\lambda_{1}+\sum\limits_{j=1}^{2} \alpha_{1,j}R_{1,j}(f)]$

with $R_{1,j}(f)= \sum\limits_{t_{j,k}<t_{1,f}}e^{-\beta_{1,j}(t_{1,f}-t_{j,k})}$. A similar expression can be derived for $\ln L^{2}(t_i)$.

However, when I compare this result with other articles, I notice some differences. For example, in Toke (slide 56) the expression for the compensator is very different (sums over every element for every event-type) and, also, there is no $\lambda_{i}T$ term. Next, in Crowley (2013) (pg. 29) the expression for the compensator is much more elaborate. Further, the equation on 2.8 (page 9) in Zheng (2013) offers again an alternative (sums over a subset of the elements for every event-type) (note: there is a Matlab implementation at the end of the document). The article that resembles mostly to what I find is page 6 in Carlsson et al. (2007). As you can see I'm clearly confused. What is the correct likelihood function that I should program?

References:

  • Ozaki, 1979, Maximum likelihood estimation of Hawkes' self-exciting point processes

  • Crowley, 2013, Point Process Models for Multivariate High-Frequency Irregularly Spaced Data

  • Laub, Taimre & Pollett, 2015, Hawkes Processes

  • Zheng, 2013, High frequency dynamics of order flow

  • Carlsson, Foo, Lee & Shek, 2007, High Frequency Trade Prediction with Bivariate Hawkes Process

Best Answer

There is a small mistake in the derivation. In line 5 (in the inserted figure) one needs $T = t_{1,F} = t_{2,G}$ for the identity to be correct, and this is generally not the case. The terms in the final sums should be $e^{-\beta_{i,1}(T - t_{1,f})} - 1$ and $e^{-\beta_{i,2}(T - t_{2,g})} - 1$, respectively. Otherwise the derivation looks correct.

A slightly simpler derivation can take line 3 as a starting point. Then interchange the sums and integration with the resulting inner integral being from $t_{j,k}$ to $T$.

It might be worth noting that for the Hawkes process considered here, it is possible to compute $\lambda_i^*(t_{i,j})$ recursively, which implies that the computational complexity of the log-likelihood can be made linear in the number of jumps (instead of quadratic as the double sum over the jumps suggests).

I doubt that there are inconsistent versions of the likelihood in the literature, but there may, of course, be mistakes in some of the references. Another (likely) possibility is that the notation or the assumptions differ, or that the representations are, indeed, equivalent, but written in different ways. One possibility is that the baseline intensity $\lambda_i$ is omitted, so that the $\lambda_i T$ term disappears.