Solved – Are there any alternatives to simulation for determining the distribution of number of events from two dependent non-homogeneous Poisson processes

monte-carlo, poisson-process

A "state of the art" model for the distribution of goals scored in a soccer match is that of Dixon
and Robinson (1998) "A Birth Process Model for Association Football Matches"
which accounts for two key
phenomenon:

1) More goals are scored at the end of matches than at the start (hypothesised to be due to the fatigue suffered by both teams)

2) Scoring rates depend upon the current score line for a myriad of reasons, such as teams with a lead becoming complacent, or teams preferring to settle for a draw rather than risk a loss by going for the win

The model assumes that the goals scored by the home and away teams in a match follow non-homogeneous Poisson processes. Let $t$ denote the time elapsed in a match, normalised to fall between $0$ and $1$, let the $x$-length vector $\vec{t_H}$ denote the times at which the home team scored goals, and let the $y$-length vector $\vec{t_A}$ denote the times at which the away team scored goals. The likelihood for the match is then

$$
L(\vec{t_H},\vec{t_A}) = \exp\left(-\int_0^1 \lambda(t)\,dt\right)\frac{\prod_{i=1}^{x} \lambda(t_{H,i})}{x!}\,\exp\left(-\int_0^1 \mu(t)\,dt\right)\frac{\prod_{j=1}^{y} \mu(t_{A,j})}{y!}
$$

where $\lambda(t)$ is the scoring rate for the home team at time $t$ dependent on a combination of time homogeneous factors (e.g. home team attacking ability versus away team defending ability, home advantage) and time inhomogeneous factors (e.g. score line at time $t$). Similarly for $\mu(t)$.

The two processes are dependent because when a team scores the score line changes and the scoring rates are themselves score line dependent.

The likelihood can easily be evaluated by carrying out the integrations in the exponents numerically. Hence it is straightforward to estimate the parameters of the model (team abilities, home advantage, time effect, score line parameters etc.) via maximum likelihood.
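To make this concrete, here is a minimal Python sketch of evaluating one match's log-likelihood numerically. The functional forms and parameter values chosen for $\lambda(t)$ and $\mu(t)$ (a baseline rate, a linear time trend and an exponential score-line adjustment) are illustrative placeholders rather than the Dixon and Robinson parameterisation, and the constant $\log x! + \log y!$ term is dropped since it does not affect the maximisation:

```python
import numpy as np
from scipy.integrate import quad

# Illustrative, simplified forms for the scoring rates -- NOT the actual
# Dixon and Robinson (1998) parameterisation.  All parameter values are made up.

def score_at(t, t_home, t_away):
    """Score line (home goals, away goals) just before time t."""
    return np.sum(t_home < t), np.sum(t_away < t)

def lam(t, t_home, t_away, alpha=1.4, slope=0.3, lead=-0.1):
    h, a = score_at(t, t_home, t_away)
    return alpha * (1.0 + slope * t) * np.exp(lead * (h - a))

def mu(t, t_home, t_away, beta=1.1, slope=0.3, lead=-0.1):
    h, a = score_at(t, t_home, t_away)
    return beta * (1.0 + slope * t) * np.exp(lead * (a - h))

def match_log_lik(t_home, t_away):
    """Log-likelihood of one match, up to the constant -log(x!) - log(y!)."""
    # Integrate each intensity numerically, splitting at the goal times
    # because the score-line dependence makes the rates piecewise-smooth.
    breaks = np.concatenate(([0.0], np.sort(np.concatenate((t_home, t_away))), [1.0]))
    int_lam = sum(quad(lam, lo, hi, args=(t_home, t_away))[0]
                  for lo, hi in zip(breaks[:-1], breaks[1:]))
    int_mu  = sum(quad(mu, lo, hi, args=(t_home, t_away))[0]
                  for lo, hi in zip(breaks[:-1], breaks[1:]))
    # Sum of log-rates at the observed goal times.
    log_rates = (sum(np.log(lam(t, t_home, t_away)) for t in t_home) +
                 sum(np.log(mu(t, t_home, t_away)) for t in t_away))
    return -int_lam - int_mu + log_rates

# Example: home goals at 35% and 88% of the match, away goal at 55%.
print(match_log_lik(np.array([0.35, 0.88]), np.array([0.55])))
```

Summing this over all matches in the data set gives the objective to maximise (with a general-purpose optimiser) in order to obtain the parameter estimates.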

In terms of prediction, obvious quantities of interest are:

  • $P(x > y)$: home team wins
  • $P(x < y)$: away team wins
  • $P(x = y)$: draw
  • Probability of particular score lines, e.g. $P(x=1,y=0)$
  • Probability of total goals in the match, e.g. $P((x+y) < 2.5)$

To calculate these quantities (approximately) given a set of model parameters, we could use Monte Carlo methods to generate matches according to these processes and then calculate the frequencies of each final score. Simulating from the processes is relatively straightforward: generate candidate goals from a single enveloping homogeneous Poisson process, thin them via rejection sampling, and then attribute each accepted goal to the home or away team accordingly.
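A minimal sketch of this scheme follows. The rate functions and the dominating rate `LAM_MAX` are illustrative placeholders (matching the likelihood sketch above), not a fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate event times come from a homogeneous Poisson process whose rate
# LAM_MAX dominates lambda(t) + mu(t); each candidate is accepted with
# probability (lambda + mu) / LAM_MAX (thinning), and an accepted goal is
# attributed to the home team with probability lambda / (lambda + mu).

LAM_MAX = 8.0   # assumed upper bound on lambda(t) + mu(t); would need to be
                # verified (or bounded piecewise) for a real fitted model

def score_at(t, t_home, t_away):
    return sum(1 for s in t_home if s < t), sum(1 for s in t_away if s < t)

def lam(t, t_home, t_away):   # illustrative home scoring rate
    h, a = score_at(t, t_home, t_away)
    return 1.4 * (1.0 + 0.3 * t) * np.exp(-0.1 * (h - a))

def mu(t, t_home, t_away):    # illustrative away scoring rate
    h, a = score_at(t, t_home, t_away)
    return 1.1 * (1.0 + 0.3 * t) * np.exp(-0.1 * (a - h))

def simulate_match():
    t_home, t_away = [], []
    t = 0.0
    while True:
        t += rng.exponential(1.0 / LAM_MAX)      # next candidate event time
        if t >= 1.0:
            break
        l, m = lam(t, t_home, t_away), mu(t, t_home, t_away)
        if rng.uniform() < (l + m) / LAM_MAX:    # accept (thinning step)
            (t_home if rng.uniform() < l / (l + m) else t_away).append(t)
    return len(t_home), len(t_away)

goals = np.array([simulate_match() for _ in range(20_000)])
x, y = goals[:, 0], goals[:, 1]
print("P(home win)    ~", np.mean(x > y))
print("P(draw)        ~", np.mean(x == y))
print("P(away win)    ~", np.mean(x < y))
print("P(x + y < 2.5) ~", np.mean(x + y < 2.5))
```

Tightening the dominating rate piecewise over the match (as mentioned later in the question) keeps the acceptance rate high, but the fundamental cost of simulating many matches remains.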

The drawback to this approach is, obviously, the computational burden of Monte Carlo simulation. Consider attempting to make predictions in real time while matches are being played, with many matches potentially in progress simultaneously, and the cost quickly becomes a concern.

My question, therefore, is whether there are any alternative approaches we can consider which do not incur such a high computational cost (even if they rely on an approximation that sacrifices accuracy for ease of calculation)?


For clarity, I am not looking for (basic) suggestions on how to efficiently implement the Monte Carlo simulation, which I have already written in multi-threaded C; it uses pre-generated quasi-random numbers with unrolling and exploits piecewise thinning to achieve a very high acceptance rate. If you think there is still scope for a dramatic performance increase then of course I am all ears, but really I am looking for a fundamentally different approach!

Best Answer

That's an interesting problem. I'm not sure I've caught everything you mean, but have you thought about reformulating some of your problems as hypothesis tests? For example:

  • null hypothesis H0: $x > y$
  • alternative hypothesis H1: $x \le y$

and then performing a likelihood ratio test? The resulting p-value then tells you whether H0 is rejected at a given significance level.

The reason I'm mentioning this is that performing a likelihood ratio test amounts to two minimisations, which can be much faster than MC integration. However, the integral inside the exponential might still require numerical integration.
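To illustrate only the two-minimisation mechanic (I haven't spelled out how a hypothesis like $x > y$ would map onto a parameter constraint in your model), a generic sketch with a placeholder Poisson likelihood and an illustrative equal-rates null might look like this:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(1)
home_goals = rng.poisson(1.6, size=100)   # made-up historical goal counts
away_goals = rng.poisson(1.1, size=100)

def neg_log_lik(log_rates):
    # Poisson negative log-likelihood (constants dropped) for home/away rates.
    lam_h, lam_a = np.exp(log_rates)
    return (np.sum(lam_h - home_goals * np.log(lam_h)) +
            np.sum(lam_a - away_goals * np.log(lam_a)))

# Minimisation 1: unrestricted fit (separate home and away rates).
fit_full = minimize(neg_log_lik, x0=np.zeros(2))

# Minimisation 2: fit under the illustrative null constraint (a common rate).
fit_null = minimize(lambda r: neg_log_lik(np.array([r[0], r[0]])), x0=np.zeros(1))

# Likelihood ratio statistic and asymptotic chi-square p-value (one constraint).
lr_stat = 2.0 * (fit_null.fun - fit_full.fun)
print("LR statistic:", lr_stat, " p-value:", chi2.sf(lr_stat, df=1))
```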

HTH
