Regression – Modeling Win-Draw-Loss Outcomes in Sports

modeling, poisson distribution, regression

I have data about different teams, players etc. I am trying to figure out the best way to model the outcome of a match, which can end in a win for the home team, a loss for the home team, or a draw. I am having trouble modelling this though.

For example, I can use a Poisson regression to model the number of goals each team scores and then calculate a grid of score probabilities, but I am not too happy with the independence assumption. I could also use a bivariate Poisson model, which I don't have much experience with. I am wondering what a suitable approach is for modelling the dependence of the outcome on the two teams while preserving the fact that the outcomes are mutually exclusive (the probabilities assigned to win, draw and loss should sum to unity).

Best Answer

You can use the bivariate Poisson distribution, which has probability mass function

$$ f(x,y) = \exp\{-(\lambda_1+\lambda_2+\lambda_3)\} \frac{\lambda_1^x}{x!} \frac{\lambda_2^y}{y!} \sum^{\min(x,y)}_{k=0} {x \choose k} {y \choose k} k!\left(\frac{\lambda_3}{\lambda_1\lambda_2}\right)^k $$

where $E(X) = \lambda_1+\lambda_3$, $E(Y) = \lambda_2+\lambda_3$ and $\mathrm{cov}(X,Y) = \lambda_3$, so you can treat $\lambda_3$ as a measure of dependence between the two marginal Poisson distributions. The pmf and random generation for this distribution are implemented in the extraDistr package if you are using R.
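To connect the pmf to the win/draw/loss question, here is a minimal Python sketch (not the extraDistr implementation) that evaluates the pmf and sums it over a score grid. The function names, the `max_goals` truncation and the renormalisation step are my own assumptions for illustration; the sum is written in the algebraically equivalent form $\lambda_1^{x-k}\lambda_2^{y-k}\lambda_3^k / \{(x-k)!\,(y-k)!\,k!\}$ so that $\lambda_3 = 0$ reduces cleanly to two independent Poissons.

```python
from math import exp, factorial


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(X = x, Y = y) for the bivariate Poisson with parameters lam1, lam2, lam3."""
    total = 0.0
    for k in range(min(x, y) + 1):
        # Same terms as the formula above, rearranged to avoid dividing by lam1 * lam2.
        total += (lam1 ** (x - k) / factorial(x - k)
                  * lam2 ** (y - k) / factorial(y - k)
                  * lam3 ** k / factorial(k))
    return exp(-(lam1 + lam2 + lam3)) * total


def outcome_probabilities(lam1, lam2, lam3, max_goals=15):
    """Return (P(home win), P(draw), P(away win)), with X the home score.

    Assumption: scores above max_goals are ignored and the three
    probabilities are renormalised so that they sum to one.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = bivariate_poisson_pmf(x, y, lam1, lam2, lam3)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total
```

For example, `outcome_probabilities(1.2, 0.9, 0.2)` gives three mutually exclusive probabilities that sum to one by construction, which is the property asked for in the question.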

In fact, this distribution was described in the context of sports data by Karlis and Ntzoufras (2003), so you can check their paper for further details. In an earlier paper the same authors also discussed the univariate Poisson model and concluded that the independence assumption provides a fair approximation, since the difference between the two teams' scores does not depend on the correlation parameter of the bivariate Poisson (Karlis and Ntzoufras, 2000).
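The reason the score difference is unaffected can be seen from the trivariate reduction construction of the bivariate Poisson, which is consistent with the moments given above. With independent $W_i \sim \mathrm{Poisson}(\lambda_i)$, $i = 1, 2, 3$:

$$ X = W_1 + W_3, \qquad Y = W_2 + W_3 \quad\Longrightarrow\quad X - Y = W_1 - W_2 $$

so the common component $W_3$ cancels and $X - Y$ follows a Skellam distribution with parameters $\lambda_1$ and $\lambda_2$ only. Since win, draw and loss correspond to $X - Y > 0$, $X - Y = 0$ and $X - Y < 0$, the outcome probabilities for given $\lambda_1, \lambda_2$ do not involve $\lambda_3$. Note that $\lambda_3$ still enters the marginal means and the probabilities of individual scorelines.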

Kawamura (1984) described estimating the parameters of the bivariate Poisson distribution by maximum likelihood using direct search. As for regression models, you can use the EM algorithm for maximum likelihood estimation, as in Karlis and Ntzoufras (2003), or a Bayesian model estimated by MCMC. The EM algorithm for bivariate Poisson regression is implemented in the bivpois package (Karlis and Ntzoufras, 2005), which is unfortunately not on CRAN at the moment.
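To give a feel for the EM idea, below is a rough Python sketch for the simplest case with no covariates. It is not the bivpois code and not a regression model: it treats the shared component $W_3$ as missing data, uses the identity $E(W_3 \mid x, y) = \lambda_3 f(x-1, y-1) / f(x, y)$ in the E-step, and updates the three rates by sample means in the M-step. The function names, starting value and fixed iteration count are my own assumptions.

```python
from math import exp, factorial


def bvpois_pmf(x, y, lam1, lam2, lam3):
    """Bivariate Poisson pmf; returns 0 outside the support."""
    if x < 0 or y < 0:
        return 0.0
    total = 0.0
    for k in range(min(x, y) + 1):
        total += (lam1 ** (x - k) / factorial(x - k)
                  * lam2 ** (y - k) / factorial(y - k)
                  * lam3 ** k / factorial(k))
    return exp(-(lam1 + lam2 + lam3)) * total


def fit_bvpois_em(xs, ys, n_iter=200, lam3_init=0.1):
    """EM estimates (lam1, lam2, lam3) from paired scores, no covariates.

    Assumption: a fixed number of iterations instead of a convergence check.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Start with a small positive lam3; starting at exactly 0 would never move.
    lam3 = min(lam3_init, 0.5 * min(mean_x, mean_y))
    lam1, lam2 = mean_x - lam3, mean_y - lam3
    for _ in range(n_iter):
        # E-step: expected shared component for each match.
        s_total = 0.0
        for x, y in zip(xs, ys):
            denom = bvpois_pmf(x, y, lam1, lam2, lam3)
            if min(x, y) > 0 and denom > 0.0:
                s_total += lam3 * bvpois_pmf(x - 1, y - 1, lam1, lam2, lam3) / denom
        # M-step: Poisson rates are the means of the completed data.
        lam3 = s_total / n
        lam1 = max(mean_x - lam3, 0.0)
        lam2 = max(mean_y - lam3, 0.0)
    return lam1, lam2, lam3
```

Calling `fit_bvpois_em(home_goals, away_goals)` on two equal-length lists of scores (hypothetical variable names) returns the three rate estimates. In the regression setting each $\lambda$ would instead be linked to covariates, for example team attack and defence effects, and the M-step would become a set of weighted Poisson regressions.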


Karlis, D., & Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician), 52(3), 381-393.

Karlis, D., & Ntzoufras, I. (2000). On modelling soccer data. Student, 3, 229-244.

Kawamura, K. (1984). Direct calculation of maximum likelihood estimator for the bivariate Poisson distribution. Kodai Mathematical Journal, 7(2), 211-221.

Karlis, D., & Ntzoufras, I. (2005). Bivariate Poisson and diagonal inflated bivariate Poisson regression models in R. Journal of Statistical Software, 14(10), 1-36.