Wigan scores after 30 minutes. Calculate home, away and draw in percentage terms with Poisson regression

Tags: games, poisson distribution, probability, self-study

I have the following exercise:

Wigan v City, 90-minute match:
Wigan scores after 30 minutes.
Calculate home, away and draw in percentage terms with Poisson
regression, assuming pre-match odds were 5.00 – 4.00 – 1.80 for
Win/Draw/Win respectively.

Please advise how to proceed with this question for my exam.

ADDITION: after reading the answers below and trying to answer this myself, I got as far as the working below. Please help.

(NOTE: I just kept it simple and assumed 2/3 of the game left)

Wigan mean (average goals per game) = 1.20; ManU mean = 2.20

60 mins left, so new Wigan mean = 2/3 x 1.20 = 0.8 and new ManU mean = 2/3 x 2.2 = 1.47

WIGAN WIN ODDS

P(Wigan scores 1 or more) = 1 – Poisson(0, 0.8) = 1 – 0.45 = 0.55

OR, P(MANU FAIL TO SCORE) = Poisson(0, 1.47) = 0.23

WIGAN WIN ODDS = 0.55 + 0.23 = 0.78

What about P(Wigan 2 AND ManU 1), i.e. P(Wigan goals > ManU goals)?

MANU WIN ODDS

P(ManU scores 2 or more) = 1 – [Poisson(0, 1.47) + Poisson(1, 1.47)]

= 1 – [0.23 + 0.34]

= 0.43

DRAW ODDS

X = ManU scores exactly 1 and Wigan scores exactly 0 from here on, AND ManU scores the same as Wigan from here on.

Poisson(1, 1.47) x Poisson(0, 0.8) = 0.34 x 0.45 = 0.153

P(ManU = Wigan in scoring from here on): 1.47 – 0.8 = 0.67

Poisson(0, 0.67) = 0.51

So P(X) = [0.34 x 0.45] x 0.5 = 0.077

0.78 + 0.43 + 0.077 = 1.28

Taking 1.28 = 100%:

Wigan win: 0.78/1.28 x 100% = 61%

ManU win: 0.43/1.28 x 100% = 34%

Draw: 0.077/1.28 x 100% = 5%

DECIMAL ODDS

Wigan win = 100%/61% = 1.64

ManU win = 100%/34% = 2.94

Draw = 100%/5% = 20.00

Clearly I made a mistake somewhere because the odds are too long on
draw at 19/1 even though draw should be a favorite or close to wigan
win here.
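
One way to check the attempt above is to sum the joint probability of every remaining scoreline instead of adding and multiplying individual events. The following is a minimal sketch, assuming the goals each side scores in the last 60 minutes are independent Poisson variables with the scaled means above, and that Wigan already lead 1-0. The function name `remaining_outcomes` and the cut-off at 20 goals per side are illustrative choices, not part of the exercise.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)

def remaining_outcomes(lam_wigan, lam_manu, wigan_lead=1, max_goals=20):
    """Sum joint scoreline probabilities for the rest of the match.

    Assumes independent Poisson scoring; max_goals truncates the sum
    (the tail beyond 20 goals is negligible for means this small).
    """
    p_wigan = p_draw = p_manu = 0.0
    for w in range(max_goals + 1):
        for m in range(max_goals + 1):
            p = poisson_pmf(w, lam_wigan) * poisson_pmf(m, lam_manu)
            margin = wigan_lead + w - m  # final margin from Wigan's side
            if margin > 0:
                p_wigan += p
            elif margin == 0:
                p_draw += p
            else:
                p_manu += p
    return p_wigan, p_draw, p_manu

# Means scaled to the 60 minutes left, as in the attempt above.
p_wigan, p_draw, p_manu = remaining_outcomes(2 / 3 * 1.20, 2 / 3 * 2.20)
for name, p in [("Wigan", p_wigan), ("Draw", p_draw), ("ManU", p_manu)]:
    print(f"{name}: {p:.1%}  (decimal odds {1 / p:.2f})")
```

Because every scoreline falls into exactly one of the three outcomes, the probabilities sum to 1 and no rescaling by 1.28 is needed.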

Best Answer

How I would look at this (with the caveat up front that I am not the best at maths but have an interest in sports) is that results in the English Premiership are possibly normally distributed, with the spread given by the standard deviation of the prediction error of a rating system. I am basing this on the principles of Hal Stern, "On the Probability of Winning a Football Game" (1991, The American Statistician, vol. 45, no. 3, pp. 179-183), and on the fact that I do this for other sports for fun. You can get a better idea of whether this holds from something like a QQ plot (and other methods; someone answered one of my questions on this in quite a detailed fashion). The reason I hedge is that, given how low scoring football is, I wouldn’t have as much certainty as for, say, basketball, American football or Australian Rules football. As I understand it, the Poisson distribution can be reasonably approximated by the normal distribution (relevant given that your question is about the Poisson), and the scoring of a goal could itself be governed by a Poisson variable.

The fundamentals: Manchester City are 1-0 down at home to Wigan after 30 minutes (I am assuming that it is a standard Premier League game and not a friendly or a cup game, where the result carries a different level of significance). Home advantage in the Premiership can be calculated as a component of a rating system. It is generally assumed, at least anecdotally, that home advantage exists in the English Premiership as it does in every sports league (be it positive or negative). For the English Premiership a handy paper on this is “Home advantages in balanced competitions : English soccer 1991-1996”; Stephen R Clarke (Proceedings of the 3rd Australian Conference on Mathematics and Computers in Sport, Coolangatta, Queensland, 1996). You can calculate it individually for each team (assuming the trend holds for enough years to be significant), but here we will take the average home advantage.

Then you need team ratings. Assuming you have past game data, I would go with something like the ratings system put forward by Kenneth Massey in his 1997 thesis (http://masseyratings.com/theory/massey97.pdf). There are plenty of other ranking systems you can pick from (“Who’s No.1 – The Science of Rating and Ranking”, Carl Meyer, Amy N. Langville – Princeton 2012 – covers the Massey thesis method amongst others). I personally use some of the methods suggested by Wayne Winston in Mathletics (Princeton 2009). I don’t have a great maths background, but I understand processes and can use Excel, and that book is where the principle for the following process comes from.

I expect you are meant to work backward from the given odds to generate a team rating. However, from Steven Levitt’s “Why are Gambling Markets Organised So Differently from Financial Markets?” (The Economic Journal, Volume 114, 2004) we understand that in a lot of cases bookmakers don’t set the odds based on their expected outcome of an event, but rather to exploit client biases, based on how they expect their clients to bet. As a result you could justify disregarding the odds given.
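
If you do want to start from the quoted odds, a common first step is to turn decimal odds into implied probabilities and strip out the bookmaker's margin. The following is a minimal sketch, assuming 5.00 is the Wigan price, 4.00 the draw and 1.80 the City price (the exercise does not say which side each price belongs to), and assuming the margin is removed by simple proportional normalisation, which is only one of several conventions.

```python
# Pre-match decimal odds from the exercise; the team labels are an assumption.
odds = {"Wigan": 5.00, "Draw": 4.00, "City": 1.80}

# The raw implied probability of each outcome is 1 / decimal odds.
raw = {outcome: 1.0 / price for outcome, price in odds.items()}

# The raw values add up to slightly more than 1; the excess is the margin.
overround = sum(raw.values())

# Proportional normalisation so the three probabilities sum to exactly 1.
implied = {outcome: p / overround for outcome, p in raw.items()}

print(f"overround: {overround:.4f}")
for outcome, p in implied.items():
    print(f"{outcome}: {p:.1%}")
```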

My fundamentals for the game, based on my current spreadsheet for the English Premiership for this season (stats to 2 dp would be fine but I pulled these straight from Excel):

Standard deviation of error of a prediction from a rating system for a game: 1.45553586306297

Rating for Manchester City: 0.895454396395622

Rating for Wigan: -0.602272751

Average Premiership home edge (in this case for Manchester City to benefit from): 0.352036612

Predicted margin of victory for Manchester City based on the above: 1.84976376

Game time: 90 minutes (I’m not assuming any injury time in either half)

Game time already elapsed: 30 minutes

Standard deviation for the remaining game time: 1.45553586306297 divided by the square root of 90/60 (i.e. to get the standard deviation for the fraction of the game remaining) = 1.188440056

For Manchester City to win from 1-0 down, they need to win the remaining 60 minutes by more than 1.5 goals (i.e. by at least two). So for the normal distribution x = 1.5, the mean is 1.84976376 (assuming your rating of both teams still holds) and the standard deviation is 1.188440056 (based on a 60 minute time segment). I set Excel to return the cumulative distribution function and then subtracted this from 1 (so you get the probability beyond this point). Based on the above I have the chance of Manchester City winning as 61.55%.

For the draw, take the same cumulative value at x = 1.5, but instead of subtracting it from 1, subtract from it the cumulative value at x = 0.5 (so City outscore Wigan by one goal to level the match, but not by two to win). I have the chance of this as 25.64%.

To find the chance of Wigan winning I would just then subtract the first two results from one (as it is the only outcome not covered) and this returns 12.81%.
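
The three steps above can be sketched outside Excel as follows. This is a minimal sketch, assuming the remaining-time goal margin for City is normally distributed with the mean and standard deviation quoted above; `norm_cdf` is a small helper written with `math.erf` and is not taken from the spreadsheet. The results should land close to the quoted percentages, though small rounding differences from the spreadsheet are possible.

```python
from math import erf, sqrt

def norm_cdf(x, mean, sd):
    """Cumulative normal distribution, written with math.erf."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

sd_full_game = 1.45553586306297
city_rating = 0.895454396395622
wigan_rating = -0.602272751
home_edge = 0.352036612

# Predicted City margin, kept at its pre-match value as in the text.
mean_margin = city_rating - wigan_rating + home_edge

# Standard deviation scaled to the 60 minutes that remain.
sd_remaining = sd_full_game / sqrt(90 / 60)

# City are one goal down: above 1.5 they win, between 0.5 and 1.5 it is a draw.
p_city = 1.0 - norm_cdf(1.5, mean_margin, sd_remaining)
p_draw = norm_cdf(1.5, mean_margin, sd_remaining) - norm_cdf(0.5, mean_margin, sd_remaining)
p_wigan = 1.0 - p_city - p_draw

print(f"City {p_city:.2%}, draw {p_draw:.2%}, Wigan {p_wigan:.2%}")
```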

There are a couple of things that I don’t like about this:

1) There are a lot of assumptions to be made based on the question and what is available.

2) I am not 100% convinced yet that soccer can be treated as normally distributed based on the principles of Stern (in an ideal world you would be deriving ratings based on the abilities of the players on the pitch, e.g. adding linear weights through regression to goals scored etc.)

3) In the example I have worked through, you are assuming that the ratings derived before the game still hold at the 30 minute mark (if you used the linear weights approach on player variables such as shots to derive a rating, this probably wouldn’t be the case).

The other alternative would be to diminish the predicted mean (the margin of victory value) as the game goes on. In the case of the above, if you multiply the 1.84976376 by roughly 0.66 (two thirds, since a third of the game is gone) and feed it into the process above, you get the revised values:

Manchester City winning: 41.10%

Draw: 32.03%

Wigan winning: 26.87%
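
The scaled-mean variant can be sketched as a small function of the time remaining. This is a minimal sketch under the same normal-margin assumption; the function name `in_play_probs` and its parameters are illustrative. It scales the mean by the exact fraction 60/90, which seems to reproduce the revised figures more closely than a factor of exactly 0.66.

```python
from math import erf, sqrt

def norm_cdf(x, mean, sd):
    """Cumulative normal distribution, written with math.erf."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def in_play_probs(pre_match_margin, sd_full, minutes_left,
                  goals_behind=1, full_time=90):
    """Win/draw/loss chances for the favourite, who trail by goals_behind.

    Both the mean margin and the variance are scaled by the fraction of
    the match still to play (an assumption, not a fitted model).
    """
    frac = minutes_left / full_time
    mean = pre_match_margin * frac
    sd = sd_full * sqrt(frac)
    upper = norm_cdf(goals_behind + 0.5, mean, sd)  # margin too small to win
    lower = norm_cdf(goals_behind - 0.5, mean, sd)  # margin too small to draw
    return 1.0 - upper, upper - lower, lower

p_city, p_draw, p_wigan = in_play_probs(1.84976376, 1.45553586306297, 60)
print(f"City {p_city:.2%}, draw {p_draw:.2%}, Wigan {p_wigan:.2%}")
```

Calling the function with other values of `minutes_left` shows how quickly the favourite's chances shrink as time runs out while still a goal behind.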

These feel a bit more likely to me than the original set of figures. (Note: I edited the figures this afternoon as I had made a mistake adding things up and in what I was subtracting from what; the figures just looked wrong.)

I hope this helps or gives you some other ideas and good luck.
