Using expected value of number of attendees to an event to calculate expected revenue

expected value, probability

In the problem, you pre-sell 21 non-refundable tickets valued at 50 dollars each to an event that can only accommodate 20 people. If the 21st person shows up, you must pay the person who is turned away 100 dollars. Each person has a 2% chance of not showing up, independent of what anyone else does.

After having seen the solution it makes perfect sense. I'm looking for insight (with a good dose of intuition or other examples if possible) why my initial instinct of how to go about the problem was faulty so as to avoid similar mistakes in the future.

The actual solution conditions the expected payout on whether or not all 21 people show. I think of this as the law of total expectation, in the same way we have the law of total probability.

E(payout) = E(payout|21 people show)P(21 people show) + E(payout|20 or fewer show)(1-P(21 show))
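To make the conditioning concrete, here is a minimal Python sketch of that formula using the numbers from the problem. The variable names are illustrative only, and "payout" is taken to mean net revenue (ticket sales minus any penalty), which is an assumption about how the solution uses the word.

```python
# Law of total expectation for the overbooking problem (21 tickets, 20 seats).
p_show = 0.98        # each ticket holder shows up independently with this probability
n_tickets = 21
ticket_price = 50
penalty = 100

p_all_show = p_show ** n_tickets        # P(21 people show)
revenue = n_tickets * ticket_price      # 1050, collected no matter who shows up

payout_if_all_show = revenue - penalty  # 950: one person must be paid the penalty
payout_otherwise = revenue              # 1050: 20 or fewer show, no penalty

# E(payout) = E(payout | 21 show) P(21 show) + E(payout | 20 or fewer) (1 - P(21 show))
expected_payout = (payout_if_all_show * p_all_show
                   + payout_otherwise * (1 - p_all_show))
print(round(expected_payout, 3))
```

The result is about 984.57, compared with the 992 from the gut-instinct calculation described next.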

Like I said this makes total sense as to why it works.

Here's what my gut instinct immediately said to try however when I read the problem.

Calculate the expected number of attendees. It's binomial with p=.98 and n=21, so E(attendees) = 20.58. So I computed 21(50) - (.58)(100) = 992. My thinking was that since 20.58 people show up on average, you have to pay back on average 58% of the penalty each time.

I've noticed this same sort of thing in a few contexts now where I've used expected value to calculate some number in the problem and then went on to base a payment off that number and it's been not quite right (but always kind of close), so I'm wanting to prevent this going forward.

It may be just as simple as the revenue has two different means under two different scenarios and you can't try to cram them together. Thus you have to partition into two cases. Like I said, I understand why I'm wrong, but I'm looking for some kind of insight on what specifically breaks down in my method and whether my method could be tweaked to produce the correct answer. I hope that makes sense.

Best Answer

One way to see why your approach doesn't work is if we modify $p$ to be smaller, so that the expected value of the number of attendees is less than $20$. For example, suppose $p = 2/3$. Then if $X$ is the random number of attendees, $$\operatorname{E}[X] = np = 21(2/3) = 14 < 20.$$ By your calculation, there is no excess above $20$ in the expected number of attendees, so how do you account for this? Would you compute $21(50) - (0)(100)$? That is also obviously wrong because for any $p > 0$, even if it is small, there remains a positive probability that $X = 21$, thus the expected revenue must always be strictly less than $21(50) = 1050$. In the above case where $p = 2/3$, we have $$\Pr[X = 21] = \binom{21}{21}p^{21} (1-p)^{21-21} = p^{21} \approx 0.000200486.$$

While this numeric example gives us an idea of why there is a flaw in your approach, we still don't have a formal mathematical explanation. We can see that the expected number of attendees is not the meaningful quantity through which we can obtain the expected revenue. This is because the relationship between the random variable $X$ and the random revenue, say $Y$, is not a linear one. Specifically, we have $$Y = \begin{cases} 1050, & 0 \le X \le 20, \\ 950, & X = 21. \end{cases}$$ We could use some tricks to write this in other ways, for example $$Y = 1050 - 100 \max(0, X - 20).$$

In fact, this is a good way to generalize the original question to the case where there are only $s$ seats, and each ticket buyer who shows up above the seat limit must be paid $100$. Then $$Y = 1050 - 100 \max(0, X - s)$$ and the original question sets $s = 20$. But as you can see from this formula, $$\operatorname{E}[Y] = 1050 - 100 \operatorname{E}[\max(0, X - 20)] \ne 1050 - 100 \max(0, \operatorname{E}[X] - 20).$$ In fact, the RHS is precisely what you tried to do. You tried to take the average number of attendees $\operatorname{E}[X]$, subtract $20$, and this excess is what you multiplied by $100$.

And my counterexample at the beginning considered what happened when $\operatorname{E}[X] < 20$, so that the maximum of $0$ and a negative number is $0$, which clearly results in a wrong answer. So it's clear we can't do this, because in general $$\operatorname{E}[g(X)] \ne g(\operatorname{E}[X])$$ for an arbitrary function $g$. For example, $\operatorname{E}[X^2] \ne (\operatorname{E}[X])^2$. Expectation is a linear operator, so if $g$ is a linear function, it does work: $$\operatorname{E}[aX + b] = a\operatorname{E}[X] + b,$$ for constants $a$, $b$. But it doesn't work when $g$ is nonlinear, as in this case.
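As a numerical check of $\operatorname{E}[g(X)] \ne g(\operatorname{E}[X])$ for $g(x) = \max(0, x - 20)$, here is a small Python sketch that computes both sides exactly from the binomial probabilities. The helper names `binom_pmf`, `g`, and `compare` are made up for this illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def g(x, seats=20):
    """Number of attendees in excess of the seat limit."""
    return max(0, x - seats)

def compare(p, n=21):
    """Return (g(E[X]), E[g(X)]) for X ~ Binomial(n, p)."""
    g_of_mean = g(n * p)
    mean_of_g = sum(g(k) * binom_pmf(k, n, p) for k in range(n + 1))
    return g_of_mean, mean_of_g

for p in (0.98, 2 / 3):
    g_of_mean, mean_of_g = compare(p)
    print(f"p = {p:.4f}: g(E[X]) = {g_of_mean:.6f}, E[g(X)] = {mean_of_g:.9f}")
```

For $p = 0.98$ the two sides are roughly $0.58$ and $0.654$; for $p = 2/3$ they are $0$ and about $0.0002$, matching the counterexample above.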


This brings us to the question of how we might evaluate an expression such as $$\operatorname{E}[\max(0, X-s)].$$ Well, this was originally written as a piecewise/casewise function, where the cases were whether $X > s$ or $X \le s$. So those are the outcomes on which we must condition the expectation: $$\operatorname{E}[\max(0, X-s)] = \operatorname{E}[0]\Pr[X-s \le 0] + \operatorname{E}[X-s \mid X - s > 0]\Pr[X-s > 0].$$ Since the first term is just $0$, only the second term remains: $$(\operatorname{E}[X \mid X > s]-s)\Pr[X > s].$$

For $s = 20$, $n = 21$, $p = 0.98$, we get $$\begin{align} (\operatorname{E}[X \mid X > 20] - 20)\Pr[X > 20] &= (\operatorname{E}[X \mid X = 21] - 20)\Pr[X = 21] \\ &= (21-20)(0.98)^{21} \\ &\approx 0.654256, \end{align}$$ hence $$\operatorname{E}[Y] \approx 1050 - 100(0.654256) = 984.574.$$

As an exercise, what would be your expected revenue if $s = 19$? That is to say, if there were only $19$ seats available, and each attendee in excess needs to be paid $100$?
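The conditional term $(\operatorname{E}[X \mid X > s]-s)\Pr[X > s]$ is the same as the sum of $(k - s)\Pr[X = k]$ over $k > s$, which is easy to evaluate directly. The following Python sketch does that for general $s$; the function name `expected_revenue` and its parameter names are hypothetical, chosen only for this example.

```python
from math import comb

def expected_revenue(n=21, p=0.98, seats=20, price=50, penalty=100):
    """E[Y] for Y = n*price - penalty*max(0, X - seats), X ~ Binomial(n, p)."""
    # E[max(0, X - seats)]: only outcomes with k > seats contribute.
    expected_excess = sum(
        (k - seats) * comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(seats + 1, n + 1)
    )
    return n * price - penalty * expected_excess

# Original question: 21 tickets sold, 20 seats.
print(round(expected_revenue(), 3))
```

With the default arguments this reproduces the value of about $984.574$ derived above; calling it with a different `seats` value is one way to check an answer to the exercise.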
