Simple probability question – how to approach systematically, not overthink, or misinterpret

Tags: descriptive statistics, mathematical modeling, probability, probability distributions, statistics

I have been studying some probability 'story problems' and, surprisingly, the ones with very simple solutions tend to confuse me, in particular the question below.

The question: Losses covered by an insurance policy are modeled by a uniform distribution on the interval [0,1000]. An
insurance company reimburses losses in excess of a deductible of 250.

Calculate the difference between the median and the 20th percentile of
the insurance company reimbursement, over all losses.

(A) 225
(B) 250
(C) 300
(D) 375
(E) 500

If you are young like me and don't know what a deductible is, it means for a loss $L > 0$ the insurance company will pay you a reimbursement $R = \mathrm{max}\{ L -250, 0 \}$.

The solution provided:
Before applying the deductible, the median is 500 and the 20th percentile is 200. After applying
the deductible, the median payment is 500 – 250 = 250 and the 20th percentile is max(0, 200 –
250) = 0. The difference is 250.
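A quick numerical sanity check of those two numbers, for anyone who wants one. This is only a Monte Carlo sketch, not part of the provided solution; the helper names `reimbursement` and `empirical_percentile`, the sample size, and the seed are all my own choices for illustration.

```python
import math
import random


def reimbursement(loss, deductible=250.0):
    """Payment after the deductible: max(loss - deductible, 0)."""
    return max(loss - deductible, 0.0)


def empirical_percentile(sorted_values, p):
    """Smallest sample value v such that at least a fraction p of the sample is <= v."""
    n = len(sorted_values)
    index = max(math.ceil(p * n) - 1, 0)
    return sorted_values[index]


random.seed(0)  # fixed seed so the run is repeatable

# Draw losses from U[0, 1000] and apply the deductible to each one.
losses = [random.uniform(0.0, 1000.0) for _ in range(200_000)]
payments = sorted(reimbursement(x) for x in losses)

p20 = empirical_percentile(payments, 0.20)
median = empirical_percentile(payments, 0.50)

print("20th percentile of payments:", p20)
print("median of payments:", median)
print("difference:", median - p20)
```

About a quarter of the simulated payments are exactly zero, so the 20th percentile should come out as 0, and the median should land close to 250, in line with answer (B).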

I have a strong background in math but not a good foundation in probability. The solution seems so simple, but without seeing it I would not have been confident that the method is correct and justified. Part of this may be the wording of the question: does the part 'over all losses' make a difference? I could not figure out what that meant. Here is how I interpreted the question as I read it for the first time:

  1. Losses $L$ are a random variable with distribution $L \sim U[0,1000]$. (This makes the first part of the solution make sense, since the median and percentiles of a uniform distribution are trivial.)

  2. Reimbursements $R$ are a new random variable and to find the median and 20th percentile I should find how $R$ is distributed.

  3. For some particular loss $\ell > 0$, the reimbursement is given by $r = \mathrm{max}\{\ell - 250, 0 \}$.

  4. Then a PDF for $R$ should be $$p(r) = \begin{cases} 0.25, & \text{if } r = 0 \\ 750^{-1}, & \text{if } r \in (0,750] \end{cases}$$

  5. Using this, I would have found a median to be 281.25 by solving for a number $k$ such that $p(0 < r < k) = p(k < r < 750)$ which is inconsistent with the solution provided.

My question for you: It seems I am overthinking or misunderstanding. The solution provided seems much simpler. Is there a concept I am missing that I can learn about that would allow me to confidently and quickly answer this question? Am I misinterpreting the question? Am I misunderstanding the definition of something? It is very important to me to feel like I understand the definitions and justifications. I do not understand why the solution provided is justified, nor would I have solved it that way on my own – so I am unsatisfied with my understanding. Any guidance appreciated.

Best Answer

If $f$ is a non-decreasing function and $x_p$ is the $p$-th percentile of $X$, then the $p$-th percentile of $f(X)$ is $f(x_p)$. Here $f(x)=\max(x-250, 0)$. This fact is not difficult to prove once it's pointed out to you, but it isn't really obvious otherwise. It is not necessarily true when $f$ fails to be non-decreasing. The person who wrote the solution definitely shouldn't have used this fact implicitly like that (of course, I may be too generous in assuming they're even aware that their argument requires something like this result, but let's not get the pitchforks out just yet).
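
For what it's worth, here is a sketch of why this holds. I am assuming the convention that a $p$-th percentile of $Y$ is any number $q$ with $P(Y \le q) \ge p$ and $P(Y \ge q) \ge 1-p$; textbooks vary slightly on this. Since $f$ is non-decreasing, $X \le x_p$ implies $f(X) \le f(x_p)$, and $X \ge x_p$ implies $f(X) \ge f(x_p)$, so

$$\begin{aligned} P\big(f(X) \le f(x_p)\big) &\ge P(X \le x_p) \ge p, \\ P\big(f(X) \ge f(x_p)\big) &\ge P(X \ge x_p) \ge 1-p, \end{aligned}$$

which says exactly that $f(x_p)$ is a $p$-th percentile of $f(X)$.

In this problem, with $R = \max(L-250, 0)$ and $L \sim U[0,1000]$, the same conclusion can be read off the CDF of $R$ directly:

$$F_R(r) = P(L \le r + 250) = \frac{r+250}{1000}, \qquad 0 \le r \le 750.$$

So $R$ has a point mass of $F_R(0) = 0.25$ at zero, which already exceeds $0.2$ and makes the 20th percentile $0$, while solving $F_R(r) = 0.5$ gives the median $r = 250$. Note that the continuous part of $R$ has density $1/1000$ on $(0,750]$, not $1/750$: the total mass is then $0.25 + 750/1000 = 1$, as it must be.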
