Variance of Poisson Random Variable

probability, statistics

For a certain insurance company, 10% of its policies are Type A, 50% are Type B, and 40% are Type C.
The annual number of claims for an individual Type A, Type B, and Type C policy follow Poisson distributions with respective means 1, 2, and 10.
Let X represent the annual number of claims of a randomly selected policy.
Calculate the variance of X.

So one way of arriving at the answer is using the conditional variance formula.

$$Var(X) = E[Var(X \mid type)] + Var(E[X \mid type])$$
$$=E[1,2,10] + Var(1,2,10)$$
$$=5.1 + Var(1,2,10)$$

From there you calculate the second moment

$$E[\{1,2,10\}^2] = 42.1$$
$$42.1 - 5.1^2 = 16.09$$
$$Var(X) = 5.1 + 16.09 = 21.19$$

And I understand this method. But when I did it by myself, I just calculated the first and second moments and used the formula $Var(X) = E(X^2) - E(X)^2$, getting 16.09. Why is this not the answer? I need help conceptualizing why 5.1 must be added to the variance here.

Best Answer

In the law of total variance $$\operatorname{Var}[X] = \operatorname{E}[\operatorname{Var}[X \mid T]] + \operatorname{Var}[\operatorname{E}[X \mid T]]$$ where I have used $T$ to denote the type of policy, with $$T \sim \operatorname{Categorical}(\pi_1 = 0.1, \pi_2 = 0.5, \pi_3 = 0.4), \\ \Pr[T = i] = \pi_i,$$ where the coding is $A \equiv 1$, $B \equiv 2$, and $C \equiv 3$, the first component $$\operatorname{E}[\operatorname{Var}[X \mid T]]$$ is what we call the "within-group" variance; this is the mean of the variability of $X$ that is attributable to each group. The second component $$\operatorname{Var}[\operatorname{E}[X \mid T]]$$ is what we call the "between-groups" variance; this is the variance of the conditional means of $X$ for each group; i.e., the variability of the means between groups.

How do we compute each of these? Since $X \mid T$ is Poisson, specifically $$X \mid T \sim \operatorname{Poisson}(\lambda_T)$$ where $\lambda_1 = 1, \lambda_2 = 2, \lambda_3 = 10$, and both the mean and variance of a Poisson distribution are equal to its rate parameter, we have $$\operatorname{E}[X \mid T] = \operatorname{Var}[X \mid T] = \lambda_T.$$ Consequently $$\operatorname{E}[\operatorname{Var}[X \mid T]] = \operatorname{E}[\lambda_T] = \lambda_1 \pi_1 + \lambda_2 \pi_2 + \lambda_3 \pi_3 = 5.1,$$ and $$\operatorname{Var}[\operatorname{E}[X \mid T]] = \operatorname{Var}[\lambda_T] = \operatorname{E}[\lambda_T^2] - \operatorname{E}[\lambda_T]^2 = (\lambda_1^2 \pi_1 + \lambda_2^2 \pi_2 + \lambda_3^2 \pi_3) - (\lambda_1 \pi_1 + \lambda_2 \pi_2 + \lambda_3 \pi_3)^2 = 16.09.$$ Adding the two components gives $\operatorname{Var}[X] = 5.1 + 16.09 = 21.19$.
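The two components can also be checked numerically. The following is a minimal R sketch, not part of the original answer: the variable names (`p`, `lambda`, `within`, `between`, `direct`) and the truncation point 200 are assumptions made for illustration. It computes both components and then compares their sum with the variance obtained directly from the mixture probability mass function.

```r
# Policy-type probabilities and Poisson means, taken from the problem statement.
p      <- c(A = 0.1, B = 0.5, C = 0.4)
lambda <- c(A = 1,   B = 2,   C = 10)

# Within-group component: E[Var(X | T)] = E[lambda_T],
# because a Poisson variance equals its mean.
within <- sum(p * lambda)

# Between-groups component: Var(E[X | T]) = E[lambda_T^2] - E[lambda_T]^2.
between <- sum(p * lambda^2) - sum(p * lambda)^2

# Law of total variance.
total <- within + between

# Independent check from the mixture pmf of X, truncated at 200
# (an arbitrary cutoff for this sketch; the tail beyond it is negligible here).
x      <- 0:200
pmf    <- sapply(x, function(k) sum(p * dpois(k, lambda)))
direct <- sum(x^2 * pmf) - sum(x * pmf)^2

print(c(within = within, between = between, total = total, direct = direct))
```

Under these assumptions, `total` and `direct` should agree, which suggests that the moment formula applied to $X$ itself recovers the full variance rather than only the between-groups part.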

If you fail to take into account the variability due to differences between group means (i.e., the second variance component), you are just computing how much variation arises within each individual group, ignoring that the groups may be located far apart from each other, as in this case, where Type $C$ policies have a mean annual claim rate of $10$, far more than the other two types. Another way to see this: if $\lambda_1, \lambda_2, \lambda_3$ are all clustered very close together, i.e., there is very little variability of the mean claim rates by policy type, then the second component will be very small relative to the first component.
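As a cross-check on the decomposition (a sketch added for illustration, relying only on the Poisson identity $\operatorname{E}[X^2 \mid T] = \lambda_T + \lambda_T^2$), the moment formula $\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2$ gives the same total once the second moment is taken of $X$ itself rather than of the conditional means: $$\operatorname{E}[X^2] = \operatorname{E}[\operatorname{E}[X^2 \mid T]] = \operatorname{E}[\lambda_T] + \operatorname{E}[\lambda_T^2] = 5.1 + 42.1 = 47.2, \\ \operatorname{Var}[X] = 47.2 - 5.1^2 = 47.2 - 26.01 = 21.19.$$ The quantity $42.1 = \operatorname{E}[\lambda_T^2]$ on its own is the second moment of the conditional means, so $42.1 - 5.1^2 = 16.09$ is only the between-groups component; the extra $\operatorname{E}[\lambda_T] = 5.1$ term is exactly the within-group component.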

Related Solutions

[Math] probability expected payment

We know: $X\sim\mathcal {Pois}(c)$ and $\mathsf P(X=0)=0.60$.

Since the first fact means that $\mathsf P(X=x)~=~\dfrac{c^x\mathsf e^{-c}}{x!}\mathbf 1_{x\in\Bbb N}$, we can easily calculate $c$ knowing the second fact.

We know $Y := 5000(X-1)^+~$ which is $~Y=5000\max(X-1,0)$

Then $\mathsf E(Y) = 5000~\mathsf E(X-1\mid X\geq 1)~\mathsf P(X\geq 1) \color{silver}{+ \require{cancel}\cancel{0~\mathsf P(X=0)}}$

If only the Poisson distribution had some convenient property that allowed us to easily find this conditional expectation without messy summation. Hmm...
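One way to avoid the summation, offered here as a sketch and not necessarily the property the hint has in mind, is to use the pointwise identity $(X-1)^+ = X - 1 + \mathbf 1_{X=0}$ together with linearity of expectation: $$\mathsf E(Y) = 5000\,\bigl(\mathsf E(X) - 1 + \mathsf P(X=0)\bigr) = 5000\,(c - 1 + 0.60) = 5000\,(c - 0.40).$$ With $c = -\ln(0.60) \approx 0.5108$ from $\mathsf P(X=0) = \mathsf e^{-c} = 0.60$, this gives $\mathsf E(Y) \approx 5000 \times 0.1108 \approx 554.13$.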

Poisson distribution question: Company XYZ provides a warranty on a product that it produces…

Number of claims is $X \sim \mathsf{Pois}(c),$ so $P(X=0) = e^{-c} = 0.6,$ which implies $c = -\log_e(.6) = 0.5108.$ So $E(X) = 0.5108.$

c =-log(.6); c
[1] 0.5108256

Because XYZ can't collect on the first claim, you can't multiply $c$ by \$5000 to get the average annual payout as $5000c.$

There may be some years with only one claim, so XYZ collects nothing, and there may be years with several valid claims, so that XYZ collects from the insurance company.

Here is a strictly computational method. Technically, Poisson probabilities extend from $0$ to $\infty$, but the terms decay rapidly to 0. Let's use the total number of claims for illustration. You can come extremely close to $E(X)$ by summing only the first 100 terms of the infinite series:

k = 1:99;  s = sum(k*dpois(k, -log(.6)))
s    
[1] 0.5108256  # exactly E(X) to many places

Now we find the expected number $Y$ of claims with payouts. We seek $E(Y) = 0.1108$ so that we can give the expected annual payout of $5000\,E(Y)\approx \$554$.

We let the 100-vector j represent the number of paid claims and the 100-vector i be the number of claims.

j = c(0,0,1:98);  i = 0:99
s = sum(j*dpois(i, -log(.6)));  s
[1] 0.1108256  # E(Y) = avg nr. payable claims
5000*s
[1] 554.1281   # 5000E(Y) = avg annual payout

Note: in case you're wondering whether it is sufficient to sum the first 100 terms of the series, the first 50 would have been more than enough:

j = c(0,0,1:48); i = 0:49
sum(j*dpois(i, -log(.6)))
[1] 0.1108256

Almost all of the probability of the distribution $\mathsf{Pois}(-\log(.6))$ is below 6.

qpois(.9999, -log(.6))
[1] 5
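
The truncated sums can also be compared against a closed form. This is a hedged sketch, not part of the original answer: it assumes the identity $E[(X-1)^+] = E(X) - P(X \ge 1)$, which follows from $(X-1)^+ = X - \mathbf 1_{X \ge 1}$, and the names `rate`, `paid_claims`, and `payout` are introduced here only for illustration.

```r
# Rate implied by P(X = 0) = exp(-rate) = 0.6.
rate <- -log(0.6)

# E[(X - 1)^+] = E[X] - P(X >= 1), because (X - 1)^+ = X - 1{X >= 1}.
# ppois(0, rate, lower.tail = FALSE) returns P(X > 0) = P(X >= 1).
paid_claims <- rate - ppois(0, rate, lower.tail = FALSE)

# Expected annual payout at $5000 per paid claim.
payout <- 5000 * paid_claims

print(c(paid_claims = paid_claims, payout = payout))
```

If the identity is applied correctly, `paid_claims` should match the summed value of about 0.1108 and `payout` the figure of about \$554.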
