Variance of Poisson Random Variable

probability statistics

For a certain insurance company, 10% of its policies are Type A, 50% are Type B, and 40% are Type C.
The annual numbers of claims for individual Type A, Type B, and Type C policies follow Poisson distributions with respective means 1, 2, and 10.
Let X represent the annual number of claims of a randomly selected policy.
Calculate the variance of X.

So one way of arriving at the answer is using the conditional variance formula.

$$Var(X) = E[Var(X \mid \text{type})] + Var(E[X \mid \text{type}])$$
$$=E[1,2,10] + Var(1,2,10)$$
$$=5.1 + Var(1,2,10)$$

From there you calculate the second moment

$$E[\{1,2,10\}^2] = 42.1$$
$$42.1 - 5.1^2 = 16.09$$
$$Var(X) = 5.1 + 16.09 = 21.19$$

And I understand this method. But when I did it by myself, I just calculated the first and second moments and used the formula $$Var(X) = E(X^2) - E(X)^2$$
getting 16.09. Why is this not the answer? I need help conceptualizing why 5.1 must be added to the variance here.

Best Answer

In the law of total variance $$\operatorname{Var}[X] = \operatorname{E}[\operatorname{Var}[X \mid T]] + \operatorname{Var}[\operatorname{E}[X \mid T]]$$ where I have used $T$ to denote the type of policy, with $$T \sim \operatorname{Categorical}(\pi_1 = 0.1, \pi_2 = 0.5, \pi_3 = 0.4), \\ \Pr[T = i] = \pi_i,$$ where the coding is $A \equiv 1$, $B \equiv 2$, and $C \equiv 3$, the first component $$\operatorname{E}[\operatorname{Var}[X \mid T]]$$ is what we call the "within-group" variance; this is the mean of the variability of $X$ that is attributable to each group. The second component $$\operatorname{Var}[\operatorname{E}[X \mid T]]$$ is what we call the "between-groups" variance; this is the variance of the conditional means of $X$ for each group; i.e., the variability of the means between groups.

How do we compute each of these? Since $X \mid T$ is Poisson, specifically $$X \mid T \sim \operatorname{Poisson}(\lambda_T)$$ where $\lambda_1 = 1, \lambda_2 = 2, \lambda_3 = 10$, and both the mean and variance of a Poisson distribution are equal to its rate parameter, we have $$\operatorname{E}[X \mid T] = \operatorname{Var}[X \mid T] = \lambda_T.$$ Consequently $$\operatorname{E}[\operatorname{Var}[X \mid T]] = \operatorname{E}[\lambda_T] = \lambda_1 \pi_1 + \lambda_2 \pi_2 + \lambda_3 \pi_3 = 5.1,$$ and $$\operatorname{Var}[\operatorname{E}[X \mid T]] = \operatorname{Var}[\lambda_T] = \operatorname{E}[\lambda_T^2] - \operatorname{E}[\lambda_T]^2 = (\lambda_1^2 \pi_1 + \lambda_2^2 \pi_2 + \lambda_3^2 \pi_3) - (\lambda_1 \pi_1 + \lambda_2 \pi_2 + \lambda_3 \pi_3)^2 = 42.1 - 5.1^2 = 16.09.$$
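As a quick numerical check of the two components above, here is a minimal Python sketch. The function name `total_variance` and the variable names are illustrative choices, not part of the original problem; the only assumption is that each group is Poisson, so its conditional mean and conditional variance both equal its rate.

```python
# Mixing weights and Poisson rates taken from the problem statement.
weights = [0.1, 0.5, 0.4]   # Pr[T = A], Pr[T = B], Pr[T = C]
rates = [1.0, 2.0, 10.0]    # lambda_A, lambda_B, lambda_C


def total_variance(weights, rates):
    """Return (within, between, total) variance for a mixture of Poissons.

    within  = E[Var[X | T]] = E[lambda_T]   (Poisson variance equals its rate)
    between = Var[E[X | T]] = Var[lambda_T]
    """
    mean_rate = sum(w * lam for w, lam in zip(weights, rates))
    second_moment_rate = sum(w * lam**2 for w, lam in zip(weights, rates))
    within = mean_rate
    between = second_moment_rate - mean_rate**2
    return within, between, within + between


within, between, total = total_variance(weights, rates)
print(round(within, 2), round(between, 2), round(total, 2))
```

Up to floating-point rounding, the first two printed values should match the $5.1$ and $16.09$ obtained by hand, and the third is their sum.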

If you fail to take into account variability due to differences between group means--i.e., the second variance component--you are just computing how much variation arises within each individual group, ignoring that the groups may be located far apart from each other, as in this case where Type $C$ policies have a mean annual claim rate of $10$, far more than the other two types. Another way to see this is that if $\lambda_1, \lambda_2, \lambda_3$ are all clustered very "close together," i.e., there is very little variability of the mean claim rates by policy type, then the second component will be very small relative to the first component.
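One way to reconcile this with the shortcut formula $\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2$ from the question is sketched below. It relies only on the standard Poisson fact that $\operatorname{E}[X^2 \mid T] = \operatorname{Var}[X \mid T] + \operatorname{E}[X \mid T]^2 = \lambda_T + \lambda_T^2$: $$\operatorname{E}[X^2] = \operatorname{E}[\operatorname{E}[X^2 \mid T]] = \operatorname{E}[\lambda_T + \lambda_T^2] = 5.1 + 42.1 = 47.2,$$ $$\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 = 47.2 - 5.1^2 = 47.2 - 26.01 = 21.19.$$ The figure $42.1$ is the second moment of the conditional mean $\lambda_T$, not of $X$ itself. Using it in place of $\operatorname{E}[X^2]$ drops the term $\operatorname{E}[\lambda_T] = 5.1$, which would explain a result of $16.09$ instead of $21.19$.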
