Solved – Finding probability that the total of values in a sample is greater than a particular value

normal distribution

I'm attempting the following question, given this information (weights in grams):

μ = 675
σ = 21

A gardener receives a shipment of 30 bags of soil. Find the probability that the total weight is less than 20 kg. Explain why the information that the weights of the bags follow a normal distribution is not needed to answer this question.

Originally I attempted the following:

  • For the total weight of the shipment to be less than 20 kg, the average weight of the bags must be less than the total weight (in g) divided by the number of bags: 20000 g / 30 ≈ 667 g (3 s.f.).
  • Treating 667 g as the cutoff for the sample mean, we can compute a z-score to find how likely the shipment is to weigh less than 20 kg: z = (667 – 675) / (21 / sqrt(30)) = -2.09

  • Probability from the z-table: P(Z < -2.09) = 0.0183

So my answer was that there is a 1.83% chance the shipment will weigh less than 20 kg.

But since the question states that it can be answered without knowing that the bag weights follow a normal distribution, I'm inclined to think that using the population standard deviation to calculate my answer is the wrong way to go about it.

Is there another method I can use to calculate this? Otherwise, what could possibly be meant that we do not need the normal distribution information to answer the question?

Best Answer

The question assumes $X_i \stackrel{\mathrm{i.i.d.}}{\sim} \mathcal{N}\left(675, 21^2\right)$ and asks you about the distribution of $Y \equiv \sum_{i=1}^{30} X_i$. In particular, you'd like to know $\Pr\left[Y < 20000\right]$.

Under the iid and normality assumptions, a sum of independent normals is normal, the means add, and (by independence) the variances add. So $Y \sim \mathcal{N}\left(30 \cdot 675, 30 \cdot 21^2\right)$, and therefore $\Pr\left[Y < 20000\right] \approx 0.0149$.

I got that number in R using

pnorm(20000, mean=20250, sd=sqrt(30) * 21)  # 0.0149
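
The sample-mean route used in the question is equivalent to working with the total, so the two should agree. A minimal sketch of that comparison follows; the variable names are introduced here for illustration only. It suggests that the gap between 0.0183 and 0.0149 comes from rounding 20000/30 to 667 (and z to -2.09) before looking up the probability, not from the method itself.

```r
mu <- 675; sigma <- 21; n <- 30; cutoff <- 20000

# Route 1: work with the total, Y ~ N(n * mu, n * sigma^2)
p_total <- pnorm(cutoff, mean = n * mu, sd = sqrt(n) * sigma)

# Route 2: work with the sample mean, which is N(mu, sigma^2 / n)
z_exact <- (cutoff / n - mu) / (sigma / sqrt(n))
p_mean <- pnorm(z_exact)

# Route 2 again, but with the per-bag cutoff rounded to 667 g as in the question
z_rounded <- (667 - mu) / (sigma / sqrt(n))
p_rounded <- pnorm(z_rounded)

# p_total and p_mean coincide; p_rounded is noticeably larger
round(c(p_total = p_total, p_mean = p_mean, p_rounded = p_rounded), 4)
```

Because the cutoff sits about two standard errors below the mean, a shift of a third of a gram in the per-bag cutoff is enough to move the tail probability visibly.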

The rest of your question, the part where you ask "what could possibly be meant that we do not need the normal distribution information to answer the question", is about the central limit theorem (CLT). Suppose the $X_i$ were still iid with the same mean and variance as before, but not normally distributed. How would their sum $Y$ be distributed? The CLT says that, provided the variance of the $X_i$ is finite, the standardized sum converges in distribution to a normal as $n$ goes to infinity, so for large $n$ the sum is approximately normal. Look up the Lindeberg–Lévy CLT.
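
For reference, here is one common way to write the Lindeberg–Lévy statement. The symbol $\bar{X}_n$ for the mean of the first $n$ observations is notation introduced here, and $\mu$ and $\sigma^2$ are the common mean and (finite) variance of the iid $X_i$:

$$\sqrt{n}\,\left(\bar{X}_n - \mu\right) \xrightarrow{d} \mathcal{N}\left(0, \sigma^2\right) \quad \text{as } n \to \infty$$

Read informally, for large $n$ the sum $Y = n \bar{X}_n$ is approximately $\mathcal{N}\left(n\mu, n\sigma^2\right)$. With $n = 30$, $\mu = 675$ and $\sigma = 21$ this is $\mathcal{N}\left(30 \cdot 675, 30 \cdot 21^2\right)$, the same distribution obtained under the normality assumption, except that it now holds only approximately.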

In this case $n$ equals 30, but it turns out that's already large enough for the normal approximation to be useful. Here are some examples in R:

simulate_normal <- function(n_bags=30, cutoff=20000) {
    ## TRUE if the total weight of n_bags normally distributed bags is below cutoff
    return(sum(rnorm(n_bags, mean=675, sd=21)) < cutoff)
}
mean(replicate(10^5, simulate_normal()))  # Around 0.0149 -- here the X_i are normal

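simulate_uniform <- function(n_bags=30, cutoff=20000) {
    ## Uniform[a, b] has variance (b-a)^2 / 12, so this width gives variance 21^2
    width <- sqrt(21^2 * 12)
    ## TRUE if the total weight of n_bags uniformly distributed bags is below cutoff
    return(sum(runif(n_bags, min=675 - width/2, max=675 + width/2)) < cutoff)
}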
mean(replicate(10^5, simulate_uniform()))  # Still around 0.0149 when the X_i are uniform

Even when you let $X_i \stackrel{\mathrm{i.i.d.}}{\sim} \mathcal{U}\left[675-36.37307, 675+36.37307\right]$, the answer using the normal approximation is nearly correct.

I chose those parameters for the uniform distribution so that it would have mean 675 and variance 21^2:

sd(runif(10^5, min=675 - 36.37307, max=675 + 36.37307))  # Around 21
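
The half-width can also be derived exactly instead of being checked by simulation. This is a small sketch using the variance formula for a uniform distribution, $(b-a)^2 / 12$; the names `target_sd`, `half_width`, `a` and `b` are illustrative only.

```r
target_sd <- 21

# Solve (b - a)^2 / 12 = target_sd^2 for the half-width (b - a) / 2
half_width <- sqrt(12) * target_sd / 2   # about 36.37307

a <- 675 - half_width
b <- 675 + half_width

# Exact mean and standard deviation of Uniform[a, b]
c(mean = (a + b) / 2, sd = (b - a) / sqrt(12))
```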