[Math] Mathematical Statistics (Normal Distribution)

normal distribution, probability, statistics

The weights of a population of mice fed a certain diet follow a normal distribution with mean μ=100 grams and standard deviation σ=20 grams. A random sample of 8 such mice is taken.

(a) Find the probability that exactly 4 of the mice weigh between 80 and 100 grams, 2 of the mice weigh more than 100 grams, and 2 mice have weights less than 80 grams.

My approach would be to say that 4/8 are in the 34% range, 2/8 are in the 50% range, and 2/8 are in the 16% range, and then average those numbers to get 33.5%, but I'm not sure that is right. Could someone let me know whether I did it correctly?

Thanks.

Best Answer

You have the right probabilities for each weight range (to two decimal places, anyway), but you are not combining them in the right way. It doesn't make sense to average them. For example, if I have a $99.999\%$ chance of not winning the lottery this week and a $0.001\%$ chance of winning the lottery next week, that doesn't mean that I have a $50\%$ chance of winning the lottery on exactly one of the two weeks.

To get the probability that the event $A$ happens and an independent event $B$ also happens, you multiply the probabilities: $P(A \mathbin{\&} B) = P(A)P(B)$.

So the probability that, out of your eight mice, the first four mice are in the $80$–$100$ gram range, the next two mice have mass $>100$ grams, and the last two mice have mass $<80$ grams, is given by

\begin{align*}\tag{$\ast$}(.34)^4 (.50)^2 (.16)^2.\end{align*}

However, this is not yet the answer to your question. You don't care about the order of the mice; any order is okay. So you have to consider other possibilities, such as the possibility that the first mouse is in the $80$–$100$ gram range, the next two mice have mass $>100$ grams, the two after that have mass $<80$ grams, and the final three again have mass in the $80$–$100$ gram range, giving you the term

\begin{align*}\tag{$\ast\ast$}(.34)(.50)^2 (.16)^2(.34)^3.\end{align*}

There are many different possible orders, corresponding to mutually exclusive events, so to get the total probability that one of them happens you have to add them up: $(\ast) + (\ast\ast) + \cdots$. Fortunately this is less work than it seems, because the order of the factors doesn't matter in multiplication, so every such term equals $(\ast)$. We just have to figure out how many copies of the term $(\ast)$ to add up.

In other words, we have to figure out how many ways to arrange eight mice in a line, where four are in a certain weight range (call it "A"), two are in a second weight range B, and two are in a third weight range C, and we don't distinguish between different mice in the same weight range. So we can think of this as counting the number of words like AAAABBCC, ABBCCAAA, etc. The number of such words is given by $$N = \frac{8!}{4!\,2! \,2!}.$$

This is because the $8!$ counts the number of ways to arrange 8 letters, and we divide by $4!$, $2!$, and $2!$ because we don't care how the four A's are arranged among themselves, and similarly for the two B's and the two C's. Now to get the final answer we multiply this number $N$ by the probability $(\ast)$.
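As a rough check of the recipe above, here is a minimal R sketch that evaluates $N$ and the final probability. The variable names are illustrative only. The call to `dmultinom` (R's built-in multinomial probability function) is an added cross-check, not part of the original answer.

```r
# Category probabilities for one mouse, X ~ Norm(100, 20).
mu <- 100; sigma <- 20
p_mid  <- pnorm(100, mu, sigma) - pnorm(80, mu, sigma)  # P(80 < X < 100), about 0.34
p_high <- 1 - pnorm(100, mu, sigma)                     # P(X > 100) = 0.50
p_low  <- pnorm(80, mu, sigma)                          # P(X < 80), about 0.16

# Number of distinct orderings of the word AAAABBCC: 8! / (4! 2! 2!) = 420.
N <- factorial(8) / (factorial(4) * factorial(2) * factorial(2))

# Answer using the rounded probabilities from the text.
ans_rounded <- N * 0.34^4 * 0.50^2 * 0.16^2

# Same calculation with unrounded probabilities.
ans_exact <- N * p_mid^4 * p_high^2 * p_low^2

# Cross-check against the multinomial probability function.
check <- dmultinom(c(4, 2, 2), size = 8, prob = c(p_mid, p_high, p_low))

c(rounded = ans_rounded, exact = ans_exact, dmultinom = check)  # all roughly 0.036
```

So the requested probability is roughly 3.6%, far from the 33.5% obtained by averaging.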

Related Solutions

[Math] Normal distribution and sample distribution standard deviation

You are probably studying, or about to study, the Central Limit Theorem. You should find some explanation of this in your text, next to the discussion of the CLT. In particular, a sample mean of observations from a normal population is normally distributed.

If $X_1, X_2, \dots, X_{16}$ are a random sample from $Norm(\mu = 55, \sigma = 5),$ then the sample mean $\bar X \sim Norm(\mu = 55, \sigma = 5/\sqrt{16}) = Norm(55, 5/4).$

Key steps in the derivation of $E(\bar X) = \mu$ are as follows: $$E(\bar X) = E\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}E\left(\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n}(n\mu) = \mu.$$ Notice that each of the $n$ terms in the last summation is $\mu.$

Key steps in the derivation of $Var(\bar X) = \sigma^2/n$ are as follows, where the second equality uses the identity $Var(aY) = a^2Var(Y)$ and the third uses the independence of the $X_i$ (so that the variance of the sum is the sum of the variances):

$$Var(\bar X) = Var\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \left(\frac{1}{n}\right)^2 Var\left(\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{1}{n^2}(n\sigma^2) = \sigma^2/n.$$

Then $SD(\bar X) = \sqrt{Var(\bar X)} = \sqrt{\sigma^2/n} = \sigma/\sqrt{n}.$
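The formula $SD(\bar X) = \sigma/\sqrt{n}$ can also be checked empirically. The following is a simulation sketch only, using the values $\mu = 55,$ $\sigma = 5,$ $n = 16$ from above. The seed and the number of replications are arbitrary choices, not taken from the original answer.

```r
# Simulation sketch: compare the empirical SD of sample means with sigma/sqrt(n).
set.seed(2024)                 # arbitrary seed, chosen only for reproducibility
mu <- 55; sigma <- 5; n <- 16  # values from the example in the text
reps <- 100000                 # number of simulated samples (an assumption)

# Each row of the matrix is one sample of size n from Norm(mu, sigma).
samples <- matrix(rnorm(reps * n, mean = mu, sd = sigma), nrow = reps, ncol = n)
xbar <- rowMeans(samples)      # one sample mean per row

mean(xbar)        # should be close to mu = 55
sd(xbar)          # should be close to sigma / sqrt(n) = 1.25
sigma / sqrt(n)   # theoretical value, exactly 1.25
```

With this many replications the simulated values should agree with the theoretical ones to about two decimal places.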

Example: As a consequence, the probability that any one individual in the population weighs between 50 and 60 kg is $P(50 < X_i < 60) \approx .68.$ However, the probability that the average weight of 16 individuals lies between 50 and 60 kg is $P(50 < \bar X < 60) \approx 1.$

 diff(pnorm(c(50,60), 55, 5))    # P(50 < X_i < 60) for one individual, SD = 5
 ## 0.6826895
 diff(pnorm(c(50,60), 55, 5/4))  # P(50 < Xbar < 60) for the mean of 16, SD = 5/4
 ## 0.9999367

[Figure: the normal PDF of the population and the normal PDF of $\bar X.$]

[Math] How to calculate standard deviation in case of known x values, assuming a normal distribution

Following @lulu's Comment, a 'grid search' for $\sigma$ would work, but there is a way to get an approximate solution by solving an explicit equation.

You know that $P(X < 7500) = 0.17.$ Thus $$P\left(\frac{X-\mu}{\sigma} < \frac{7500-13300}{\sigma}\right) = 0.17.$$ Using your calculator or printed tables you can find that $\frac{-5800}{\sigma} = -0.9541653,$ and solving for $\sigma$ gives $\sigma = 5800/0.9541653 \approx 6079.$ I used R statistical software (where the inverse of the standard normal CDF is denoted qnorm) as follows:

qnorm(.17)
## -0.9541653
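The remaining step, solving for $\sigma,$ can be done in the same R session. This is a minimal sketch with illustrative variable names. It assumes $\mu = 13300,$ as in the standardization above.

```r
# Solve (x - mu) / sigma = qnorm(p) for sigma.
mu <- 13300   # mean used in the standardization above
x  <- 7500    # cutoff value
p  <- 0.17    # P(X < x)

z     <- qnorm(p)       # standard normal quantile, about -0.954
sigma <- (x - mu) / z   # about 6078.6

sigma
pnorm(x, mu, sigma)     # sanity check: should return 0.17
```

Rounding to the nearest integer gives 6079.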

Addendum: If you suspect that $\sigma$ is between 1000 and 10,000, then here's how a grid search in R for $\sigma$, denoted sg, might work to give 6079.

sg = 1000:10000                # candidate integer values of sigma, 1000 to 10000
pr = pnorm(7500, 13300, sg)    # P(X <= 7500) for X ~ NORM(13300, sg)
sg[which.min(abs(pr - .17))]   # sg whose probability is closest to .17
## 6079
