[Math] How to calculate a population mean for a normal distribution

expectation, normal distribution, probability, sampling, statistics

This is for homework, but I'm a bit confused on how I can find $E(X_i) = \mu$ given a normal distribution.

The question is as follows: In a farm, let $X$ denote the number of fruits harvested from a tree. A tree has at least one fruit and at most ten fruits so that the support of the discrete random variable is $S_X = \{1,2,\dots,10\}$. Further assume that the probability distribution of $X$ is given by $P(X = x) = \frac{x}{55}$.

Also, we assume that $X_1,…,X_{50}$ is a random sample of 50 trees where each $X_i$ denotes the number of fruits in the $i^{th}$ tree. Since $X_1,…,X_{50}$ arise from the same population (i.e., the same farm), we assume that:

$E(X_1) = E(X_2) = … = E(X_{50}) = \mu$ and

$Var(X_1) = Var(X_2) = … = Var(X_{50}) = \sigma^2$

Questions:

a. If $X_1,X_2,…,X_{50}$ follow a normal distribution, what are $E(X_i)$ and $Var(X_i)$ for $i = 1,…,50$?

b. Are the assumptions satisfied to apply CLT to approximate the sampling distribution of $\bar{X} = \frac{1}{50}\sum_{i=1}^{50}X_i$? Explain why.

c. What is the approximate sampling distribution of $\bar{X}_{50}$ by appealing to CLT?

For part a, the professor gave us the answer, and said $E(X_i) = 7$ and $Var(X_i) = 6$. I just have no idea how he calculated either of these values.

I thought that to get the mean I should sum up the possible values I can get from $S_X$ which means $1+2+…+10 = 55$. And then divide that by the amount of numbers in the support, which is ten, so $\frac{55}{10} = 5.5$. This doesn't equal 7 though. So what should I be doing to calculate the mean?

Also I've looked through his notes and can't seem to find any way to calculate the variance, $\sigma^2$. The only formula my professor lists in his notes for variance is that I can get it by doing $p(1-p)$ but I don't know what $p$ is in this case.

For b, I think I can apply CLT to the sampling distribution since that's what I've been doing for the entire section and it seems to be the formula for finding that.

For c do I have to use a binomial distribution to solve this? Or do I translate it into the standard normal distribution? Not sure how to proceed with what it is asking.

I'm kind of confused about a majority of this so I appreciate any help.

Best Answer

Central Limit Theorem applied to a mean of discrete random variables.

(a) Saying that $X$ is normal is simply wrong; that must be an error. We are told that the PDF (PMF) of $X$ is $f(x) = x/55,$ for $x = 1,\dots,10.$ As a check, we verify that $1/55 + \cdots + 10/55 = 1,$ as must be the case for a probability mass function. Then, as you have already verified using one of the Comments,

$$\mu_X = E(X) = \sum_{x=1}^{10} xf(x) = 1/55 + 4/55 +\cdots + 100/55 = 7.$$
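To see why the unweighted average of 5.5 tried in the question does not match, it may help to put the two computations side by side. Dividing $1+2+\cdots+10$ by 10 gives every value the same weight $1/10$, which would be right only if all ten values were equally likely. Here larger values are more likely, so each $x$ is weighted by $f(x) = x/55$ instead:

$$\frac{1}{10}\sum_{x=1}^{10} x = \frac{55}{10} = 5.5, \qquad\text{whereas}\qquad \sum_{x=1}^{10} x\cdot\frac{x}{55} = \frac{1^2 + 2^2 + \cdots + 10^2}{55} = \frac{385}{55} = 7.$$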

Also, the variance of $X$ is defined as $$\sigma_X^2 = V(X) = \sum_{x=1}^{10} (x-\mu_X)^2\,f(x),$$ and there is a theorem that says $$V(X) = E(X^2) - \mu_X^2.$$ You should be able to find one or both of these formulas in your text or notes. With either formula, you can verify that $\sigma_X^2 = V(X) = 6,$ as you were told. (The variance of a Bernoulli distribution is $p(1-p),$ but that has nothing at all to do with this problem; not just any formula with the word variance in the same paragraph will do.)
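As a sketch of the verification using the second formula, with $f(x) = x/55$ as given in the problem:

$$E(X^2) = \sum_{x=1}^{10} x^2\cdot\frac{x}{55} = \frac{1^3 + 2^3 + \cdots + 10^3}{55} = \frac{3025}{55} = 55, \qquad V(X) = E(X^2) - \mu_X^2 = 55 - 7^2 = 6.$$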

(b) The CLT applies to sums or averages of independent random variables, all having the same distribution with finite variance. That is true of your $X_i$s, because they are a random sample from one population. Averages of a very large number $n$ of such variables are very close to normal. Experience has shown that for $n$ as large as 50, results are pretty good, provided the distribution of the $X_i$ is not extremely skewed. For your $X_i$ the distribution is moderately skewed, so the CLT should work well and your $\bar X$ should have nearly a normal distribution.

(c) We know that $E(\bar X) = \mu_X = 7$ and $V(\bar X) = \sigma_X^2/n = 6/50 = 0.12.$ You should also look for these fundamental formulas in your notes or text. Applying the CLT, it seems reasonable to say that the approximate sampling distribution of $\bar X$ is $N(7, 0.12),$ where the second parameter is the variance (so the standard deviation of $\bar X$ is $\sqrt{0.12} \approx 0.346$).
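In case these two formulas are not in your notes, here is a brief sketch of where they come from. The mean uses only linearity of expectation; the variance step also uses the independence of the $X_i$, which holds for a random sample:

$$E(\bar X) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu_X}{n} = \mu_X, \qquad V(\bar X) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{n\sigma_X^2}{n^2} = \frac{\sigma_X^2}{n}.$$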

Addendum: If you wanted to know something like $P(\bar X < 6.5),$ you could find that by standardizing and using normal tables. If you want to try it, the answer using the CLT is between 0.07 and 0.08.
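A sketch of that standardization, using the mean 7 and variance 0.12 from part (c) and rounding $z$ to two places as one would for a printed normal table:

$$P(\bar X < 6.5) \approx P\!\left(Z < \frac{6.5 - 7}{\sqrt{0.12}}\right) \approx P(Z < -1.44) \approx 0.075.$$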

Below is a simulation of a million 50-tree experiments, using the exact distribution. By increasing the number of iterations, such a simulation could be as precise as you want. Here we get $P(\bar X < 6.5) \approx 0.072,$ which may be better than the normal approximation. But (based on this and other results not shown) the normal approximation certainly works well enough to vindicate the answer to part (c).

    m = 10^6;  n = 50;  pmf = (1:10)/55       # m experiments of n trees; P(X = x) = x/55
    x = sample(1:10, m*n, replace = TRUE, prob = pmf)   # m*n independent draws from the PMF
    x.bar = rowMeans(matrix(x, nrow=m))  # each row of matrix is 50-tree expt
    mean(x.bar)
    ##  6.999567       # approximates 7
    var(x.bar)
    ## 0.1197779       # approximates 0.12
    mean(x.bar < 6.5)
    ## 0.072293        # approximates P(X-bar < 6.5)
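
For comparison with the simulation, the exact mean and variance and the CLT approximation can also be computed directly, with no random sampling. This is only a minimal sketch: the variable names (`pmf`, `mu`, `sigma2`, `se`, `p.clt`) are my own choices for illustration, and the last step assumes the normal approximation from part (c).

```r
# Exact moments of X from its PMF, f(x) = x/55 for x = 1, ..., 10
x   <- 1:10
pmf <- x / 55
stopifnot(isTRUE(all.equal(sum(pmf), 1)))   # probabilities must sum to 1

mu     <- sum(x * pmf)            # E(X)
sigma2 <- sum(x^2 * pmf) - mu^2   # V(X) = E(X^2) - mu^2

# CLT approximation (an assumption, per part (c)):
# X-bar is roughly normal with mean mu and variance sigma2 / n
n     <- 50
se    <- sqrt(sigma2 / n)                   # standard deviation of X-bar
p.clt <- pnorm(6.5, mean = mu, sd = se)     # approximate P(X-bar < 6.5)

print(c(mean = mu, variance = sigma2, var.xbar = sigma2 / n, p.clt = p.clt))
```

The first three printed values should match 7, 6, and 0.12, and the last should fall between 0.07 and 0.08, in line with the Addendum.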