Solved – When is a sample proportion p hat instead of x bar

Tags: distributions, mean, sample, sample-size, standard deviation

I just started my first statistics class and am not majoring in statistics, so sorry if this sounds like a beginner question, and also sorry if my language is incorrect (feel free to correct me). I have been learning about creating sampling distributions of phat and also sampling distributions of xbar. I was wondering if you can tell when one is needed and when the other is needed by looking at a mean, standard deviation, and sample size.

I have two examples from my class: one requires a sampling distribution of phat and the other a sampling distribution of xbar.

First example, using the sampling distribution of xbar:

Aamco Heating and Cooling, Inc., advertises that any customer buying an air conditioner during the first 16 days of July will receive a 25 percent discount if the average high temperature for this 16 day period is more than 5 degrees above normal. Daily high temperatures in July are normally distributed with a mean of 84 degrees and a standard deviation of 8 degrees.

If we consider the first 16 days of July to be a random sample, what are the expected value, standard deviation, and shape of the sampling distribution of the sample mean? (don't answer this question it's just here to show the question in context.)

And now the second, using the sampling distribution of phat:

Assume that 30% of all business students at a university invest in the stock market. We randomly pick 500 students.

Show the sampling distribution of phat, the sample proportion of business students at this university who invest in the stock market. (Yet again no need to do this just giving context.)

So yet again I'm just asking if there is a way to tell if I need to use the equations for xbar or for phat when given a mean, standard deviation, and sample size and asked to give a sampling distribution. (And yes I know the second example says give the sampling distribution of p-hat, but I want to know if there is a way to tell if it didn't say that.) Thanks and sorry again if this is a bad question.

Here are the meanings of x bar and p hat that were used to solve the first and second questions, respectively: [image not available]

Best Answer

Both questions are essentially applications of the Central Limit Theorem, which says (informally) that "the sum (or mean) of many independent observations from a common population will tend to a normal distribution as the number of observations becomes large".

The two questions differ in the type of data that they treat. The "xbar" question concerns temperature, which is a continuous measurement (e.g. a decimal number). The "phat" question implicitly concerns a binary measurement (true/false, e.g. each student either invests or does not).

Commonly a measurement of a random variable will be denoted by $x$. For a random sample $x_1,\ldots,x_N$ the sample mean will then be denoted by $\bar{x}=\frac{1}{N}\sum_ix_i$. This applies directly to the "xbar" question. Here each $x_i$ is a temperature measurement, and the question asks about the sampling distribution of $\bar{x}$. (This arises when $\bar{x}$ is computed many times over different samples, each of size $N$).
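As an illustration of what "computed many times over different samples" means, here is a minimal Python sketch for the temperature example. It assumes the standard results $\mathbb{E}[\bar{x}]=\mu$ and $\mathrm{sd}(\bar{x})=\sigma/\sqrt{N}$ for independent observations, which the answer does not state explicitly. The function name, the seed, and the number of repetitions are arbitrary choices made for this sketch.

```python
import math
import random
import statistics

def sample_mean_sampling_distribution(mu, sigma, n):
    """Return (expected value, standard deviation) of the sample mean x-bar.

    Assumes n independent observations from a population with
    mean mu and standard deviation sigma.
    """
    return mu, sigma / math.sqrt(n)

# Values taken from the temperature example: mean 84, sd 8, N = 16 days.
expected, std_error = sample_mean_sampling_distribution(mu=84.0, sigma=8.0, n=16)

# Simulation check: draw many samples of size 16 and compute x-bar for each.
rng = random.Random(0)  # fixed seed, chosen only for reproducibility
sample_means = [
    statistics.fmean([rng.gauss(84.0, 8.0) for _ in range(16)])
    for _ in range(20000)
]
sim_mean = statistics.fmean(sample_means)
sim_sd = statistics.stdev(sample_means)

print(expected, std_error)  # theoretical centre and spread of x-bar
print(sim_mean, sim_sd)     # simulated values, close to the theoretical ones
```

Because the daily temperatures are themselves assumed normal here, the simulated sample means are normally distributed even for a sample as small as 16.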

For the "phat" question, the notation and logic are consistent with this, but the connection is a little more involved. In this case each $x_i$ will correspond to an individual student, who either invests ($x_i=1$) or does not ($x_i=0$). The probability that a student will invest would commonly be denoted by $p$ ($=30\%$ in this case). These conventions of $\Pr[\text{true}]=p$ and $\{\text{true,false}\}=\{1,0\}$ are standard for the case of a binary random variable.

Now imagine we do not know the value of $p$, but wish to estimate it from a random sample of students $x_1,\ldots,x_N$. For a single student the expected value of $x_i$ is $p$, denoted $\mathbb{E}[x_i]=p$. Similarly, by the linearity of expectation, for the sample we have $\mathbb{E}[\bar{x}]=p$. So here the sample mean $\bar{x}$ provides an estimate of the population parameter $p$. In statistics it is standard practice to denote an estimate of a population parameter by using a "hat", so here it makes sense to denote the sample mean as $\hat{p}$.
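To make the identity $\hat{p}=\bar{x}$ concrete, the following Python sketch codes each student as 1 or 0 and checks that the proportion of ones equals the ordinary sample mean. It also assumes one fact not spelled out above: a 0/1 variable has variance $p(1-p)$, so the usual $\sigma/\sqrt{N}$ becomes $\sqrt{p(1-p)/N}$. The helper `draw_sample`, the seed, and the repetition count are hypothetical choices for illustration only.

```python
import math
import random
import statistics

P_TRUE = 0.30  # population proportion from the example
N = 500        # sample size from the example

rng = random.Random(1)  # fixed seed, chosen only for reproducibility

def draw_sample(p, n):
    """Return n observations coded 1 (invests) or 0 (does not invest)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

x = draw_sample(P_TRUE, N)
p_hat = sum(x) / N           # proportion of ones in the sample
x_bar = statistics.fmean(x)  # ordinary sample mean of the same 0/1 data

# Assumed fact: a 0/1 variable has variance p(1 - p),
# so sigma / sqrt(N) takes this form for p-hat.
std_error = math.sqrt(P_TRUE * (1 - P_TRUE) / N)

# Simulation check: recompute p-hat over many independent samples.
p_hats = [statistics.fmean(draw_sample(P_TRUE, N)) for _ in range(2000)]
sim_mean = statistics.fmean(p_hats)
sim_sd = statistics.stdev(p_hats)

print(p_hat, x_bar)       # the same number computed two ways
print(std_error, sim_sd)  # theoretical vs simulated spread of p-hat
```

So the "phat" formulas are the "xbar" formulas applied to data that can only be 0 or 1. If a problem describes each individual as yes/no and gives a percentage, it is a $\hat{p}$ problem; if it gives a mean and standard deviation of a numeric measurement, it is an $\bar{x}$ problem.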

(For the "xbar" problem the comparable notation would be $\bar{x}=\hat{\mu}$, as there $x$ is normal rather than Bernoulli.)