How to use Student's t-distribution without the sample size

Tags: estimation, sample-size, self-study

Here is my question (homework, obviously):

A sample from a normal population produced variance 4.0.
Find the size of the sample if the sample mean deviates from the population
mean by no more than 2.0 with a probability of at least 0.95.

So I'm trying to find $n$, the sample size, given only $\hat{\sigma}^2 = 4.0$, the sample variance, and a bound on the distance between $\bar{x}$ and $\mu$. My intuition was that in this situation we need the t distribution, since $\sigma$ is unknown and has to be estimated by $\hat{\sigma}$ (with $\hat{\sigma}^2$ an unbiased estimate of $\sigma^2$; we did all the proofs in class). The problem is that the t distribution changes with $n$, so which distribution (how many degrees of freedom) should I consult when looking up the t-values containing 95% of the probability mass?

I tried it for different values of $n$, squared the t-values, and compared them with the sample size implied by the degrees of freedom; the closest I could get was 0.6 off. (I took the t-value at $\alpha = 0.025$ (right tail) for 5 d.f., implying $n = 6$; squaring it gave 6.61, a discrepancy of 0.61. Isn't this large?) The reason I squared the t-values becomes apparent if you "normalize" the bound on the means into a t-statistic. Am I going about this correctly? This doesn't seem right…

(edit – this is what I did):

We have $P(\left| \bar{x} - \mu \right| \leq 2.0) \geq 0.95$; we are also given $\hat{\sigma} = 2$ and need to find $n$. Dividing through by $\hat{\sigma}/\sqrt{n}$:
$$-\sqrt{n}\frac{2.0}{\hat{\sigma}} \leq \sqrt{n}\frac{\left( \bar{x} - \mu \right)}{\hat{\sigma}} \leq \sqrt{n}\frac{2.0}{\hat{\sigma}}$$

filling in values:
$$-\sqrt{n}\frac{2.0}{2} \leq \sqrt{n}\frac{\left( \bar{x} - \mu \right)}{\hat{\sigma}} \leq \sqrt{n}\frac{2.0}{2}$$

then:
$$-\sqrt{n} \leq \sqrt{n}\frac{\left( \bar{x} - \mu \right)}{\hat{\sigma}} \leq \sqrt{n}$$
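Assuming a normal population and taking $\hat{\sigma}^2 = 4.0$ to be the usual unbiased sample variance, the middle term $T = \sqrt{n}\,(\bar{x} - \mu)/\hat{\sigma}$ has a t distribution with $n - 1$ degrees of freedom, so the requirement can be written as a condition on $n$ alone:

$$P\left(|T| \leq \sqrt{n}\right) \geq 0.95 \iff \sqrt{n} \geq t_{(0.975,\,n-1)} \iff n \geq t_{(0.975,\,n-1)}^2$$

Since this is an inequality rather than an equation, $t_{(0.975,\,n-1)}^2$ is not expected to equal $n$ exactly; the sample size sought is the smallest integer $n$ that satisfies it.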

At this point I was pretty confused, so I looked up the t-values for $\alpha = 0.025$ at different degrees of freedom. Since d.f. $= n - 1$, I took each t-value (which should be $\sqrt{n}$, from what I derived), squared it, and compared it with the value of $n$ implied by the d.f. I was using. For example, take the t distribution with 5 d.f. (implying a sample size of 6). The t-value for my $\alpha$ is 2.571, and squaring it to get $n$ gives 6.61. So this $n$ is clearly $\neq 6$, the value implied by the distribution I was using. All of this seems kind of ridiculous… Where did I fudge things?
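As a worked check of two neighbouring cases against the requirement $\sqrt{n} \geq t_{(0.975,\,n-1)}$ (equivalently $n \geq t_{(0.975,\,n-1)}^2$), which is what the last inequality above needs for 95% coverage, using the tabulated values $t_{(0.975,5)} = 2.571$ and $t_{(0.975,6)} = 2.447$:

$$n = 6:\quad t_{(0.975,5)}^2 = 2.571^2 \approx 6.61 > 6 \quad \text{(requirement fails)}$$

$$n = 7:\quad t_{(0.975,6)}^2 = 2.447^2 \approx 5.99 \leq 7 \quad \text{(requirement holds)}$$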

Best Answer

Your basic approach of using the t distribution is on the right track, but I think you go off the rails somewhere.

Note that the bound you are given on the mean is really the half width of a confidence interval.

Having come back to this: it's an interesting question.

By half width I mean the 2: your estimate $\pm 2$ is a 95% confidence interval for the mean. So, if I understand correctly, you need to solve:

$$2 = t_{(0.975,\,n-1)}\sqrt{\frac{4}{n-1}}$$

That is, find the value of $n$ for which the 97.5th percentile of the t distribution with $n-1$ degrees of freedom, multiplied by the estimated standard error of your estimate, equals the half width of 2. If $n$ were very large, the statistic would be approximately normal and $t$ would be 1.96; the question is what $t$ actually is, given that $n$ also appears as an unknown in your estimate of the variance.
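The $\sqrt{4/(n-1)}$ standard error follows if one assumes (the problem does not say this) that the reported variance of 4 was computed with divisor $n$, so the unbiased estimate has to be rescaled:

$$s^2 = \frac{4n}{n-1}, \qquad \widehat{\mathrm{SE}}(\bar{x}) = \frac{s}{\sqrt{n}} = \sqrt{\frac{4}{n-1}}, \qquad t_{(0.975,\,n-1)}\sqrt{\frac{4}{n-1}} \leq 2 \iff n - 1 \geq t_{(0.975,\,n-1)}^2$$

If 4 is instead already the unbiased variance, as in the question's derivation, the standard error is $\sqrt{4/n}$ and the condition becomes $n \geq t_{(0.975,\,n-1)}^2$.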

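This cannot be solved in closed form, but you can find the integer value of $n$ that gives the closest match by other means, such as trying successive values. Since the requirement is a probability of *at least* 0.95, the one you want is the smallest $n$ whose half width is no larger than 2.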
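One such method is a brute-force search over $n$. The sketch below uses only the Python standard library; `t_pdf`, `t_central_prob` and `smallest_n` are hypothetical helpers written for illustration, and the t probabilities come from Simpson's rule rather than a table. Instead of comparing against the 97.5th percentile directly, it checks the equivalent condition $P(|T| \leq 2/\widehat{\mathrm{SE}}) \geq 0.95$ for $T \sim t_{n-1}$, and it tries both standard-error conventions: $\sqrt{4/n}$ (4.0 read as the unbiased sample variance, as in the question) and $\sqrt{4/(n-1)}$ (4.0 read as a divisor-$n$ variance, as in the equation above).

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_prob(c: float, df: int, steps: int = 2000) -> float:
    """P(|T| <= c) for T ~ t(df), using Simpson's rule on [0, c].

    `steps` must be even. By symmetry of the density,
    P(|T| <= c) = 2 * (integral of t_pdf from 0 to c).
    """
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights: 1, 4, 2, 4, ..., 4, 1
        total += weight * t_pdf(i * h, df)
    return 2 * total * h / 3


def smallest_n(half_width: float, variance: float, level: float,
               divisor_n: bool, max_n: int = 10_000) -> int:
    """Smallest n >= 2 with P(|xbar - mu| <= half_width) >= level.

    divisor_n=False: `variance` is the unbiased s^2, so SE = sqrt(variance / n).
    divisor_n=True:  `variance` was computed with divisor n, so
                     SE = sqrt(variance / (n - 1)).
    """
    for n in range(2, max_n + 1):
        se = math.sqrt(variance / ((n - 1) if divisor_n else n))
        # Same as t_((1 + level) / 2, n - 1) * se <= half_width.
        if t_central_prob(half_width / se, n - 1) >= level:
            return n
    raise ValueError("no n up to max_n reaches the requested level")


if __name__ == "__main__":
    for divisor_n in (False, True):
        se_form = "sqrt(4/(n-1))" if divisor_n else "sqrt(4/n)"
        n = smallest_n(half_width=2.0, variance=4.0, level=0.95,
                       divisor_n=divisor_n)
        print(f"SE = {se_form}: smallest n = {n}")
```

Running the script prints the smallest qualifying $n$ for each convention. With a statistics package available, its t quantile function (for example `scipy.stats.t.ppf`) could replace the numerical integration.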

Hopefully I haven't gone too far, given that this is homework.
