Obviously you will need to know the type of confidence interval you are dealing with, but let's suppose that this is a standard one-sample confidence interval for the mean, using the standard T-statistic as the pivotal quantity. In that case, the formula for the interval is:
$$\text{CI}(1-\alpha) = \Bigg[ \bar{x} \pm \frac{t_{n-1, \alpha/2}}{\sqrt{n}} \cdot s \Bigg].$$
Thus, if we denote the known lower and upper bounds of the interval as $l$ and $u$ respectively, then you can algebraically reverse-engineer the sample mean and sample standard deviation as:
$$\bar{x} = \frac{l+u}{2}
\quad \quad \quad \quad \quad
s = \frac{u-l}{2} \cdot \frac{\sqrt{n}}{t_{n-1, \alpha/2}}.$$
With the values specified in your example, you get:
#Set preliminary values
l <- 5.18;
u <- 5.38;
n <- 300;
alpha <- 0.05;
#Compute sample mean and SD
crit <- qt(alpha/2, df = n-1, lower.tail = FALSE);
MEAN <- (l+u)/2;
SD <- (u-l)*sqrt(n)/(2*crit);
#Print the values
MEAN;
[1] 5.28
SD;
[1] 0.8801386
Thus, assuming that your interval was a standard one-sample confidence interval, you must have had a sample mean $\bar{x} = 5.28$ and sample standard deviation $s = 0.88$.
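The same algebra can be wrapped in a small function. This is only a sketch: `ci_to_mean_sd` is a hypothetical helper name (not part of base R), and it assumes a two-sided one-sample t interval. The round-trip check uses simulated data with an arbitrary seed.

```r
# Hypothetical helper: recover the sample mean and SD from a two-sided
# one-sample t interval [l, u] computed from n observations.
ci_to_mean_sd <- function(l, u, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  crit  <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)  # upper-tail critical value
  c(mean = (l + u) / 2,
    sd   = (u - l) * sqrt(n) / (2 * crit))
}

# Values from the example above
ci_to_mean_sd(5.18, 5.38, n = 300)

# Round-trip check: build a t interval from simulated data, then recover
# the mean and SD from its endpoints alone.
set.seed(1)                          # arbitrary seed
x   <- rnorm(25, mean = 10, sd = 2)  # arbitrary illustrative data
ci  <- t.test(x)$conf.int
rec <- ci_to_mean_sd(ci[1], ci[2], n = length(x))
all.equal(unname(rec), c(mean(x), sd(x)))
```

If the interval was built some other way (a z interval, a bootstrap interval, a one-sided bound), the recovered SD will not match the sample SD.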
Suppose you have 150 locations altogether, and you decide to base
your confidence interval for the mean of the population (for some attribute) on a sample of size 10.
whole = rnorm(150, 50, 7)
x = sample(whole, 10)
summary(x); length(x); sd(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
37.86 43.03 45.24 47.92 52.61 59.93
[1] 10 # sample size
[1] 7.470816 # sample standard deviation
t.test(x)$conf.int
[1] 42.57347 53.26207
attr(,"conf.level")
[1] 0.95
The population was simulated with mean 50; a 95% confidence
interval for the mean is $(42.6, 53.3).$ I used the t.test
procedure in R, but the 95% CI can also be found from the
formula $\bar X \pm t^* S/\sqrt{n},$ where $t^* = 2.262$ cuts probability
2.5% from the upper tail of Student's t distribution with $\nu = n-1 = 9$ degrees of freedom:
qt(.975, 9)
[1] 2.262157
mean(x) + qt(c(.025,.975),9)*sd(x)/sqrt(10)
[1] 42.57347 53.26207
If you knew the population standard deviation $\sigma=7,$
then you could use $\bar X \pm 1.96(7/\sqrt{10}),$ which
computes to $(43.6, 52.3).$ In general, this method has the
potential to be a little more accurate. Here the interval is somewhat narrower than the t interval above, because $1.96 < 2.262$ and $\sigma = 7$ is smaller than the sample standard deviation $S = 7.47.$
mean(x) + qnorm(c(.025,.975))*7/sqrt(10)
[1] 43.57920 52.25633
Notes: (1) You are sampling from a finite population of size 150.
As long as the sample size (here $n=10)$ is less than 10% of the population size, these formulas for sampling from essentially infinite populations should give useful results.
(2) These methods assume that the population values are approximately normally distributed. These methods would not work well if you had a few locations that are hugely different from
any of the others.
(3) Your idea of doing some sort of stratified sampling, so that several
provinces are represented or so that some observations come from urban and some from rural locations, might be useful. That would depend on whether there are large differences among provinces or between rural and urban locations. Stratified sampling would make it somewhat more difficult to make a confidence interval.
(4) Here, because I simulated the whole population, we can find the exact population mean and standard deviation and we know that the data are normal. In most actual applications this information would not necessarily be known.
(5) If you have some data for all 100+ scores, you might try
`t.test` on a sample of a dozen or so locations to see how well it works in your application.
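Because the population here is simulated, the check suggested in note (5) can be tried directly. The sketch below repeatedly draws samples of 10 without replacement from a simulated population like the one above and records how often the t interval covers the true population mean. The seed and the 2000 replications are arbitrary choices, not part of the original answer.

```r
set.seed(42)                    # arbitrary seed, for reproducibility only
whole <- rnorm(150, 50, 7)      # simulated population of 150 locations
mu    <- mean(whole)            # exact mean of this finite population

# For each replication: sample 10 locations, form the 95% t interval,
# and record whether it contains the population mean.
covers <- replicate(2000, {
  ci <- t.test(sample(whole, 10))$conf.int
  ci[1] <= mu && mu <= ci[2]
})

coverage <- mean(covers)        # proportion of intervals that cover mu
coverage
```

A proportion near 0.95 suggests the method is behaving as advertised for this population. With real data that include a few extreme locations, the same check may give noticeably lower coverage.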
Okay.
With a million in the population?
My first reaction was that, to a first approximation, it doesn't matter enough to be worth worrying about. Actually, it turns out that in this case it does. I'll do it both without replacement and with. With replacement is simpler, so I do it first.
Don't need it. The sample size will be large enough that with the other assumptions, only really strongly non-normal distributions will have any impact.
Can we assume independence (apart from the effect of sampling without replacement)? e.g. sampling completely at random? I'll take it that we can.
$\mu = 0.90$
$\sigma = 1.32$
Want 'to be 99% confident that the sample mean is within 1% of the population mean'.
i.e. Find $n$ such that $P(|\bar{x}-\mu| < .01\mu) = 0.99$
$\bar{x}-\mu \sim N(0, \frac{\sigma^2}{n})$
99% of a normal distribution is within 2.576 s.d.'s of the population mean (this figure can be found from normal tables or from a function in a program; I used R). Thus I need $2.576 \times \sigma/\sqrt{n} < 0.01 \mu = 0.009$
Hence $2.576^2 \sigma^2/n < 0.009^2$
Hence $2.576^2 \sigma^2 < n \times 0.009^2$
Or $n > (2.576 \times 1.32/0.009)^2 = 142742.9$
So if $n$ is about 142700, then the required probability statement should hold. (The mean, the s.d. and the normal table value were only accurate to about the same number of figures, so only the first 3-4 digits are meaningful.)
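The with-replacement calculation can be reproduced in R. This is a minimal sketch; `n_required` is a hypothetical helper name, and the optional `z` argument exists only so the rounded table value 2.576 used above can be plugged in.

```r
# Hypothetical helper: smallest n (before rounding up) such that
# z * sigma / sqrt(n) is below the target margin, i.e. n > (z * sigma / margin)^2.
n_required <- function(sigma, margin, conf.level = 0.99, z = NULL) {
  if (is.null(z)) z <- qnorm(1 - (1 - conf.level) / 2)  # two-sided critical value
  (z * sigma / margin)^2
}

margin <- 0.01 * 0.90                   # 1% of the population mean

n_required(1.32, margin, z = 2.576)     # with the rounded table value used above
n_required(1.32, margin)                # with the unrounded qnorm(0.995)
```

The two results differ slightly because `qnorm(0.995)` is a little below 2.576, which illustrates why only the leading digits of the answer are meaningful.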
If we allow for sampling without replacement, the required sample size drops by about 12.5% (search for "finite population correction" to the variance); other factors are likely to affect you by more than a couple of percent (not having perfectly random sampling, for one example).
Let's look at the without replacement case using the finite population correction now.
The finite population correction multiplies the variance by a factor $f = \frac{N-n}{N-1} = 1-\frac{n-1}{N-1}$.
Some people approximate this by $1 -\, n/N$, which is easily accurate enough with the large numbers for $n$ and $N$ involved here. However, I'll work with the exact version here.
$2.576^2 \sigma^2 (N-n)/(N-1) < n \times 0.009^2$
$(2.576\sigma/0.009)^2 /(N-1) < n/(N-n) $
$(2.576\sigma/0.009)^2 /(N-1) < 1/[N/n\,\,\, -1] $
$142743 \times 1000000/1142742 < n$
So (if I did that right), $n > 124912.7$
Or to the accuracy in the normal value, $n$ should be about $124900$.
(assuming the mean and s.d. are actually accurate to at least 4 figures, too)
Calculation check:
Interval half-width =
$(2.576\times 1.32/\sqrt{124900})\sqrt{(1000000-124900)/999999}$
$= 0.00900$
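The finite-population step and the check can also be done in R. This sketch solves the inequality above for $n$, giving $n > n_0 N/(N - 1 + n_0)$ where $n_0 = (2.576\sigma/0.009)^2$ is the with-replacement answer. The function names `n_fpc` and `half_width` are hypothetical helpers introduced for illustration.

```r
# Hypothetical helpers for the without-replacement case.
# n_fpc: sample size after applying the finite population correction to n0.
n_fpc <- function(n0, N) n0 * N / (N - 1 + n0)

# half_width: interval half-width for a sample of size n from a population of N.
half_width <- function(n, N, sigma, z) {
  z * sigma / sqrt(n) * sqrt((N - n) / (N - 1))
}

z      <- 2.576     # rounded normal critical value used in the answer
sigma  <- 1.32
margin <- 0.009     # 1% of the mean 0.90
N      <- 1e6

n0 <- (z * sigma / margin)^2   # with-replacement sample size
n  <- n_fpc(n0, N)             # without-replacement sample size
n

half_width(n, N, sigma, z)     # should reproduce the target margin
```

Evaluating `half_width` at the rounded value 124900 rather than the exact solution gives the check shown above.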