Let's assume that we are estimating a parameter of the population (e.g., mean height or weight). We've gathered our sample and calculated confidence intervals around the parameter of interest. Are we, therefore, assuming that this parameter follows a uniform distribution over the confidence interval? In simple terms, is the parameter equally likely to be anywhere within this confidence interval?
Confidence Interval – Is the Parameter of Interest Uniformly Distributed Over the Confidence Interval?
Tags: confidence-interval, estimation, uniform-distribution
Related Solutions
If the bootstrapping procedure and the formation of the confidence interval were performed correctly, it means the same as any other confidence interval. From a frequentist perspective, a 95% CI implies that if the entire study were repeated identically ad infinitum, 95% of the confidence intervals formed in this manner would include the true value. Of course, in your study, or in any given individual study, the confidence interval either will include the true value or not, but you won't know which. To understand these ideas further, it may help you to read my answer here: Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?
Regarding your further questions, the 'true value' refers to the actual parameter of the relevant population. (Samples don't have parameters, they have statistics; e.g., the sample mean, $\bar x$, is a sample statistic, but the population mean, $\mu$, is a population parameter.) As to how we know this, in practice we don't. You are correct that we are relying on some assumptions--we always are. If those assumptions are correct, it can be proven that the properties hold. This was the point of Efron's work back in the late 1970s and early 1980s, but the math is difficult for most people to follow. For a somewhat mathematical explanation of the bootstrap, see @StasK's answer here: Explaining to laypeople why bootstrapping works. For a quick demonstration short of the math, consider the following simulation using R:
# a function to perform bootstrapping
boot.mean.sampling.distribution = function(raw.data, B=1000){
  # this function will take 1,000 (by default) bootstrap samples, calculate the mean
  # of each one, store it, & return the bootstrapped sampling distribution of the mean
  boot.dist = vector(length=B)   # this will store the means
  N         = length(raw.data)   # this is the N from your data
  for(i in 1:B){
    boot.sample  = sample(x=raw.data, size=N, replace=TRUE)
    boot.dist[i] = mean(boot.sample)
  }
  boot.dist = sort(boot.dist)
  return(boot.dist)
}
# simulate bootstrapped CIs from a population w/ true mean = 0; on each pass through
# the loop, we will get a sample of data from the population, get the bootstrapped
# sampling distribution of the mean, & see if the population mean is included in the
# 95% confidence interval implied by that sampling distribution
set.seed(00)                    # this makes the simulation reproducible
includes = vector(length=1000)  # this will store our results
for(i in 1:1000){
  sim.data    = rnorm(100, mean=0, sd=1)
  boot.dist   = boot.mean.sampling.distribution(raw.data=sim.data)
  includes[i] = boot.dist[25]<0 & 0<boot.dist[976]
}
mean(includes)  # this tells us the % of CIs that included the true mean
[1] 0.952
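As a side note (not part of the original answer), the same kind of endpoints can be read off with R's quantile() function rather than by indexing the sorted vector. A minimal sketch, applied to the bootstrap distribution left over from the last pass of the loop above (small differences from the indexed values come from quantile()'s interpolation rule):

# 2.5% and 97.5% quantiles of the bootstrapped sampling distribution,
# i.e., the endpoints of the implied 95% confidence interval
quantile(boot.dist, probs = c(0.025, 0.975))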
- While solving the example, will we assume that the estimate of the true population mean is the mean of the one sample we have picked, i.e., 49.7 kg, or will the estimate of the true population mean be one of the values in the interval we eventually calculate, (49.6342, 49.7658)?
There is not 'the' estimate; there are several possible estimates. A common estimate is the sample mean, 49.7 kg, which is also the maximum likelihood estimate of the population mean (given the assumed population properties of a Gaussian distribution).
To indicate not just this maximum likelihood value, but also that multiple other values can be equally good candidates for an estimate, we use a range or interval (typically, the larger and more precise the sample, the smaller this range can be made).
A confidence interval is a range of potential estimates for which the confidence is high. For any value in the range, if it were the true value, then our observed sample mean of 49.7 would not be an unlikely outlier: it would lie within the central 90% of the observations expected under that value.
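For a concrete illustration (not part of the original answer), here is a minimal R sketch that reproduces the interval quoted in the question, assuming the standard error $\sigma/\sqrt{n}$ is 0.04, the value implied by the quoted endpoints (neither $\sigma$ nor $n$ is given separately):

x.bar <- 49.7         # observed sample mean (kg)
se    <- 0.04         # assumed standard error, implied by the question's interval
z     <- qnorm(0.95)  # ~1.645, two-sided 90% critical value
x.bar + c(-1, 1) * z * se   # roughly (49.6342, 49.7658)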
- We say that the probability that a sample picked at random from the population has a mean in the range $x - 1.645\,\sigma/\sqrt{n}$ to $x + 1.645\,\sigma/\sqrt{n}$ is 90%. How does this translate into the probability of picking a sample for which the interval $x - 1.645\,\sigma/\sqrt{n}$ to $x + 1.645\,\sigma/\sqrt{n}$ contains the true population mean?
The logic of a confidence interval is inverted. It relates to the probability of the data given the hypothesized parameter value (rather than the inverse, the probability of the parameter value given the data, which is what a credible interval describes).
The confidence interval can be seen as the range of population parameter values for which the data have a high probability (though as a fiducial probability based on p-values rather than a likelihood).
To construct a 90% confidence interval:
- For each hypothetical population parameter, compute the range within which 90% of the observed samples would fall*.
- Then pick the hypothetical population parameters for which the observed sample lies within their 90% range of observations (see the sketch after this list).
Conditional on the true population parameter, 90% of the time you will have observed a sample such that the true value is picked as part of the confidence interval; the other 10% of the time you will have observed a sample for which the computed range misses it.
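To make these two steps concrete, here is a minimal R sketch (not from the original answer) that scans a grid of hypothetical population means and keeps those for which the observed sample mean of 49.7 lies inside their central 90% range; the standard error of 0.04 is again the value implied by the question's interval, not a figure given in the original:

x.bar   <- 49.7    # observed sample mean
se      <- 0.04    # assumed standard error of the mean, sigma/sqrt(n)
half    <- qnorm(0.95) * se            # half-width of each central 90% range
mu.grid <- seq(49.5, 49.9, by = 1e-4)  # hypothetical population means

# keep the hypothetical means whose central 90% range contains the observed mean
keep <- abs(x.bar - mu.grid) <= half
range(mu.grid[keep])   # roughly (49.6342, 49.7658): the 90% confidence interval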
See, for instance, the images in: The basic logic of constructing a confidence interval.
The idea behind a 90% confidence interval containing the true population value 90% of the time is the conditioning on the true parameter rather than on the observed experimental sample; see: Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?
* The computation of such a 90% range can be done in different ways, and there is no single 'the' confidence interval; instead, there are many different ways to compute a confidence interval, with different tradeoffs. See, e.g., Asymmetric confidence intervals and What is a rigorous, mathematical way to obtain the shortest confidence interval given a confidence level?
Best Answer
Because you are asking about confidence intervals, I will assume these are frequentist interval estimates.
A 90% t CI, based on the sample mean $\bar X$ and the sample standard deviation $S,$ is of the form $$\bar X \pm t^*S/\sqrt{n},$$ where $t^*$ cuts probability $0.05$ from the upper tail of Student's t distribution with $n - 1$ degrees of freedom.
The analogous probability statement is $$P\left(-t^* < \frac{\bar X - \mu}{S/\sqrt{n}} < t^*\right) = 0.9,$$
in which probability is not uniformly distributed in $(-t^*,t^*).$
Consequently, you cannot get a 45% CI just by making that interval half as long: $(-0.5t^*, 0.5t^*).$
In order to get a 45% CI, you would need to cut probability $0.275$ from each tail of $\mathsf{T}(\nu=99).$
Suppose you have the fictitious sample below, of size $n = 100,$ from a normal population with unknown mean $\mu$ and unknown standard deviation $\sigma.$ Then a 90% CI for $\mu$ would be $(59.79, 62.16)$ and a 45% CI would be $(60.54, 61.40),$ as computed using R below.
An alternative way to get the 45% CI would be to use the confidence interval from the t.test procedure in R. An incorrect interval $(60.38, 61.56),$ obtained by simply making CI.90 half as long, would not be a true 45% CI for $\mu.$
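The R computation referred to above is not reproduced here; the following minimal sketch uses simulated data of the same size, $n = 100,$ as a stand-in for the fictitious sample (the seed, mean, and standard deviation are assumptions, so the exact endpoints will differ from those quoted):

set.seed(1)                          # arbitrary seed; the original sample is unavailable
x  <- rnorm(100, mean = 61, sd = 7)  # stand-in for the fictitious sample, n = 100
n  <- length(x)
se <- sd(x) / sqrt(n)

CI.90 <- mean(x) + qt(c(0.050, 0.950), df = n - 1) * se  # cut 0.05 from each tail
CI.45 <- mean(x) + qt(c(0.275, 0.725), df = n - 1) * se  # cut 0.275 from each tail
CI.90; CI.45

# t.test(x, conf.level = 0.45)$conf.int returns the same 45% interval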