The first sentence of the current 2015 editorial to which the OP links reads:
The Basic and Applied Social Psychology (BASP) 2014 Editorial
*emphasized* that the null hypothesis significance testing procedure
(NHSTP) is invalid...
(my emphasis)
In other words, for the editors it is an already established scientific fact that "null hypothesis significance testing" is invalid; the 2014 editorial merely emphasized this, and the current 2015 editorial simply implements it.
The misuse of NHSTP (sometimes even malicious misuse) is indeed well discussed and documented. And it is not unheard of in human history that things get banned because, when all is said and done, they were misused more than put to good use (but shouldn't we statistically test that?). Banning can be a "second-best" solution: we cut out what on average (inferential statistics) has produced losses rather than gains, and so we predict (inferential statistics) that it will be detrimental in the future as well.
But the zeal revealed in the wording of that first sentence makes this look like exactly that: a zealot's approach rather than a cool-headed decision to cut off the hand that tends to steal rather than to offer. If one reads the editorial from a year earlier mentioned in the above quote (DOI:10.1080/01973533.2014.865505), one will see that this is only part of an overhaul of the Journal's policies by a new Editor.
Scrolling down the editorial, they write
...On the contrary, we believe that the p<.05 bar is too easy to pass and
sometimes serves as an excuse for lower quality research.
So it appears that their conclusion about their discipline is that null hypotheses are rejected "too often", and so alleged findings may acquire spurious statistical significance. This is not the same argument as the "invalid" dictum of the first sentence.
So, to answer the question: it is obvious that for the editors of the journal, their decision is not only wise but already overdue. They appear to believe that they are cutting out the part of statistics that has become harmful while keeping the beneficial parts; they do not seem to believe that anything here needs replacing with something "equivalent".
Epistemologically, this is an instance where scholars of a social science partially retreat from an attempt to make their discipline more objective in its methods and results through quantitative methods, because they have arrived at the conclusion (how?) that, in the end, the attempt did "more harm than good". I would say that this is a very important matter, in principle possible to have happened, and one that would require years of work to demonstrate "beyond reasonable doubt" and really help the discipline. But just one or two published editorials and papers will most probably (inferential statistics) just ignite a civil war.
The final sentence of the 2015 editorial reads:
We hope and anticipate that banning the NHSTP will have the effect of
increasing the quality of submitted manuscripts by liberating authors
from the stultified structure of NHSTP thinking thereby eliminating an
important obstacle to creative thinking. The NHSTP has dominated
psychology for decades; we hope that by instituting the first NHSTP
ban, we demonstrate that psychology does not need the crutch of the
NHSTP, and that other journals follow suit.
It's hard to step in after people of the caliber of the names above have commented, but I did try to understand this the silly way: using the power of [R] to simulate the mathematical problem. I hope it sheds some light on what the uncertainty quantifications attached to the regression parameters mean, since that was the question.
From the frequentist perspective there is a Platonic world containing an absolute representation of every single individual (the population), and we are looking at the shadows on the wall of the cave (the sample). We know that no matter how hard we try we'll be off, but we want an idea of how far off from the truth we are likely to be.
We can play god and pretend to create the population, where everything is perfect and the parameters governing the relationships between variables are glimmering gold. Let's do that by establishing that the variable $x$ is related to the variable $y$ through the equation $y = 10 + 0.4\,x$. We define the x's as x <- seq(from = 0.0001, to = 100, by = 0.0001) (that is, $1$ million observations). The y's are therefore calculated as y <- 0.4 * x + 10. We can combine these values in a data.frame: population <- data.frame(x, y).
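Gathered into a single runnable chunk (the same steps as above, with a quick check of the claimed number of observations):

# Build the "Platonic" population: one million (x, y) pairs
# lying exactly on the true line y = 10 + 0.4 * x.
x <- seq(from = 0.0001, to = 100, by = 0.0001)
y <- 0.4 * x + 10
population <- data.frame(x, y)
nrow(population)  # 1000000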
From this population we will take $100$ samples. For each sample, we will randomly select $100$ rows of data from the dataset. Let's define the function for sampling rows:
sam <- function(){
  # randomly select 100 rows from the population and add gaussian noise to y
  s <- population[sample(nrow(population), 100), ]
  s$y <- s$y + rnorm(100, 0, 10)
  s
}
Notice that we are no longer in paradise: now we have noise (the rnorm call).
And we are going to collect both the intercepts and the slopes (I'll call them betas) of the OLS linear regression run on each one of these $100$ samples. Let's write some lines of code for this:
betas <- numeric(100)
intercepts <- numeric(100)
for(i in 1:100){
  s <- sam()                     # draw a fresh noisy sample
  fit <- lm(y ~ x, data = s)     # fit OLS on that sample
  betas[i] <- coef(fit)[2]       # slope
  intercepts[i] <- coef(fit)[1]  # intercept
}
And combine both into a new data.frame: reg_lines <- data.frame(intercepts, betas). As expected, given the Gaussian randomness of the noise, the histogram of the slopes will look Gaussian.
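A minimal sketch of how that histogram can be reproduced (the number of breaks and the labels are my own choices, not taken from the original figure):

# Sampling distribution of the estimated slope across the 100 samples;
# it should look roughly Gaussian and be centered near the true value 0.4.
hist(reg_lines$betas, breaks = 20,
     main = "Estimated slopes from 100 samples", xlab = "slope")
abline(v = 0.4, lwd = 2)  # the true slope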
And if we plot all the regression lines that we fitted, one for each of the $100$ samples drawn from the population, we'll see that any single one is just an approximation, because they oscillate between a maximum and a minimum in both intercept and slope.
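A sketch that would draw all of those fitted lines in one plot (the color palette and axis setup are my own choices, not the original figure):

# Empty plotting region spanning the population, then one line per sample.
plot(population$x, population$y, type = "n", xlab = "x", ylab = "y")
for(i in 1:100){
  abline(a = reg_lines$intercepts[i], b = reg_lines$betas[i],
         col = rainbow(100)[i])
}
abline(a = 10, b = 0.4, lwd = 3)  # the true line, for reference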
But we do live in the real world, and what we have is just a sample... just one of those multicolored lines, through which we are trying to estimate the truth (i.e. an intercept of $10$ and a slope of $0.4$). Let's conjure this sample, S <- population[sample(nrow(population), 100), ]; S$y <- S$y + rnorm(100, 0, 10), and its OLS regression line, fit <- lm(y ~ x, data = S).
Since we are playing god, let's plot our biased sample (dark blue dots with a dark blue fitted regression line) together with the true line in solid green, and with the maximum and minimum combinations of intercept and slope we got in our simulation (dashed red lines), giving us an idea of how far off we could possibly be from the true line.
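A sketch that would produce such a plot, using the color scheme just described (the pairing of the extreme intercepts and slopes below is a simplification of mine; the original figure is not reproduced exactly):

# Our single sample and its fitted line, the true line,
# and the most extreme lines seen across the 100 simulated samples.
plot(S$x, S$y, pch = 16, col = "darkblue", xlab = "x", ylab = "y")
abline(fit, col = "darkblue", lwd = 2)            # fitted line for our sample
abline(a = 10, b = 0.4, col = "green", lwd = 2)   # the true line
abline(a = max(reg_lines$intercepts), b = max(reg_lines$betas),
       col = "red", lty = 2)
abline(a = min(reg_lines$intercepts), b = min(reg_lines$betas),
       col = "red", lty = 2)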
Let's quantify this possible error by using a Wald interval for the slope to generate the 95% confidence interval:
coef(fit)[2] + c(-1, 1) * 1.96 * summary(fit)$coefficients[4]
where summary(fit)$coefficients[4] is the calculated standard error of the estimated slope. This gives us 0.2836088 to 0.4311044 (remember the "true" value of $0.4$).
And for the intercept:
coef(fit)[1] + c(-1, 1) * 1.96 * summary(fit)$coefficients[3]
which gives us 9.968347 to 17.640500.
Finally, let's compare these values with those generated by [R] when we type:
confint(fit)
                2.5 %    97.5 %
(Intercept) 9.9204599 17.688387
x           0.2826881  0.432025
Pretty close...
OK, so this is a very intuitive approach to seeing what the confidence intervals are trying to answer. And as for the $p$-values, you can read how they are generated here. In general, the text notes that if the regression coefficient in the population is $0$ ($H_0: \beta = 0$), the $t$-statistic will be:
$$t = \frac{\hat\beta_{yx}-\beta_{yx}}{SE_{\hat\beta}}= \frac{\hat\beta_{yx}}{SE_{\hat\beta}}.$$
The $SE_{\hat\beta}$ (which we used in the Wald interval) can be calculated in different ways, although the formula given in the text quoted is:
$SE_{\hat\beta}=\sqrt{\frac{var(e)}{var(x) \, (N-2)}}$. If we calculate this manually:
The variance of the independent variable in our sample is var_x <- (sd(S$x))^2, which equals 719.0691. The variance of the errors is var_e <- sum((residuals(fit) - mean(residuals(fit)))^2) / (nrow(S) - 1), which equals 99.76605. And N - 2 = 98 (we lose one $df$ for the intercept and one for the slope). Hence, SE <- sqrt(var_e / (var_x * (N - 2))) gives $SE_{\hat\beta} = 0.03762643$, which happily coincides with the value obtained for the slope of x by [R]:
summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.80442 1.95718 7.053 2.49e-10 ***
x 0.35736 0.03763 9.497 1.49e-15 ***
So $t=\frac{\hat\beta_{yx}}{SE_{\hat\beta}}= 0.3573566 / 0.03762643 = 9.497488$ (t <- coef(fit)[2]/SE). What else? Right, the $p$-value... The upper-tail probability is pt(9.497488, 98, lower.tail = FALSE) = 7.460233e-16, and doubling it for the two-sided test gives 1.49e-15, essentially $0$, matching the Pr(>|t|) entry for x in the summary above.
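For convenience, here is the manual reconstruction gathered into one chunk (the object names N, var_x, var_e, SE and the explicit two-sided doubling are mine; the quantities are the same as above):

# Rebuild the slope's standard error, t statistic and two-sided p-value
# by hand, to compare against summary(fit).
N <- nrow(S)                                          # 100 observations
var_x <- sd(S$x)^2                                    # variance of x in the sample
var_e <- sum((residuals(fit) - mean(residuals(fit)))^2) / (N - 1)
SE <- sqrt(var_e / (var_x * (N - 2)))                 # standard error of the slope
t_stat <- unname(coef(fit)[2]) / SE                   # t statistic under H0: beta = 0
p_val <- 2 * pt(abs(t_stat), df = N - 2, lower.tail = FALSE)  # two-sided p-value
c(SE = SE, t = t_stat, p = p_val)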
Best Answer
One simple example of this is a one-sample test of a proportion using the normal approximation. When doing a test of significance we have a null hypothesis that the proportion is a specific value, so we use that value in the standard error formula (since we do not know the true proportion and assume the null is true until proven otherwise). But when constructing a confidence interval we do not assume anything about the proportion and generally use the proportion estimated from the sample in the standard error formula. Occasionally this can make the two disagree.
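A small numerical sketch of the one-sample case (the counts and the null value p0 below are made up purely for illustration):

# One-sample proportion with the normal approximation:
# the test's SE uses the null value p0, the Wald CI's SE uses the estimate p_hat.
n <- 100; successes <- 61                 # hypothetical data
p0 <- 0.5                                 # null hypothesis value
p_hat <- successes / n
se_test <- sqrt(p0 * (1 - p0) / n)        # SE assuming H0 is true
se_ci   <- sqrt(p_hat * (1 - p_hat) / n)  # SE from the sample estimate
z  <- (p_hat - p0) / se_test              # test statistic
ci <- p_hat + c(-1, 1) * 1.96 * se_ci     # 95% Wald interval
c(z = z, lower = ci[1], upper = ci[2])    # near the cutoff the two can disagree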
Similarly, when comparing two proportions, the standard error used in the hypothesis test is based on a pooled (composite) proportion, computed under the assumption that the two proportions are equal, whereas the confidence interval does not assume equality and combines the two sample proportions in a different way.
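And a corresponding sketch for the two-sample case (again with made-up counts), contrasting the pooled standard error used by the test with the unpooled one used for the interval:

# Two-sample proportions: pooled SE under H0: p1 = p2 for the test,
# unpooled SE for the confidence interval of the difference.
n1 <- 120; x1 <- 66                       # hypothetical group 1
n2 <- 150; x2 <- 63                       # hypothetical group 2
p1 <- x1 / n1; p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)           # pooled (composite) proportion
se_test <- sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
se_ci   <- sqrt(p1 * (1 - p1)/n1 + p2 * (1 - p2)/n2)
z  <- (p1 - p2) / se_test                 # test statistic
ci <- (p1 - p2) + c(-1, 1) * 1.96 * se_ci # 95% CI for p1 - p2
c(z = z, lower = ci[1], upper = ci[2])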