Hypothesis Testing – Does Testing for Assumptions Affect Type I Error?

Tags: assumptions, bonferroni, hypothesis-testing, multiple-comparisons, type-i-and-ii-errors

I just performed a simple simulation. I created two "populations" with different means and the same variance. Since I generated them myself, I know that they are normal, differ in location, and have the same scale. Then I wrote a simple script in R which 1000 times draws samples from both populations and tests them for normality (Shapiro-Wilk), for equal variances (F test), and for a difference in means (t-test). If the assumptions of the t-test are met, I run the t-test.

Well, in fact I run all of them one by one and combine the results with a logical AND: if IsNormal(A) AND IsNormal(B) AND VarsAreEqual(A, B) AND MeansAreDifferent(A, B), then I count the iteration as a success. Finally I take the number of TRUE values and divide it by the number of iterations. So every assumption test may fail: Shapiro-Wilk may falsely reject its H0 (the samples ARE normal), and so may the F test (the variances ARE equal). And I don't get values near 0.05!

So, my question is: is this an example of the "multiple testing" phenomenon (like multiple comparisons)? If so, what about testing for assumptions in general? Such testing is crucial, because if the assumptions are not met, the quantiles of the sampling distribution of a given test statistic may be completely different from what is expected, and the results may be useless. But testing the assumptions clearly changes the effective alpha. Am I correct?

Perhaps we should use corrections for alpha, like Bonferroni?

# two normal "populations" with different means and equal variances
data <- data.frame(populA = rnorm(1000, mean = 1), populB = rnorm(1000, mean = 3))

iternum <- 1000
alpha   <- 0.05
nsample <- 30
results <- logical(iternum)

for (i in 1:iternum) {
    sampA <- sample(data$populA, nsample)
    sampB <- sample(data$populB, nsample)

    result_t     <- t.test(sampA, sampB)$p.value
    result_F     <- var.test(sampA, sampB)$p.value
    result_normA <- shapiro.test(sampA)$p.value
    result_normB <- shapiro.test(sampB)$p.value

    # count the run only if both samples pass the normality check, the
    # variances test equal, and the t-test detects the (true) difference
    results[i] <- (result_normA >= alpha) & (result_normB >= alpha) &
                  (result_F >= alpha) & (result_t < alpha)
}

1 - mean(results)
# [1] 0.158
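As a rough sanity check on that number: if the two Shapiro-Wilk tests and the F test are treated as approximately independent (an assumption; the F test reuses both samples) and each falsely rejects with probability alpha, and if the t-test has near-perfect power at this separation of means, the expected proportion of discarded runs is

# chance that at least one of the three assumption tests falsely rejects,
# assuming (roughly) independent tests, each at alpha = 0.05
1 - (1 - 0.05)^3
# [1] 0.142625

which is in the neighbourhood of the observed 0.158; the gap is plausibly due to dependence between the tests and the t-test's slightly imperfect power.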

Best Answer

Generally speaking, the answer is yes: both type I and type II error rates are affected by choosing a test on the basis of preliminary tests of its assumptions.

This is well established for pretesting equality of variances (several papers point it out) and for pretesting normality, and it should be expected to hold in general.

The advice is usually along the lines of "if you can't make the assumption without testing, better to simply act as if the assumption doesn't hold".

So, for example, if you're trying to decide between the equal-variance and Welch-type t-tests, by default use the Welch test (though with equal sample sizes the ordinary equal-variance test is itself fairly robust to unequal variances).
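In R, for instance, this is already the default behavior of t.test; the pooled test has to be requested explicitly (the samples below are purely illustrative):

x <- rnorm(30); y <- rnorm(30, sd = 2)
t.test(x, y)                    # Welch t-test (default, var.equal = FALSE)
t.test(x, y, var.equal = TRUE)  # pooled, equal-variance t-test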

Similarly, in moderately-small$^*$ samples, you may be better off using a permutation test for location by default than testing for normality and then using a t-test if you fail to reject (in large samples the t-test is usually level-robust enough that it's not likely to be a big issue in most cases, provided the sample is also large enough that you're not concerned about the impact on power). Alternatively, the Wilcoxon-Mann-Whitney test has very good power relative to the t-test at the normal and would often be a very viable alternative; a sketch of both options follows below.
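A minimal sketch of both alternatives in R, reusing sampA and sampB from the simulation above (perm.test is a hypothetical helper, and B, the number of label reshuffles, is an arbitrary choice):

perm.test <- function(x, y, B = 9999) {
    # two-sided permutation test for a difference in means:
    # repeatedly reallocate the pooled values into two groups at random
    obs    <- mean(x) - mean(y)
    pooled <- c(x, y)
    n      <- length(x)
    stats  <- replicate(B, {
        idx <- sample(length(pooled), n)
        mean(pooled[idx]) - mean(pooled[-idx])
    })
    # include the observed statistic in the reference set
    (1 + sum(abs(stats) >= abs(obs))) / (B + 1)
}

perm.test(sampA, sampB)    # permutation test for location
wilcox.test(sampA, sampB)  # Wilcoxon-Mann-Whitney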

[If for some reason you need to test, it would be best to be aware of the extent to which the significance level and power of the tests may be affected under either arm of any choice the test of assumptions leads you to. This will depend on the particular circumstances; for example, simulation can be used to investigate the behavior in similar situations.]

$^*$ (but not very small, since the discreteness of the test statistic will limit the available significance levels too much; specifically, at very small sample sizes the smallest possible significance level may be impractically large)
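For instance, here is a minimal sketch of such a simulation (my construction, with arbitrary choices of null distribution, sample size, and iteration count): estimate the actual rejection rate of the two-stage rule "Shapiro-Wilk first, then t-test if normality is not rejected, otherwise Wilcoxon" under a true null with skewed data, and compare it with the nominal alpha.

two.stage.sim <- function(iters = 10000, n = 20, alpha = 0.05) {
    rejections <- replicate(iters, {
        # H0 is true: both samples come from the same (skewed) population
        a <- rexp(n); b <- rexp(n)
        normal.ok <- shapiro.test(a)$p.value >= alpha &&
                     shapiro.test(b)$p.value >= alpha
        p <- if (normal.ok) t.test(a, b)$p.value else wilcox.test(a, b)$p.value
        p < alpha
    })
    mean(rejections)  # actual type I error of the two-stage procedure
}

two.stage.sim()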

A reference (with a link to more) on testing heteroskedasticity when choosing between equal-variance-t vs Welch-t location tests is here.

I also have one for the case of testing normality before choosing between the t test and the Wilcoxon-Mann-Whitney test (ref [3] here).