Solved – Can a two-sample t-test be used with data that doesn't follow a normal distribution?

central-limit-theorem, normal-distribution, t-test, variance

One of the assumptions for t-tests is that the data must follow a normal distribution.

However, according to the Central Limit Theorem (and this thread), "if the sample is large enough you can use t-test (with unequal variances)". I'm trying to work out what this means for my case: I think my samples should be large enough, but how can I confirm that?

A Levene's test showed that the two samples don't have equal variances, so I plan to use Welch's test (the unequal-variances version of the t-test). I've also run the Shapiro-Wilk test, which confirms that one of my two samples doesn't, in fact, follow a normal distribution.
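For reference, the normality check and the one-tailed Welch test can be sketched in base R; group1 and group2 below are simulated placeholders with the stated sample sizes, not the real data:

```r
set.seed(42)
group1 <- rexp(19, 1/3)          # placeholder skewed sample, n = 19
group2 <- rnorm(15, mean = 3)    # placeholder sample, n = 15

shapiro.test(group1)$p.value     # normality check, one group at a time
shapiro.test(group2)$p.value

# Welch's t-test: t.test() uses var.equal = FALSE by default;
# alternative = "greater" gives the one-tailed test H1: group1 > group2
t.test(group1, group2, alternative = "greater")$p.value
```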


Additional information

I need to run the tests for a few different cases, but to keep things short I'm detailing only two of them.

Sample sizes are 19 and 15 for group1 and group2, respectively (this is true for both examples, Case1 and Case2).

Results of the Shapiro-Wilk test for normality

Case1
sample | p_value   | w     | Result
group1 | 0.104     | 0.918 | Normal
group2 | 0.027     | 0.863 | Not Normal (p<0.05)

Case2
sample | p_value   | w     | Result
group1 | 2.054e-05 | 0.663 | Not Normal (p<0.05)
group2 | 0.006     | 0.814 | Not Normal (p<0.05)

Results of Levene's test for equality of variances

Case1
p_value | w     | Result
0.154   | 2.128 | Equal Variance

Case2
p_value | w     | Result
0.0251  | 5.521 | Unequal Variance (p<0.05)
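Base R does not ship Levene's test (car::leveneTest is the usual choice); a minimal median-centered version (the Brown-Forsythe variant) can be sketched as follows, again with placeholder data rather than the real samples:

```r
# Median-centered Levene's test (Brown-Forsythe variant):
# a one-way ANOVA on absolute deviations from each group's median.
levene_bf <- function(x, y) {
  z <- c(abs(x - median(x)), abs(y - median(y)))
  g <- factor(rep(c("g1", "g2"), c(length(x), length(y))))
  anova(lm(z ~ g))[["Pr(>F)"]][1]   # p-value of the group effect
}

set.seed(1)
group1 <- rnorm(19, sd = 2)  # placeholder data with unequal spreads
group2 <- rnorm(15, sd = 5)
levene_bf(group1, group2)
```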

Result of the one-tailed (Welch) t-test (H1: group1>group2)

Case1
t_statistic | p_value | Result
3.073       | 0.002   | Significant (p<0.05)

Case2
t_statistic | p_value | Result
2.475       | 0.012   | Significant (p<0.05)

Best Answer

One simple way to convince yourself that the CLT applies or does not apply is with some simulations.

Here is some R code:

testfun <- function(n1=19, n2=15) {
    x <- rexp(n1, 1/3)     # exponential with mean 3
    y <- rt(n2, 5) + 3     # t with 5 df, shifted to mean 3
    t.test(x,y)$p.value    # Welch's t-test is the default
}

out <- replicate(10000, testfun(n1=19, n2=15))
hist(out)
abline(v=0.05, col='red')
mean( out <= 0.05 )

This code defines a function (testfun) that generates data from 2 different distributions (a t with 5 df and an exponential) that have the same mean (3 in this case), runs the built-in t.test function, and returns the p-value.

The replicate call then runs this 10,000 times and we look at the results. Since the null hypothesis is true in the simulations, the histogram of p-values should be close to uniform, but in this case we see an excess of values close to 0. The mean call estimates the type I error rate; for my run this was a little over 7% when it should be 5%. Is that far enough off to cause you concern, or are you happy with that as a "close enough" approximation?

Of course, you should probably rerun this generating data from distributions that are more reasonable for your study; for something less skewed than the exponential, the difference may well be small enough not to worry about.
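As a sketch of that idea, the same simulation can be rerun with the exponential swapped for a less-skewed gamma distribution with the same mean (the gamma shape/scale here are an illustrative choice, not from the original study):

```r
# Same simulation as above, but with a mildly skewed gamma
# (shape * scale = 9 * 1/3 = 3) in place of the exponential:
testfun2 <- function(n1 = 19, n2 = 15) {
    x <- rgamma(n1, shape = 9, scale = 1/3)  # mean 3, less skewed
    y <- rt(n2, 5) + 3                       # mean 3
    t.test(x, y)$p.value
}

set.seed(123)
out2 <- replicate(10000, testfun2())
mean(out2 <= 0.05)   # compare against the nominal 5% level
```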