MATLAB: Hypothesis testing in matlab

hypothesis testing, paired ttest, statistics, ttest, ttest2

I want to do a t-test in MATLAB. I'm looking at different models to see how storms change between historical and future scenarios. I have two sets of data of the same length, and I want to test whether the data differ between the historical and future models. The means of the two sets are very close to one another. ttest and ttest2 give significantly different results. How do you know which test to use?

Thanks in advance!

Best Answer

A common mistake in statistics is to work backwards: running a test, or worse, several tests, and only then choosing which one to report. This practice leads to p-hacking, which some people call "bad science" but which in reality is not science at all and is a breach of the scientific method. This has recently led more than 800 academically affiliated statisticians and scientists to sign a commentary in Nature that suggests ending significance testing altogether!

ttest() vs ttest2()

"ttest and ttest2 both give significantly different results, how do you know which test to use?"

First, understand what each test is testing by looking at its null hypothesis. Fortunately, the MATLAB documentation states the null hypothesis directly for each function.

For ttest(x,y)

h = ttest(x,y) returns a test decision for the null hypothesis that the data in 
x – y comes from a normal distribution with mean equal to zero and unknown 
variance, using the paired-sample t-test.

A "paired-sample t-test" means that x(n) is associated in some way with y(n). For example, x could be the test results of 100 students on day-1 of a course and y could be the test results of the same exact 100 students in the same order taking the same exam on the last day of a course. Another example: x could be duration of balancing on the left leg while y could be the duration of balancing on the right leg for 100 people with unilateral vestibular hypofunction. In both examples x(n) is associated with y(n).
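As a rough illustration of the paired case, here is a minimal sketch using made-up exam scores (the variable names and numbers are assumptions for illustration, not data from the question). It shows that ttest(x,y) gives the same result as a one-sample t-test on the differences x - y, which is why the pairing of x(n) with y(n) matters.

```matlab
% Hypothetical paired data: the same 100 students tested twice.
rng(1);                                  % fixed seed so the sketch is repeatable
n       = 100;
ability = 70 + 10*randn(n,1);            % student-to-student variation, shared by both days
x       = ability + 2*randn(n,1);        % score on day 1
y       = ability + 1 + 2*randn(n,1);    % score on the last day (same students, same order)

% Paired-sample t-test: H0 is that x - y has mean zero.
[hPaired, pPaired, ~, statsPaired] = ttest(x, y);

% One-sample t-test on the differences, which should agree with the paired test.
[hDiff, pDiff] = ttest(x - y);

fprintf('paired: h = %d, p = %.4g, df = %d\n', hPaired, pPaired, statsPaired.df);
fprintf('diffs : h = %d, p = %.4g\n', hDiff, pDiff);
```

Because the test works on the differences, the large student-to-student variation cancels out, and the degrees of freedom are n - 1 rather than 2n - 2.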

For ttest2(x,y)

h = ttest2(x,y) returns a test decision for the null hypothesis that the data in 
vectors x and y comes from independent random samples from normal distributions 
with equal means and equal but unknown variances, using the two-sample t-test. 
The alternative hypothesis is that the data in x and y comes from populations
with unequal means. The result h is 1 if the test rejects the null hypothesis at  
the '5%' significance level, and 0 otherwise.

Here "independent random samples" is key. Unlike in the paired t-test, x(n) has no more of a relationship with y(n) than with y(n+1) or y(n-1). For example, x could be the height of 100 fully grown maple trees at sea level while y could be the height of 100 fully grown maple trees at an elevation of 5000 feet. x(n) and y(n) are both maple trees but they are different trees and have no other relationship.
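A comparable sketch for the independent case, again with invented numbers. The two samples deliberately have different sizes to stress that ttest2 does not need equal lengths. The second call uses the 'Vartype','unequal' option, which drops the equal-variance assumption (Welch's t-test).

```matlab
% Hypothetical independent samples: different trees at two elevations.
rng(2);                      % fixed seed so the sketch is repeatable
x = 20 + 3*randn(100,1);     % heights at sea level (made-up numbers)
y = 19 + 3*randn(80,1);      % heights at 5000 feet; sample sizes need not match

% Two-sample t-test, default settings: equal but unknown variances (pooled).
[hPooled, pPooled, ~, statsPooled] = ttest2(x, y);

% Same comparison without assuming equal variances.
[hWelch, pWelch, ~, statsWelch] = ttest2(x, y, 'Vartype', 'unequal');

fprintf('pooled: h = %d, p = %.4g, df = %g\n', hPooled, pPooled, statsPooled.df);
fprintf('Welch : h = %d, p = %.4g, df = %g\n', hWelch,  pWelch,  statsWelch.df);
```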

Going back to your storm data, it doesn't matter that your two vectors of data are the same length. What matters is whether the historical data are related to the future data. Since storms come and go (except on Jupiter), it's unlikely that historicData(n) is related to futurePrediction(n), so your samples are probably independent (but you must make that decision). That would point to using ttest2().
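One way to see why the two functions can disagree on the same vectors is to shuffle the order of one of them. The sketch below uses simulated values (historic and future are hypothetical stand-ins, not the storm data from the question). The ttest2 result does not depend on the order of the elements, while the ttest result does, because ttest treats element n of one vector as the partner of element n of the other.

```matlab
% Simulated stand-ins for the two data sets (assumed values, for illustration only).
rng(3);                               % fixed seed so the sketch is repeatable
n        = 50;
historic = 10.0 + 2*randn(n,1);
future   = 10.3 + 2*randn(n,1);
shuffled = future(randperm(n));       % same values, different order

% Two-sample test: order of elements is irrelevant.
[~, pInd1] = ttest2(historic, future);
[~, pInd2] = ttest2(historic, shuffled);

% Paired test: order matters, since it works on historic(n) - future(n).
[~, pPair1] = ttest(historic, future);
[~, pPair2] = ttest(historic, shuffled);

fprintf('ttest2: p = %.4g (original order), p = %.4g (shuffled)\n', pInd1,  pInd2);
fprintf('ttest : p = %.4g (original order), p = %.4g (shuffled)\n', pPair1, pPair2);
```

If the order of your data carries no meaning, a test whose answer changes when you reorder the data is not the right one.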

Assumptions and nonparametric tests

Lastly, don't ignore the assumptions. Both tests expect the data to be normally distributed, or at least close to normal (for the paired test, this applies to the differences x - y). The t-test will return a result for any distribution, but it's up to you as a researcher to decide whether that result can be trusted, based on how well the assumptions are satisfied.
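A possible way to check these assumptions before running a t-test is sketched below, using simulated data (all values are assumptions for illustration). It uses lillietest for normality and vartest2 for equal variances, both from the Statistics and Machine Learning Toolbox. These checks are only one option, and a normal probability plot is often just as informative.

```matlab
% Simulated samples standing in for real data.
rng(4);                          % fixed seed so the sketch is repeatable
x = 5.0 + randn(60,1);
y = 5.2 + randn(60,1);

% Lilliefors test: H0 is that the sample comes from a normal distribution.
[hx, px] = lillietest(x);
[hy, py] = lillietest(y);

% For a paired t-test, the differences are what should be normal.
[hd, pDiff] = lillietest(x - y);

% Two-sample F-test: H0 is that x and y have equal variances
% (relevant to the default, pooled-variance ttest2).
[hv, pv] = vartest2(x, y);

fprintf('normality: x p = %.3g, y p = %.3g, x-y p = %.3g\n', px, py, pDiff);
fprintf('equal variances: p = %.3g\n', pv);

% Visual check (uncomment to open a figure):
% normplot(x);
```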

[addendum]

Star Strider mentioned a good point that I'd like to expand upon for completeness. If the data do not meet the assumptions for a t-test (mainly, if the distribution is not normal), you can use a nonparametric test that does not rely on an underlying distribution (these tests still have other assumptions!).

For an independent sample ttest **with equal variances**,

Mann-Whitney U-Test (in matlab: ranksum(x,y))

For an independent sample ttest **with unequal variances**

Kolmogorov-Smirnov test (in MATLAB: kstest2(x,y) for two samples and kstest(x) for one sample)

For a paired ttest

Wilcoxon Signed-Rank test (in MATLAB: signrank(x,y))
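The calls for these nonparametric tests might look like the sketch below, run on made-up skewed data (the exponential samples are an assumption, chosen only because they are clearly non-normal). Two details are easy to miss: ranksum and signrank return p first and h second, the reverse of ttest and kstest2, and kstest2 tests whether the two samples come from the same distribution, not only whether their centers differ.

```matlab
% Made-up, clearly non-normal samples.
rng(5);                           % fixed seed so the sketch is repeatable
x = exprnd(2.0, 40, 1);
y = exprnd(2.5, 40, 1);

% Independent samples: Mann-Whitney U-test (Wilcoxon rank-sum).
[pRank, hRank] = ranksum(x, y);   % note the output order: p, then h

% Independent samples: two-sample Kolmogorov-Smirnov test.
[hKS, pKS] = kstest2(x, y);       % output order: h, then p

% Paired samples: Wilcoxon signed-rank test.
% Only meaningful if x(n) and y(n) really are pairs; used here just to show the call.
[pSign, hSign] = signrank(x, y);  % note the output order: p, then h

fprintf('ranksum : h = %d, p = %.4g\n', hRank, pRank);
fprintf('kstest2 : h = %d, p = %.4g\n', hKS,   pKS);
fprintf('signrank: h = %d, p = %.4g\n', hSign, pSign);
```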
