Solved – Combination of Data from Two Normal Samples not Normal

goodness-of-fit, hypothesis-testing, sample-size

I have two sets of data that hypothesis tests have shown to be normal and drawn from the same distribution. I'm using MATLAB, and as I understand the way it reports p-values, a higher p-value suggests a better goodness of fit. I want to combine the two data sets and estimate the parameters of the combined set. However, when I run a hypothesis test on the combined data, the tests either reject the null hypothesis of normality or come very close to rejecting it, judging by the test statistics. I've been looking for an explanation but have no idea where to start. Could anyone shed some light on this?

Best Answer

If you're talking about a test of the null hypothesis that your sample data come from an exactly normal distribution, you'll probably want to see this question: Is normality testing 'essentially useless'?

Most (if not all) real data are not exactly normally distributed. You can estimate skewness and kurtosis to quantify how far your data's distribution is from a normal distribution, and even small amounts of skew or excess kurtosis become easier to estimate precisely as data accumulate. By combining your two samples, you're improving the precision of those estimates. It seems you now have enough data to detect your distribution's departures from normality so precisely that, according to your normality tests' $p$ values, there is "very close to" or less than an $\alpha$ (typically 5%) chance of randomly drawing, from a truly normal distribution, a dataset as large and at least as non-normal as yours. Since that is the null hypothesis in a nutshell, and it seems very unlikely to be true, you're probably right to reject it, but doing so is something of a foregone conclusion.
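For instance, here is a minimal MATLAB sketch of that effect (the data below are simulated and only stand in for yours, and the amount of skew is an arbitrary choice; the Statistics and Machine Learning Toolbox is assumed). The same mild departure from normality tends to be estimated more precisely, and flagged more readily by a normality test, in the pooled sample than in either sample alone:

```matlab
% Minimal sketch with simulated, mildly skewed data standing in for the real
% samples: compare shape estimates and normality-test p-values for each
% half-sample against those for the pooled sample.
rng(1);                               % for reproducibility
x1   = exp(0.15*randn(400,1));        % mildly right-skewed "sample 1"
x2   = exp(0.15*randn(400,1));        % mildly right-skewed "sample 2"
xAll = [x1; x2];                      % combined sample

fprintf('skewness:         %5.2f %5.2f %5.2f\n', skewness(x1), skewness(x2), skewness(xAll));
fprintf('excess kurtosis:  %5.2f %5.2f %5.2f\n', kurtosis(x1)-3, kurtosis(x2)-3, kurtosis(xAll)-3);

% Lilliefors test of normality (a KS-type test with estimated parameters).
[~, p1]   = lillietest(x1);
[~, p2]   = lillietest(x2);
[~, pAll] = lillietest(xAll);
fprintf('normality-test p: %5.3f %5.3f %5.3f\n', p1, p2, pAll);
```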

Whether your data are too different from a normal distribution for your intended purposes is another question. The hypothesis tests you're using probably can't answer it for you by themselves: their outcomes will depend on your sample size, but won't take into account the sensitivity of whatever other analyses you have planned, among other reasons. You would probably get a better idea of how important this issue is by studying the sensitivity of your planned analyses to violations of the normality assumption; some really aren't very sensitive to the usual kinds of minor violations present in approximately normal data. You might get a better sense of how non-normal your data are from a Q-Q plot, as @Glen_b suggested, though just how much non-normality is too much can still be difficult to judge from these plots.
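In MATLAB, a Q-Q plot along the lines of @Glen_b's suggestion is essentially a one-liner (assuming `xAll` holds your combined data, as in the sketch above):

```matlab
% Q-Q plot of the combined sample against normal quantiles.
% Points hugging the reference line suggest approximate normality;
% systematic curvature, especially in the tails, indicates skew or
% heavier/lighter tails than a normal distribution.
qqplot(xAll);
title('Combined sample vs. normal quantiles');
```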

If you have time, or if your data are particularly problematic, you might want to look into robust alternatives to any other analyses you plan to run. This might even be one of the shorter routes to avoiding validity problems while ignoring the idiosyncrasies of your distribution, but bear in mind that those idiosyncrasies are sometimes interesting in their own right! Finally, if you're particularly serious about minding the sensitivity of any subsequent analyses to data that aren't from exactly normal distributions, you might want to consider doing some simulation testing of those analyses too.
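As a rough illustration of that last suggestion, suppose (purely as a placeholder) that your subsequent analysis were a two-sample $t$ test; a quick MATLAB simulation of its Type I error rate when both groups actually share the same mildly skewed distribution might look like this:

```matlab
% Sketch: estimate the Type I error rate of a two-sample t test when both
% groups come from the same non-normal population. Swap in a generator that
% mimics your data's shape, and swap in your actual planned analysis.
rng(2);
nSim  = 5000;  n = 50;  alpha = 0.05;
rejections = 0;
for s = 1:nSim
    a = exp(0.15*randn(n,1));        % group A: non-normal, identical populations
    b = exp(0.15*randn(n,1));        % group B: same population as A
    rejections = rejections + ttest2(a, b, 'Alpha', alpha);
end
fprintf('Estimated Type I error: %.3f (nominal %.2f)\n', rejections/nSim, alpha);
```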

(Full disclosure: for this answer, I borrowed and modified some text from a previous answer of mine.)

Edit: The null hypothesis of the two-sample Kolmogorov-Smirnov test states that the two samples are drawn from populations with exactly the same distribution (or, roughly equivalently, from the same population). That would imply equal distributional parameters too... but to test normality (as you say you have), one of the distributions (your "reference distribution") would have to be a normal distribution. You'd need to run two such tests: compare each of your two samples to a normal distribution, or compare them to each other and then one of them to a normal distribution. Testing twice inflates your overall false rejection rate, so you would want to adjust for multiple tests before rejecting those nulls (which you haven't done anyway, so this is somewhat moot). Wikipedia mentions a multivariate version of this test, but it's relatively new, so I doubt you would have performed it unknowingly.
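In MATLAB, that pair of tests, with a Bonferroni correction for having run two of them, might look roughly like the following; `x1` and `x2` are hypothetical names for your two samples, and `lillietest` is used for the normality check because it's a KS-type test that accounts for estimating the normal parameters from the data:

```matlab
% Two-sample KS test: H0 is that x1 and x2 are drawn from the same distribution.
[~, pSame] = kstest2(x1, x2);

% Lilliefors (KS-type) test: H0 is that x1 comes from a normal distribution
% with unspecified mean and variance.
[~, pNorm] = lillietest(x1);

% Bonferroni adjustment for having tested two hypotheses.
pAdjusted = min([pSame, pNorm] * 2, 1);
fprintf('adjusted p-values: %.3f (same distribution), %.3f (normality)\n', ...
        pAdjusted(1), pAdjusted(2));
```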
