Solved – How can MANOVA report a non-significant p-value while LDA results in perfect separation of two groups?

Tags: discriminant-analysis, manova

I am new to statistics and currently have a dataset that contains $80$ dependent variables and $1$ independent variable with $2$ groups. MANOVA reports a $p$-value of $> 0.6$ on this dataset, but when I use linear discriminant analysis (LDA), the two groups can be separated almost perfectly (nearly $100\%$ accuracy).

Is this result possible?

What I understand about MANOVA is that a larger $p$-value means less difference between the two groups. So how can the groups be separated perfectly if there is no difference between them?

Best Answer

This is an excellent question because it touches on so many important concepts. The short answer is: yes, this is possible, and it can happen when your sample size is small.


Let us make the apparent contradiction a bit more precise. The MANOVA tests whether your data could have been observed if in reality there were no difference between the two groups (that is the null hypothesis). Your $p$-value of $0.6$ is telling you that the answer is: yes, it easily could. At the same time, LDA results in an [almost] perfect separation between the two groups. So is it possible that in reality there is no difference between the groups, yet the actual data appear to be perfectly separable?

We can use a simple simulation to check. For various values of the sample size $N$ I generated random data from the standard normal distribution in the $80$-dimensional space $\mathbb R^{80}$ and assigned one half of the points to group $\#1$ and another half to group $\#2$. Both groups are therefore sampled from identical distributions, with true means at zero. For any value of $N$, MANOVA usually reports a high non-significant $p$-value, as expected. But let us look at LDA.

First of all, note that if $N<80$ then the groups can always be separated perfectly. Think e.g. of two points in 2D or of three points in 3D: however you assign them to two groups, they can always be linearly separated. On the other hand, if $N$ is huge, e.g. $N=1\:000\:000$, then it is intuitively clear that the two massive clouds of points (group $\#1$ and group $\#2$) will be entirely overlapping, resulting in no separation. But it might be surprising to see how slowly the apparent separation decreases with increasing $N$:

[Figure: LDA overfitting – training-set classification accuracy as a function of $N$]

The blue line shows mean values of classification accuracy for $N$ between $100$ and $1000$ (mean over $100$ repeated simulations), and the shading shows two standard deviations. At $N=100$ separation is almost perfect. At $N=200$ separation is around $80\%$. At $N=500$ it is still over $65\%$. One needs to get to $N>100\:000$ to get below $51\%$.
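This simulation is easy to reproduce. Here is a minimal sketch (not the code used for the figure above; the function name and seeds are my own) that generates null data in $\mathbb R^{80}$, assigns arbitrary half/half labels, and reports the LDA accuracy on the training data itself:

```python
# Sketch of the simulation: both "groups" are drawn from the same
# standard normal distribution in R^80, so any separation LDA finds
# on the training data is pure overfitting.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_training_accuracy(n, d=80, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))   # one common N(0, I) distribution
    y = np.repeat([0, 1], n // 2)     # arbitrary half/half group labels
    lda = LinearDiscriminantAnalysis().fit(X, y)
    return lda.score(X, y)            # accuracy on the training data

for n in (100, 200, 500, 1000):
    print(n, round(lda_training_accuracy(n), 2))
```

You should see the training accuracy start near $100\%$ at $N=100$ and decay only slowly as $N$ grows, even though the true separability is zero.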

This effect is known as overfitting. Your $80$-dimensional space is large. LDA looks for the axis with the best separability between the groups, and when the sample size is not big enough, it can usually find some axis that by chance happens to yield good separability. That is why one should use cross-validation to assess the performance of a classifier: if we used it here, the cross-validated classification accuracy would always be around $50\%$, as it should be.
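To illustrate the last point, here is a short sketch (illustrative, with my own choice of $N=200$ and seed) contrasting the optimistic training accuracy with cross-validated accuracy on the same null data:

```python
# Training accuracy vs. cross-validated accuracy for null data:
# the former is inflated by overfitting, the latter stays near chance.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 80))   # both groups from the same N(0, I)
y = np.repeat([0, 1], 100)           # arbitrary half/half labels

train_acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
cv_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=10).mean()
print(f"training accuracy: {train_acc:.2f}, cross-validated: {cv_acc:.2f}")
```

The cross-validated estimate hovers around $50\%$ (chance level), which is the honest answer for data with no true group difference.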

Technically, overfitting happens because the within-class covariance matrix cannot be reliably estimated with small $N$ (so the sample covariance matrix in the example above will have some very small eigenvalues, instead of them all being equal to one). You might be interested in reading more in my related answers.
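This eigenvalue spread is easy to check numerically. A quick sketch (parameters are illustrative): even though the true covariance is the identity, whose eigenvalues are all $1$, the sample covariance of $N=100$ points in $80$ dimensions has eigenvalues ranging from nearly $0$ to well above $1$:

```python
# With N = 100 points in d = 80 dimensions, the sample covariance of
# N(0, I) data is badly estimated: its eigenvalues scatter far from
# the true value 1, with some very close to zero.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 80))   # true covariance: the identity
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(f"smallest: {eigvals.min():.3f}, largest: {eigvals.max():.3f}")
```

Directions with near-zero estimated variance are exactly where LDA can find spurious separation (and for $N < 80$ some eigenvalues are exactly zero, which is why perfect separation is then guaranteed).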
