Solved – Alternative tests for Hotelling’s two-sample T-test

assumptions, hotelling-t2, robust

I have a set of vectors from two groups, $X$ and $Y$, and each vector in either group has $m$ elements. I have $X_{1},\ldots,X_{8}$ and $Y_{1}, \ldots, Y_{8}$, i.e. 8 vectors in each group, and would like to test the difference between the two groups. According to the Wikipedia page on Hotelling's $T$-squared distribution, the test assumes that the samples come from two independent multivariate normal distributions with the same covariance (and, under the null hypothesis, the same mean).

My question is: when some of the assumptions of Hotelling's $T$ test are not satisfied, should I continue to use it, or are there more robust alternatives? For example, I have only 8 samples in each group, even though the number of elements in each $X_{i}$ ($Y_{i}$) is large (say $50$), and the data are not necessarily normal.

Best Answer

Since both your $X$ and $Y$ are rank deficient (8 observations on 50 variables in each group), you cannot directly compute the Hotelling $T$ statistic, because the $W$-matrix is singular.
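
To make the singularity concrete, here is the usual textbook form of the two-sample statistic. This assumes that $W$ above denotes the pooled sample covariance, which the answer does not spell out, and writes $p$ for the number of variables (the $m$ of the question):

$$T^2=\frac{n_x n_y}{n_x+n_y}\,(\bar{X}-\bar{Y})^{\top}W^{-1}(\bar{X}-\bar{Y}),\qquad W=\frac{\sum_{i=1}^{n_x}(X_i-\bar{X})(X_i-\bar{X})^{\top}+\sum_{j=1}^{n_y}(Y_j-\bar{Y})(Y_j-\bar{Y})^{\top}}{n_x+n_y-2}$$

$W$ is a sum of $n_x+n_y$ rank-one matrices whose generating vectors satisfy two centering constraints, so its rank is at most $n_x+n_y-2=14$. That is less than $p=50$, so $W^{-1}$ does not exist.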

So you have to replace your $X$ and $Y$ by their reduced rank approximations, which I'll denote $X^k$ and $Y^k$, where

$$X^k=(X-\bar{X})V[(X-\bar{X})/\sqrt{n_x-1}]$$

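where $\bar{X}=n_{x}^{-1}\sum_{i=1}^{n_x} X_i$ is the sample mean and $V[Z]$ is a matrix whose columns contain the first $k$ right singular vectors of $Z$, with observations stored as the rows of $Z$ so that the product is a matrix of $k$ scores per observation (ditto for $Y$).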

Then, based on the reduced rank approximation of your data, you can compute a reduced rank Hotelling $T$. Finally, replace $p$ in all formulae by the rank of $X^k$ (in your case 8).
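
The recipe above can be sketched in Python with NumPy. This is only one possible reading, not necessarily the exact construction intended in the answer: it assumes observations are stored as rows, and it takes a single common basis from the stacked, group-wise centered data (the right singular vectors of that matrix) instead of a separate basis per group. The function name `reduced_rank_hotelling` and the parameter `k` are hypothetical. The returned $F$ value uses the standard rescaling of $T^2$ with $p$ replaced by $k$. Because the basis is estimated from the same data, and the data may not be normal, the $F$ reference distribution should be treated as a rough guide only.

```python
import numpy as np


def reduced_rank_hotelling(X, Y, k):
    """Two-sample Hotelling T^2 computed on k principal directions.

    X, Y : arrays of shape (n_x, p) and (n_y, p), one observation per row.
    k    : number of retained directions (hypothetical tuning parameter).
    Returns (T2, F, (df1, df2)).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_x, n_y = X.shape[0], Y.shape[0]
    if not 1 <= k <= n_x + n_y - 2:
        raise ValueError("k must lie between 1 and n_x + n_y - 2")

    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)

    # Stack the within-group centered rows; its right singular vectors
    # span the directions in which the pooled covariance is non-degenerate.
    Z = np.vstack([X - x_bar, Y - y_bar])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:k].T  # shape (p, k)

    # Scores of the centered data and of the mean difference.
    Xk = (X - x_bar) @ V
    Yk = (Y - y_bar) @ V
    diff = (x_bar - y_bar) @ V

    # Pooled covariance in the reduced k-dimensional space.
    W = (Xk.T @ Xk + Yk.T @ Yk) / (n_x + n_y - 2)

    T2 = (n_x * n_y) / (n_x + n_y) * diff @ np.linalg.solve(W, diff)

    # Standard T^2 to F rescaling, with the dimension p replaced by k.
    df1, df2 = k, n_x + n_y - k - 1
    F = df2 / ((n_x + n_y - 2) * k) * T2
    return T2, F, (df1, df2)


# Usage on simulated data shaped like the question: 8 vectors of length 50.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(8, 50))
Y_demo = rng.normal(size=(8, 50)) + 0.5
T2_demo, F_demo, dfs_demo = reduced_rank_hotelling(X_demo, Y_demo, k=8)
print(T2_demo, F_demo, dfs_demo)
```

If SciPy is available, an approximate p-value could then be read off as `scipy.stats.f.sf(F, df1, df2)`. When `k` equals the full dimension and the pooled covariance is invertible, the function reproduces the ordinary Hotelling $T^2$, since an orthogonal change of basis leaves the statistic unchanged.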

Now, since you suspect your $X$ and $Y$ to be contaminated by outliers, you might as well replace the algorithm used to obtain $X^k$ by a robust counterpart. One such algorithm is ROBPCA; you'll find a good R implementation of ROBPCA in the function PcaHubert (in the rrcov package), and more details on page 26 of this note.
