Assuming $\mathbf{X} \sim \operatorname{MVN}(\boldsymbol{\mu}, \Sigma)$, i.e. it follows a multivariate normal distribution with known mean $\boldsymbol{\mu}$ and covariance $\Sigma$, the squared Mahalanobis distance $(\mathbf{X} - \boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu})$ follows a chi-squared distribution with $p$ degrees of freedom, where $p$ is the number of dimensions of $\mathbf{X}$.
However, if we have to estimate $\Sigma$ from $n$ samples, we denote the estimate $\mathbf{S}$; the scaled estimate $(n-1)\mathbf{S}$ follows a Wishart distribution with $n-1$ degrees of freedom. Then $(\mathbf{X} - \boldsymbol{\mu})^\top \mathbf{S}^{-1}(\mathbf{X} - \boldsymbol{\mu})$ follows the Hotelling $T^2$ distribution with parameters $p$ and $n-1$.
So while the statistics have similar forms, they are used in different contexts, although they will be asymptotically similar.
Note that this is directly analogous to the univariate case, where $X \sim N(\mu, \sigma^2)$. With known $\sigma^2$, $z = \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \sim N(0, 1)$, i.e. the standard normal distribution. However, if we have to estimate $\sigma^2$ from data, the scaled estimator follows a chi-squared distribution, i.e. $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$. Then $t = \frac{\bar{X}-\mu}{s / \sqrt{n}} \sim t(n-1)$.
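A quick simulation can make the univariate analogy concrete: with known $\sigma$ the $z$-statistic is standard normal, while plugging in the sample standard deviation $s$ gives Student's $t(n-1)$, which has heavier tails. This is only an illustrative sketch; the sample size and replicate count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

# reps independent samples of size n from N(mu, sigma^2)
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)

z = (xbar - mu) / (sigma / np.sqrt(n))   # ~ N(0, 1), sigma known
t = (xbar - mu) / (s / np.sqrt(n))       # ~ t(n - 1), sigma estimated

# t has heavier tails than z: Var[t(n-1)] = (n-1)/(n-3) > 1
print(z.var(), t.var())
```

The printed variances should be close to $1$ and $(n-1)/(n-3)$ respectively, showing the extra spread that comes from estimating $\sigma$.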
So both the Student $t$ and the Hotelling $T^2$ represent "more uncertain" versions (i.e. sampling distributions) of their respective "certain" versions. However, both asymptotically approach their "certain" versions, namely $N(0, 1)$ and $\chi^2(p)$ respectively, as $n$, the number of samples, approaches infinity.
Roughly speaking, a test or estimator is called 'robust' if it still works reasonably well, even if some assumptions required for its theoretical development are not met in practice. Comments:
If you need to do one-factor ("one-way") ANOVA for data with different variances at each level of the factor, then it is best to use a variant of one-way ANOVA, such as `oneway.test` in R, that does not require equal variances.
As you say, a 'pooled' $t$ test or simple one-way ANOVA may be problematic when the numbers of replications per level differ greatly and variances also differ among levels of the factor.
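By default, R's `oneway.test` computes Welch's heteroscedastic one-way ANOVA. As a sketch of what that statistic looks like, here is a minimal Python implementation of the Welch $F$ test from its standard formulas (the example groups are made up, with equal means but very different variances):

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's F test for equal means without assuming equal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                   # precision weights n_i / s_i^2
    grand = np.sum(w * m) / np.sum(w)           # weighted grand mean
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f = (np.sum(w * (m - grand) ** 2) / (k - 1)) / (
        1 + 2 * (k - 2) / (k**2 - 1) * tmp)
    df2 = (k**2 - 1) / (3 * tmp)                # approximate denominator df
    return f, k - 1, df2, stats.f.sf(f, k - 1, df2)

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 30)   # equal means but
b = rng.normal(0.0, 3.0, 50)   # very different variances
c = rng.normal(0.0, 5.0, 20)   # and unequal sample sizes
print(welch_anova(a, b, c))
```

Unlike the pooled $F$ test, this keeps its nominal level when variances and group sizes differ.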
Some texts say the two-sample $t$ test and one-way ANOVA are OK for non-normal data whenever there are more than 30 replications per group, but this may not hold if the data within groups are highly skewed.
If levels of a two-sample $t$ or one-factor ANOVA are far from normal, but differences between groups are mainly a 'shift' of location (with little change in shape or variance), then it may be best to use the Wilcoxon rank-sum test or the Kruskal-Wallis nonparametric test instead of $t$ or ANOVA, respectively.
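The Kruskal-Wallis rank test mentioned above is available directly in SciPy. A small illustrative sketch with right-skewed (exponential) groups that differ only by a location shift, which is exactly the situation described:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.exponential(1.0, 40)          # skewed baseline group
g2 = rng.exponential(1.0, 40) + 1.0    # same shape, shifted location
g3 = rng.exponential(1.0, 40) + 2.0    # same shape, larger shift

# Kruskal-Wallis compares groups via ranks, so within-group skew is no problem
h, p = stats.kruskal(g1, g2, g3)
print(h, p)
```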
Note: I could show an example to illustrate, if you could say what test is of particular interest and what assumption you feel unsure of.
Best Answer
Since both your $X$ and $Y$ are rank deficient, you cannot directly compute the Hotelling $T^2$ because the $W$ matrix is singular.
So you have to replace your $X$ and $Y$ by their reduced rank approximations, which I'll denote $X^k$ and $Y^k$:
$$X^k=(X-\bar{X})\,V\!\left[(X-\bar{X})/\sqrt{n_x-1}\right],$$
where $\bar{X}=n_{x}^{-1}\sum_{i=1}^{n_x} X_i$ and $V[Z]$ is the matrix whose columns contain the leading right singular vectors of $Z$ (ditto for $Y$).
Then, based on the reduced rank approximation of your data, you can compute a reduced rank Hotelling $T^2$. Finally, replace $p$ in all formulae by the rank of $X^k$ (in your case 8).
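A sketch of the reduced-rank step for one sample, under some assumptions: the data matrix has observations in rows, $V[\cdot]$ is taken as the leading right singular vectors of the centered, scaled matrix, and the sizes and rank $k = 8$ are illustrative, not from the question. The point is that the full sample covariance is singular, but after projecting onto the leading $k$ directions it is invertible and a $T^2$ with $p$ replaced by $k$ can be formed:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 60, 20, 8
L = rng.normal(size=(p, k))              # loadings: only k latent factors
mu = L @ np.full(k, 0.5)                 # true mean, inside the factor space
X = rng.normal(size=(n, k)) @ L.T + mu   # n x p but rank-deficient (rank k)

S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S))          # 8 < 20: S is singular, T^2 undefined

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc / np.sqrt(n - 1), full_matrices=False)
Vk = Vt[:k].T                            # p x k leading singular directions
Z = X @ Vk                               # reduced-rank scores, n x k
Sk = np.cov(Z, rowvar=False)             # k x k and now invertible

# one-sample Hotelling T^2 for H0: mean = 0 (the zero vector projects to 0),
# with p replaced by k = 8 in the reference distribution
d = Z.mean(axis=0)
t2 = n * d @ np.linalg.solve(Sk, d)
print(t2)
```

Here the test correctly picks up the nonzero mean; with real data, the robust variant discussed below would replace the plain SVD step.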
Now, since you suspect your $X$ and $Y$ to be contaminated by outliers, you might as well replace the algorithm used to obtain $X^k$ with a robust counterpart. One such algorithm is ROBPCA; you'll find a good R implementation of ROBPCA in the function PcaHubert here, and more details on page 26 of this note.