Solved – a practical explanation of affine equivariance and why it matters for a covariance estimator

Tags: covariance, distance, robust

I am a structural engineer out of my depth with some statistical procedures. I need a robust estimator of covariance, and there are many options out there: MCD, MVE, OGK, etc.

Some of them are more computationally efficient, which is important to me. The OGK estimator is particularly efficient, but it does not have "affine equivariance". Can someone explain to me very practically what that means, and maybe give a practical situation to illustrate it?

Thanks!

Best Answer

I will first recall the property formally:

Given an $n$ by $p$, $n>p$, data matrix $X$ whose rows are the observations, an affine equivariant estimator of location and scatter $(m(X), S(X))$ is one for which:

$$(1)\quad m(A X)=A\, m(X)$$ $$(2)\quad S(A X)=A\, S(X)A^\top$$

for any $p$ by $p$ non-singular matrix $A$, where $AX$ denotes the data set in which each observation $x_i$ is replaced by $Ax_i$.
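As a quick numerical illustration (a sketch, not part of the original answer), the classical sample mean and covariance satisfy these two rules. Here "$AX$" is implemented by mapping each row (observation) $x$ to $Ax$, i.e. `X @ A.T`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))     # rows are observations
A = rng.normal(size=(p, p))     # generically non-singular p x p matrix
Y = X @ A.T                     # each observation x becomes A x

m = lambda Z: Z.mean(axis=0)
S = lambda Z: np.cov(Z, rowvar=False)

print(np.allclose(m(Y), A @ m(X)))        # location rule: m(AX) = A m(X)
print(np.allclose(S(Y), A @ S(X) @ A.T))  # scatter rule: S(AX) = A S(X) A^T
```

Both checks hold up to floating-point tolerance, for any non-singular `A`.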


Consider a situation where one would use $(m(X),S(X))$ to compute the statistical distance between a point $x$ and $m(X)$, the center of $X$, in the metric $S(X)$ (assuming $S(X)$ is invertible):

$$d(x,m(X), S(X))=\sqrt{(x-m(X))^{\top}S^{-1}(X)(x-m(X))}$$

Affine equivariance of $(m(X),S(X))$ is equivalent ($\Leftrightarrow$) to affine invariance of $d(x,m(X), S(X))$.

Affine invariance of $d(x,m(X), S(X))$ means that this measure (of the outlyingness of $x$ with respect to $X$) is not affected by the scale and orientation (correlation structure) of the columns of $X$.
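To see this invariance numerically (again sketched with the classical sample estimates, which are affine equivariant), the statistical distances of all observations are unchanged when the data are pushed through an arbitrary non-singular $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
A = rng.normal(size=(p, p))   # generically non-singular

def mahal(Z):
    # d(x, m(Z), S(Z)) for every row x of Z, using the sample mean/covariance
    m = Z.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    D = Z - m
    return np.sqrt(np.einsum("ij,jk,ik->i", D, S_inv, D))

# Distances computed in the original and the transformed coordinates coincide.
print(np.allclose(mahal(X), mahal(X @ A.T)))
```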


In many applications, equivariance/invariance is extremely helpful. The alternative is to run the analysis obtained with the non-equivariant procedure (in this case the OGK) on many transformed versions of $X$: $\{X',X'',\ldots\}$ --each obtained by applying a random non-singular matrix from $\{A', A'',\ldots\}$ to the original data matrix $X$-- in the hope of assessing the sensitivity of the analysis (in your case, the observations flagged as outliers) to the coordinate system in which the data $X$ are measured.
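This sensitivity-check loop can be sketched with a toy non-equivariant estimator. The `diag_mad_dist` below is a hypothetical stand-in, not the actual OGK: like the OGK it works coordinate by coordinate (median plus a diagonal scatter of squared MADs), so its distances change with the coordinate system:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 2
X = rng.normal(size=(n, p)) * np.array([1.0, 5.0])   # unequal column scales

def diag_mad_dist(Z):
    # Toy NON-equivariant estimator: coordinatewise median plus a diagonal
    # scatter of squared (consistency-scaled) MADs.  Coordinatewise, hence
    # not affine equivariant -- a stand-in for the OGK, not the OGK itself.
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    return np.sqrt((((Z - med) / mad) ** 2).sum(axis=1))

# The sensitivity check: re-run on transformed copies X', X'', ...
base = diag_mad_dist(X)
for _ in range(3):
    A = rng.normal(size=(p, p))                  # random non-singular matrix
    print(np.allclose(base, diag_mad_dist(X @ A.T)))   # almost surely False
```

The distances (and hence potentially the set of flagged outliers) differ from one coordinate system to the next, which is exactly what the repeated runs are meant to quantify.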

I stress that this sort of sensitivity check is not restricted to robust statistics. For example, PCA is not scale equivariant: when performing PCA, it is prudent to run the analysis on various rescalings of the data to assess the sensitivity of whatever results are found to the original scaling of the data. Likewise, deep neural networks are not rotation equivariant, and here too (at least in image and character recognition) it is common to re-run the network on rotated copies of the inputs to assess the sensitivity of the results to the orientation of the training data.
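The PCA case is easy to demonstrate (a sketch with made-up data): rescaling a single column (say, reporting it in different units) swings the leading principal axis toward the inflated column.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x0 = rng.normal(size=n)
x1 = 0.5 * x0 + 0.3 * rng.normal(size=n)   # correlated second variable
X = np.column_stack([x0, x1])

def first_pc(Z):
    # leading principal axis = top eigenvector of the sample covariance
    w, V = np.linalg.eigh(np.cov(Z, rowvar=False))
    return V[:, -1]

# Rescale the second column by 1000 (e.g. metres -> millimetres): PCA is
# not scale equivariant, so the leading axis rotates toward that column.
pc_orig = first_pc(X)
pc_scaled = first_pc(X * np.array([1.0, 1000.0]))
print(np.round(np.abs(pc_orig), 3), np.round(np.abs(pc_scaled), 3))
```

After rescaling, the first principal component is almost entirely aligned with the inflated column, whereas in the original units it loads mostly on the first variable.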

With equivariant procedures, these particular sensitivity checks are unnecessary (for example, the statistical distances with respect to the FMCD estimates of location and scatter computed at $\{X', X'',\ldots\}$ would all be identical).