Solved – Dimensions Of The Covariance Matrix

dimensionality reduction, eigenvalues, pca, unsupervised learning

I know that PCA can be obtained by eigendecomposition of the covariance matrix, and that the covariance matrix $S$ is obtained by the equation $S = X^TX$, where $X$ is the centered data matrix.

But I am a bit confused about the dimensions of the covariance matrix.

Some resources define the data matrix as $X_{n \times d}$, where $n$ is the number of samples and $d$ is the number of dimensions. Other resources use the opposite convention, $X_{d \times n}$. Applying $S = X^TX$ under each convention yields covariance matrices with different dimensions, and therefore eigenvectors of different dimensions.

I am not sure what I am getting wrong, but I think I am missing something important here.

Best Answer

When $X$ is $n\times d$, the scatter matrix (the covariance matrix up to a scale factor) is $S=X^TX$. When $X$ is $d\times n$, it is $S=XX^T$. In the former case the columns of $X$ (one per feature) are mean-centered; in the latter case the rows are. Either way, $S$ is $d\times d$.

The logic is always to calculate $$\sum_{i=1}^n x_ix_i^T$$ where $x_i$ is one data sample of dimension $d\times 1$.
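As a rough check of the point above, here is a minimal pure-Python sketch on a made-up toy data set (4 samples, 2 features). The helper names `transpose` and `matmul` and the data values are assumptions for illustration only. It builds $S$ three ways and shows that both layouts give the same $d\times d$ matrix.

```python
# Toy data (made up for illustration): n = 4 samples, d = 2 features.
samples = [[2.0, 1.0], [0.0, 3.0], [4.0, 5.0], [2.0, 3.0]]
n, d = len(samples), len(samples[0])


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Center each feature by subtracting its mean over the samples.
means = [sum(row[j] for row in samples) / n for j in range(d)]

X_nd = [[row[j] - means[j] for j in range(d)] for row in samples]  # n x d, columns centered
X_dn = transpose(X_nd)                                             # d x n, rows centered

S_from_nd = matmul(transpose(X_nd), X_nd)  # X^T X for the n x d layout
S_from_dn = matmul(X_dn, transpose(X_dn))  # X X^T for the d x n layout

# Direct definition: sum over samples of the outer product x_i x_i^T.
S_from_sum = [[0.0] * d for _ in range(d)]
for x in X_nd:
    for a in range(d):
        for b in range(d):
            S_from_sum[a][b] += x[a] * x[b]

# Using the wrong product for a given layout gives the n x n Gram matrix instead.
gram = matmul(X_nd, transpose(X_nd))

print(S_from_nd)
print(S_from_dn)
print(S_from_sum)
print(len(gram), "x", len(gram[0]))
```

All three versions of $S$ agree and are $2\times 2$, while `gram` is $4\times 4$. Mixing up the layout and the formula produces that sample-by-sample matrix rather than a feature covariance.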

The covariance matrix is typically estimated as $S/n$ or $S/(n-1)$. Since the factor is just a scalar, it does not matter for PCA: it only rescales the eigenvalues and does not change the principal directions.
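To spell out why the scalar is harmless, take any constant $c>0$ (for example $c=n$ or $c=n-1$; the symbol $c$ is introduced here only for illustration). If $v$ is an eigenvector of $S$ with eigenvalue $\lambda$, then

$$Sv=\lambda v \quad\Longrightarrow\quad \left(\frac{1}{c}S\right)v=\frac{\lambda}{c}\,v$$

So $S$ and $S/c$ share the same eigenvectors, and every eigenvalue is divided by the same $c$. The ordering of the components is preserved, and so are the explained-variance ratios $\lambda_i/\sum_j \lambda_j$.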