Whether sparse PCA is easier to interpret than standard PCA depends on the dataset you are investigating. Here is how I think about it: sometimes one is more interested in the PCA projections (the low-dimensional representation of the data), and sometimes in the principal axes themselves; it is only in the latter case that sparse PCA can benefit interpretation. Let me give a couple of examples.
For example, I work with neural data (simultaneous recordings of many neurons) and apply PCA and related dimensionality reduction techniques to obtain a low-dimensional representation of neural population activity. I might have 1000 neurons (i.e. my data live in a 1000-dimensional space) and want to project them onto the three leading principal axes. What these axes are is totally irrelevant to me, and I have no intention of "interpreting" them in any way. What I am interested in is the 3D projection (since the activity depends on time, I get a trajectory in this 3D space). So I am fine if each axis has all 1000 coefficients non-zero.
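A minimal sketch of this use case, with synthetic data standing in for a real recording (the matrix shapes, not the values, are the point):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a recording: 200 time points x 1000 neurons.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))

# Project the activity onto the three leading principal axes.
pca = PCA(n_components=3)
trajectory = pca.fit_transform(X)  # shape (200, 3): a trajectory in 3D

# Each axis is a dense combination of all 1000 neurons, which is fine
# here, since only the 3D projection itself is of interest.
print(trajectory.shape)        # (200, 3)
print(pca.components_.shape)   # (3, 1000)
```

The loadings in `pca.components_` are never inspected in this workflow; only `trajectory` is analyzed or plotted.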
On the other hand, somebody might be working with more "tangible" data, where individual dimensions have obvious meaning (unlike the individual neurons above): e.g. a dataset of various cars, where the dimensions range from weight to price. In this case one might actually be interested in the leading principal axes themselves, because one might want to say something like: look, the first principal axis corresponds to the "fanciness" of the car (I am totally making this up). If the axes are sparse, such interpretations are generally easier to give, because many variables have $0$ coefficients and so are obviously irrelevant for that particular axis. With standard PCA, one usually gets non-zero coefficients for all variables.
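A toy illustration of the contrast, assuming scikit-learn's `SparsePCA`; the "car" feature names and the latent "fanciness" factor are entirely made up for the example:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Toy "cars" data: 100 cars, 4 hypothetical features. The first two
# (say, price and engine size) are driven by a latent "fanciness"
# factor; the last two (say, weight and length) are unrelated noise.
rng = np.random.default_rng(1)
fancy = rng.standard_normal(100)
X = np.column_stack([
    fancy + 0.1 * rng.standard_normal(100),  # "price"
    fancy + 0.1 * rng.standard_normal(100),  # "engine size"
    rng.standard_normal(100),                # "weight" (unrelated)
    rng.standard_normal(100),                # "length" (unrelated)
])

dense_axis = PCA(n_components=1).fit(X).components_[0]
sparse_axis = SparsePCA(n_components=1, random_state=0).fit(X).components_[0]

# Standard PCA loads (a little) on every variable; sparse PCA zeroes
# out the irrelevant ones, making the axis easier to read off.
print(np.count_nonzero(dense_axis))   # all 4 variables
print(np.count_nonzero(sparse_axis))  # fewer variables
```

Reading off "this axis is price plus engine size" is immediate from the sparse loadings, whereas the dense loadings require judging which small coefficients to ignore.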
You can find more examples and some discussion of the latter case in the 2006 sparse PCA paper by Zou et al. I have not, however, seen the difference between the former and the latter case explicitly discussed anywhere (even though it probably has been).
I think you need to define more precisely what you are looking for. You could have 10 variables that each individually account for 90% of the variance, but if it is the same 90% of the variance, that may not be interesting to you. Performing regression with L1 and/or L2 penalties can help you identify variables, or groups of variables, that correlate well with your outcome. There are also other techniques, such as Minimum Redundancy Maximum Relevance, that help select features that are strong predictors.
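The "same 90% of the variance" point can be demonstrated with an L1 (lasso) fit; the data-generating setup below is hypothetical, with several predictors that are near-copies of one another:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical setup: 10 predictors, but only x1 and x2 drive y.
# Four further columns are near-copies of x1 (they "explain" the same
# variance), and four are pure noise.
rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
copies = np.column_stack([x1 + 0.01 * rng.standard_normal(n) for _ in range(4)])
noise = rng.standard_normal((n, 4))
X = np.column_stack([x1, x2, copies, noise])
y = 3 * x1 + 2 * x2 + 0.1 * rng.standard_normal(n)

# The L1 penalty sets redundant and irrelevant coefficients to
# exactly zero, leaving a small set of selected variables.
coef = Lasso(alpha=0.1).fit(X, y).coef_
print(np.count_nonzero(coef))  # far fewer than 10
```

Ordinary least squares would spread weight arbitrarily across the five interchangeable copies; the lasso instead commits to a small subset, which is what makes it useful for variable selection here.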
Per @amoeba's comments: