Solved – What does the sparse PCA implementation in Python do?

dimensionality-reduction, pca, python, scikit-learn, sparse

I am interested in using sparse PCA in Python, and I found the sklearn implementation. However, I think this Python implementation solves a different problem than the original sparse PCA algorithm proposed in this paper and implemented in the R package elasticnet. For example, consider the following example concerning the explained variance of the sparse PCs:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.decomposition import SparsePCA

boston = load_boston()  # load the Boston housing dataset
x = boston.data[:, [0, 2, 4, 5, 6, 7, 10, 11, 12]]  # select the non-categorical variables

# Solve sparse PCA
spca = SparsePCA(n_components=5, alpha=1e-3, ridge_alpha=1e-6, normalize_components=False)
spca.fit(x)
t_spca = spca.transform(x)   # sparse PC scores
p_spca = spca.components_.T  # sparse loadings

# Calculate the explained variance as described in the original sparse PCA paper
q, r = np.linalg.qr(t_spca)  # QR decomposition of the modified PCs

# Adjusted variance of each component: squared diagonal entries of R
variance = np.square(np.diag(r))

# Explained variance ratio relative to the total variance in the original dataset
total_variance_in_x = np.trace(np.cov(x.T))
explained_variance_ratio = np.cumsum(variance / total_variance_in_x)

array([ 0.00010743,  0.00021485,  0.00032228,  0.0004297 ,  0.00053713])

The explained variance calculation is based on the original paper cited above (page 273). As you can see, the resulting explained variance ratios are suspiciously small.
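
In formulas, the recipe from page 273 that the code above follows (as I read the paper) is: QR-decompose the matrix of modified PCs and read the adjusted variances off the diagonal of $R$,

$$
T_{\mathrm{spca}} = QR, \qquad \widehat{\operatorname{Var}}_j = R_{jj}^{2}, \qquad \text{explained variance ratio} = \frac{\sum_{j=1}^{k} R_{jj}^{2}}{\operatorname{tr}\left(\operatorname{cov}(X)\right)},
$$

where $\operatorname{tr}(\operatorname{cov}(X))$ is total_variance_in_x in the code.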

My questions are:

  1. Am I doing something wrong in the calculation of the explained variance?
  2. Is there any place where the mathematical formulation of the sparse PCA implemented in Python can be found?
  3. What does the normalize_components parameter of the Python implementation do, and why does it raise a deprecation warning when set to False?

EDIT (based on the answer)

Based on the answer, I tested the results by setting normalize_components=True, which avoids the deprecation warning.

spca = SparsePCA(n_components=5, alpha=1e-3, ridge_alpha=1e-6, normalize_components=True)  # Solve sparse PCA
spca.fit(x)
t_spca = spca.transform(x)
p_spca = spca.components_.T

# compute adjusted variance
q, r = np.linalg.qr(t_spca)
variance = np.square(np.diag(r))

This yields the following variances:

variance
array([4255042.12587337, 386089.3106474, 31883.68815345, 15333.57500443, 9781.36298748])

However, those numbers are much larger than both the sample size (505 degrees of freedom, i.e. n − 1 in np.cov) and the total variance of the original matrix x (total_variance_in_x = 9308.784), so dividing the entries of the variance array by 505 or by 9308.784 alone still does not give meaningful explained variance ratios.

Best Answer

Answer:

  1. Your code seems to be in line with the cited paper; the inconsistency is an artefact of the non-standard definition of principal components used in scikit-learn versions < 0.22 (this is mentioned in the deprecation message).

    You can get meaningful results by

    • setting normalize_components=True
    • dividing the entries of the variance array by n − 1 = 505 (the denominator used by np.cov; the Boston data has 506 rows).

    This gives you explained variance ratios like

    0.90514782, 0.98727812, 0.99406053, 0.99732234, 0.99940307.

  2. and 3. The most immediate way is to check the source files of the sklearn.decomposition module installed on your machine (see the sketch below).
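
For example, a quick way to locate and read that file (a small sketch; the path depends on your environment):

import inspect
from sklearn.decomposition import SparsePCA

# Path of the module file that defines SparsePCA
print(inspect.getsourcefile(SparsePCA))

# Or print the transform() implementation directly
print(inspect.getsource(SparsePCA.transform))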


Details:

  • The code of SparsePCA, as of scikit-learn 0.21.3, has an unexpected artefact: it returns a transformation of the inputs whose $QR$ decomposition has an $R$ with non-zero diagonal values in $\{-1, 1\}$ (rounding to 5 digits after the decimal point; easy to check, see the sketch after the source excerpt below), and therefore the diagonal does not reflect the preserved variance, as it should according to Zou, Hastie and Tibshirani.

  • normalize_components is key to getting meaningful variance ratios. This argument is present in scikit-learn 0.21.3 and absent in scikit-learn 0.22. When set to False, it induces a normalization of the outputs that hides the information about the explained variance. The details of this normalization can be seen in the source code:

class SparsePCA(BaseEstimator, TransformerMixin):

    ...

    def transform(self, X):
        ...
        U = ridge_regression(...)

        if not self.normalize_components:
            # rescale every column of the scores to unit norm,
            # which discards the variance information
            s = np.sqrt((U ** 2).sum(axis=0))
            s[s == 0] = 1
            U /= s

        return U
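
For instance, a quick check of the $\{-1, 1\}$ diagonal claim (a sketch reusing np and the x matrix from the question):

from sklearn.decomposition import SparsePCA

# With normalize_components=False, transform() rescales the score columns
# to unit norm, so the diagonal of R carries no variance information
spca = SparsePCA(n_components=5, alpha=1e-3, ridge_alpha=1e-6,
                 normalize_components=False)
t = spca.fit_transform(x)
q, r = np.linalg.qr(t)
print(np.round(np.diag(r), 5))  # the non-zero entries round to -1 or 1 here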

The deprecation warning states that this normalization is not in line with the common definition of principal components and that it is removed in version 0.22.

If, like me, you are still using an old version, this normalization can be reversed as described in the brief version of the answer above; a complete sketch follows below.
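
Putting both fixes together, a minimal end-to-end sketch (assuming scikit-learn 0.20–0.21, where the normalize_components argument is still accepted):

import numpy as np
from sklearn.datasets import load_boston
from sklearn.decomposition import SparsePCA

x = load_boston().data[:, [0, 2, 4, 5, 6, 7, 10, 11, 12]]  # non-categorical variables

# Fit with the standard definition of principal components
spca = SparsePCA(n_components=5, alpha=1e-3, ridge_alpha=1e-6,
                 normalize_components=True)
t_spca = spca.fit_transform(x)

# Adjusted variances from the QR decomposition, divided by n - 1
# so that they are on the same scale as np.cov
q, r = np.linalg.qr(t_spca)
variance = np.square(np.diag(r)) / (x.shape[0] - 1)

total_variance_in_x = np.trace(np.cov(x.T))
print(np.cumsum(variance / total_variance_in_x))
# approximately 0.905, 0.987, 0.994, 0.997, 0.999 -- the ratios quoted above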