Solved – What does the PCA().transform() method do

pca, scikit-learn, self-study

I've been taught to think of PCA as a change-of-basis technique with a cleverly chosen basis. Say my initial data is an $m\times n$ matrix $X$, where $m$ is the number of features and $n$ is the number of measurements. I've computed the covariance matrix $S$ and obtained the $m\times m$ eigenbasis matrix $P$ (eigenvectors of $S$), which represents my new set of coordinates. I now want to transform my data to these new coordinates by $Y=PX$. Alternatively, I use the sklearn.decomposition.PCA class to perform the same procedure, but the transformed data differs from what I get manually.

import numpy as np
from sklearn.decomposition import PCA

# generate some random data
m = 10
n = 100
X = np.random.randn(m, n)
X = X - X.mean(axis=1).reshape((m, 1))  # center each feature (subtract row means)
S = X @ X.T / (n-1)  # sample covariance matrix of the features

# manual computation
P = np.linalg.eig(S)[1] # transformation matrix P
Y = P @ X # transformed data

# using sklearn
pca = PCA()
pca.fit(X.T)
Y_sklearn = pca.transform(X.T).T

The first transformed vector, Y[:, 0], is

array([-0.09133876, -1.53859883,  0.86409512, -2.52404208,  0.05910835,
        0.83063718,  0.52757518,  0.7412817 , -0.42611878, -0.71241571])

while Y_sklearn[:, 0] is

array([ 1.44259169,  1.05948004,  0.87768441,  0.60333571, -1.560406  ,
        0.11799914, -1.91440021, -0.96841104,  0.41010045, -0.38189462])

I am probably making a mistake somewhere, but I can't find where exactly. Thanks in advance.

Best Answer

Your P matrix contains the eigenvectors as columns, so to project your data you need P.T @ X rather than P @ X (the projection is the dot product of each eigenvector with each measurement). The results will then be more similar, but still not the same, because np.linalg.eig doesn't return the eigenvalues in any particular order, while sklearn sorts the components by decreasing explained variance. You can achieve the same ordering as follows:

eigvals, eigvecs = np.linalg.eig(S)
P = eigvecs[:, np.argsort(-eigvals)]  # columns sorted by decreasing eigenvalue
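With $P$ ordered this way, the projection onto the principal axes is

Y = P.T @ X  # rows of Y are the principal components, largest variance first

and the components come out in the same order as sklearn's.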

Finally, since an eigenvector is only defined up to sign ($v$ and $-v$ span the same axis), you may still see sign flips between some of your components and sklearn's. Apart from that, they will match.
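To see the whole fix end to end, here is a minimal sketch using the question's setup; the sign-alignment step (flipping each manual component according to the sign of its dot product with sklearn's) is my own illustration, and it assumes the eigenvalues are distinct, which holds for generic random data:

import numpy as np
from sklearn.decomposition import PCA

m, n = 10, 100
X = np.random.randn(m, n)
X = X - X.mean(axis=1).reshape((m, 1))  # center each feature
S = X @ X.T / (n - 1)                   # sample covariance matrix

eigvals, eigvecs = np.linalg.eig(S)     # np.linalg.eigh would also work, since S is symmetric
P = eigvecs[:, np.argsort(-eigvals)]    # columns sorted by decreasing eigenvalue
Y = P.T @ X                             # manual projection

pca = PCA()
pca.fit(X.T)                            # sklearn expects samples as rows
Y_sklearn = pca.transform(X.T).T

# flip each manual component so its sign matches sklearn's
signs = np.sign(np.sum(Y * Y_sklearn, axis=1))
print(np.allclose(Y * signs[:, None], Y_sklearn))  # True, up to numerical error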