Solved – PCA using princomp in MATLAB

Tags: machine learning · MATLAB · pca

I'm trying to do dimensionality reduction using MATLAB's princomp, but I'm not sure I'm doing it right.

Here is my test code; I'm not sure that I'm doing the projection right:

A = rand(4,3)
AMean = mean(A)
[n, m] = size(A)
Ac = A - repmat(AMean, [n 1])       % center the data
pc = princomp(A)
k = 2;                              % number of leading principal components to keep
A_pca = Ac * pc(1:k,:)'             % not sure I'm doing the projection right
reconstructedA = A_pca * pc(1:k,:)
error = reconstructedA - Ac         % reconstruction error

And here is my code for face recognition using the ORL dataset:

%load orl_data 400x768 double matrix (400 images 768 features)
%make labels
orl_label=[];
for i=1:40
    orl_label= [orl_label;ones(10,1)*i];
end

n=size(orl_data,1);
k = randperm(n);
s= round(0.25*n);% take 25% for train

%raw pix
%split on test and train sets
data_tr = orl_data(k(1:s),:);
label_tr = orl_label(k(1:s),:);
data_te = orl_data(k(s+1:end),:);
label_te = orl_label(k(s+1:end),:);

tic
[nn_ind, estimated_label] = EuclDistClassifier(data_tr,label_tr,data_te);
toc

rate = sum(estimated_label == label_te)/size(label_te,1)

%using pca
tic
pc= princomp(data_tr);
toc

mean_face = mean(data_tr);
pc_n=100;
f_pc= pc(1:pc_n,:)';
data_pca_tr = (data_tr - repmat(mean_face, [s,1])) * f_pc;
data_pca_te = (data_te - repmat(mean_face, [n-s,1])) * f_pc;

tic
[nn_ind, estimated_label] = EuclDistClassifier(data_pca_tr,label_tr,data_pca_te);
toc

rate = sum(estimated_label == label_te)/size(label_te,1)

If I choose enough principal components, it gives me equal recognition rates; with a small number, the rate using PCA is noticeably worse.

Here are my questions:

  1. Is the princomp function the best way to calculate the first k principal components in MATLAB?
  2. Do PCA-projected features give no extra accuracy over raw features, only a smaller feature vector size (so feature vectors are faster to compare)?
  3. How can I automatically choose the minimum k (number of principal components) that gives the same accuracy as the raw feature vector?
  4. If I have a very big set of samples, can I use only a subset of them with comparable accuracy? And can I compute PCA on one set and later "add" another set (I don't want to recompute PCA for set1+set2, but somehow incrementally add the information from set2 to the existing PCA of set1)?

I also tried a GPU version, simply using gpuArray:

%test using GPU
tic
A_cpu= rand(30000,32*24);
A= gpuArray(A_cpu);
AMean = mean(A);
[n m] = size(A)
pc= princomp(A);
k=100;
A_pca= (A - repmat(AMean,[n 1])) * pc(1:k,:)';
A_pca_cpu = gather(A_pca);
toc
clear;

tic
A= rand(30000,32*24);
AMean = mean(A);
[n m] = size(A)
pc= princomp(A);
k=100;
A_pca= (A - repmat(AMean,[n 1])) * pc(1:k,:)';
toc
clear;

It works faster, but it doesn't seem suitable for big matrices; maybe I'm wrong?

If I use a big matrix it gives me:

Error using gpuArray Out of memory on device.

Best Answer

Answers to your questions:

  1. I usually use eig(cov(data)) to get the sample eigenvectors, but it is a matter of personal taste. princomp is probably better, as it gives you the eigenvectors and the projections in one step. If you know $k$ and want exactly $k$ components, it can be easier to use eigs(cov(data),k): it saves you the hassle of computing the higher-order components, since it returns only the $k$ largest eigenvalues and eigenvectors of the covariance matrix (see the first sketch after this list).
  2. Yes, you are mostly using PCA as a dimensionality reduction step. (See the next point regarding accuracy.)
  3. Automatic determination of PCA dimensionality is a big subject in its own right. Check Minka's work. A nice and quite readable survey is given by Cangelosi and Goriely here. In short, there is no way to have the same data fidelity with your projected data as with the original feature vector, because by definition you exclude some modes of variation when $k < D$. Given that some of those modes might be just noise, your classification accuracy might even improve; nevertheless, in terms of dataset reconstruction you simply have to decide how much variation (i.e. what percentage) you want to retain, which you can easily calculate from the eigenvalue ratio $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{D} \lambda_i}$ (also shown in the first sketch below).
  4. I haven't done that before. In general, the size of your sample set is not the problem; the dimensionality of the samples is. You can calculate the eigenvectors of 100 million 10-dimensional samples in just seconds; the covariance of a 10-element sample with 10^8 readings/dimensions each will take you a lot longer. There is work on online/sequential estimation of PCA, according to Wikipedia. Warmuth and Kuzmin's "Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension" seems to investigate just that. (Probably I should also read this paper, now that I think of it.) A simple, exact way to merge a second batch into statistics you already have is shown in the second sketch below.
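
For points 1 and 3, here is a minimal MATLAB sketch (not your exact pipeline; it assumes rows are samples, as in your ORL code, and the 95% threshold is just an example):

X = data_tr;                             % assumes data_tr as in the question (rows = samples)
Xc = X - repmat(mean(X), size(X,1), 1);  % center the data
C = cov(X);                              % D x D sample covariance

% Point 1: only the k leading components, without computing the rest.
k = 100;
[V, L] = eigs(C, k);                     % V: D x k eigenvectors, L: k x k diagonal of eigenvalues
proj = Xc * V;                           % n x k projected data

% Point 3: smallest k that retains a chosen fraction of the variance.
lambda = sort(eig(C), 'descend');        % all eigenvalues (princomp's third output, latent, gives the same)
explained = cumsum(lambda) / sum(lambda);
k_min = find(explained >= 0.95, 1)       % e.g. retain 95% of the variance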
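
For point 4, if what you want is the PCA of set1+set2 without revisiting set1's raw data, you can merge the per-batch means and scatter matrices exactly and then take the eigendecomposition of the combined covariance. This is plain batch merging of sufficient statistics, not the online algorithm from the paper above; set1 and set2 are hypothetical names:

mu1 = mean(set1);  n1 = size(set1,1);
S1 = (set1 - repmat(mu1,n1,1))' * (set1 - repmat(mu1,n1,1));   % D x D scatter of set1 (keep mu1, S1, n1 around)

mu2 = mean(set2);  n2 = size(set2,1);
S2 = (set2 - repmat(mu2,n2,1))' * (set2 - repmat(mu2,n2,1));   % scatter of the new batch

n = n1 + n2;
mu = (n1*mu1 + n2*mu2) / n;              % combined mean
d = mu1 - mu2;
S = S1 + S2 + (n1*n2/n) * (d' * d);      % combined scatter about the combined mean
C = S / (n - 1);                         % combined covariance

[V, L] = eigs(C, k);                     % k leading components of set1+set2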

For your final GPU-related question: while I don't work with GPUs extensively myself (and, moreover, I have never used MATLAB for that), I know that repeated I/O transfers to and from the GPU are a severe computational bottleneck. As a result, the speed-up you get from the massively parallel computational capabilities of a GPU is nullified by the I/O overhead if you have a small dataset. You will almost surely get more insightful answers about that point on Stack Overflow.
