Solved – Problems with implementing PCA in Matlab

machine learning, MATLAB, pca

I'm implementing PCA using eigenvalue decomposition in MATLAB. I know MATLAB already has PCA built in, but writing the code myself helps me understand all the technicalities.
I've been following the guidance from here, but I'm getting different results from the built-in function princomp.

Could anybody look at it and point me in the right direction?

Here's the code:

function [mu, Ev, Val ] = pca(data)

% mu - mean image
% Ev - matrix whose columns are the eigenvectors corresponding to the eigen
% values Val 
% Val - eigenvalues

if nargin ~= 1
 error ('usage: [mu,E,Values] = pca_q1(data)');
end

mu = mean(data)';

nimages = size(data,2);

for i = 1:nimages
 data(:,i) = data(:,i)-mu(i);
end

L = data'*data;
[Ev, Vals]  = eig(L);    
[Ev,Vals] = sort(Ev,Vals);

% computing eigenvector of the real covariance matrix
Ev = data * Ev;

Val = diag(Vals);
Vals = Vals / (nimages - 1);

% normalize Ev to unit length
proper = 0;
for i = 1:nimages
 Ev(:,i) = Ev(:,1)/norm(Ev(:,i));
 if Vals(i) < 0.00001
  Ev(:,i) = zeros(size(Ev,1),1);
 else
  proper = proper+1;
 end;
end;

Ev = Ev(:,1:nimages);
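
For concreteness, this is roughly the comparison I'm making (a small made-up example; the matrix size and the orientation are just placeholders, princomp expects observations in rows):

X = randn(20, 5);                     % 20 observations, 5 variables
[coeff, score, latent] = princomp(X); % columns of coeff are the principal directions,
                                      % latent holds the corresponding variances
[mu, Ev, Val] = pca(X);               % my function above
% I would expect the columns of Ev to agree with coeff up to sign (and ordering),
% and Val to agree with latent, but the results differ.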

Best Answer

The line [Ev,Vals] = sort(Ev,Vals); probably does not do what you think it should: sort cannot be applied to the eigenvector and eigenvalue matrices together like that. It has to be applied to the eigenvalues on their own, and the returned sort index then used to reorder the eigenvectors. Moreover, eig returns a diagonal matrix of eigenvalues, which you have to strip out into a vector. Probably you want to do the following:

[Ev, Vals] = eig(L);
Vals = diag(Vals);         % strip out the eigenvalues
[Vals, sidx] = sort(Vals); % do the sort
Ev = Ev(:, sidx);          % apply the sort to the eigenvectors as well

I would guess this sort is actually unnecessary, because eig returns the eigenvalues (and corresponding eigenvectors) in ascending order anyway.
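
As a side note, princomp orders its components by decreasing variance, so if the goal is to match its output exactly, a descending sort is probably closer to what you want, something like:

[Vals, sidx] = sort(Vals, 'descend'); % largest eigenvalue first, as princomp reports them
Ev = Ev(:, sidx);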

On a stylistic note, you can subtract out the mean using bsxfun as follows:

X = bsxfun(@minus,X,mu);

instead of the for-loop.
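
For what it's worth, on R2016b or newer the same centring can be written with implicit expansion, no bsxfun needed (assuming mu is oriented to match the dimension of X being centred):

X = X - mu;   % implicit expansion broadcasts mu across X (R2016b and later)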