Here is some code I wrote to help myself understand the MATLAB syntax for PCA.
[coeff,score,latent,~,explained] = pca(X);   % built-in PCA
covarianceMatrix = cov(X);
[V,D] = eig(covarianceMatrix);               % PCA "by hand": eigendecomposition of the covariance
coeff
coeff =
-0.5173 0.7366 -0.1131 0.4106 0.0919
0.6256 0.1345 0.1202 0.6628 -0.3699
-0.3033 -0.6208 -0.1037 0.6252 0.3479
0.4829 0.1901 -0.5536 -0.0308 0.6506
0.1262 0.1334 0.8097 0.0179 0.5571
V
V =
0.0919 0.4106 -0.1131 -0.7366 -0.5173
-0.3699 0.6628 0.1202 -0.1345 0.6256
0.3479 0.6252 -0.1037 0.6208 -0.3033
0.6506 -0.0308 -0.5536 -0.1901 0.4829
0.5571 0.0179 0.8097 -0.1334 0.1262
dataInPrincipalComponentSpace = X*coeff   % matches score because X is zero-mean; in general, project the centered data: (X - mean(X))*coeff
dataInPrincipalComponentSpace =
-0.5295 0.0362 0.5630 0.1053 -0.0428
0.2116 0.6573 -0.1721 -0.0306 -0.1559
0.6427 -0.0017 0.2739 -0.1635 0.2203
-0.6273 0.0239 -0.3678 -0.0710 0.2214
0.1332 0.0507 -0.0708 0.2772 0.0398
0.3145 -0.4825 -0.2080 0.1496 -0.0842
-0.1451 -0.2840 -0.0182 -0.2670 -0.1987
score
score =
-0.5295 0.0362 0.5630 0.1053 -0.0428
0.2116 0.6573 -0.1721 -0.0306 -0.1559
0.6427 -0.0017 0.2739 -0.1635 0.2203
-0.6273 0.0239 -0.3678 -0.0710 0.2214
0.1332 0.0507 -0.0708 0.2772 0.0398
0.3145 -0.4825 -0.2080 0.1496 -0.0842
-0.1451 -0.2840 -0.0182 -0.2670 -0.1987
corrcoef(dataInPrincipalComponentSpace)
ans =
1.0000 -0.0000 0.0000 -0.0000 -0.0000
-0.0000 1.0000 0.0000 -0.0000 0.0000
0.0000 0.0000 1.0000 0.0000 0.0000
-0.0000 -0.0000 0.0000 1.0000 -0.0000
-0.0000 0.0000 0.0000 -0.0000 1.0000
var(dataInPrincipalComponentSpace)'
ans =
0.2116
0.1250
0.1009
0.0357
0.0286
latent
latent =
0.2116
0.1250
0.1009
0.0357
0.0286
sort(diag(D),'descend')   % eig returns eigenvalues in ascending order, so sort to compare with latent
ans =
0.2116
0.1250
0.1009
0.0357
0.0286
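For anyone without MATLAB handy, here is the same equivalence sketched in Python/NumPy (toy data assumed, since X is not shown above). Like MATLAB's eig, np.linalg.eigh returns eigenvalues in ascending order, and eigenvector signs are arbitrary -- which is also why V's columns in the transcript appear in reverse order, with some flipped signs, relative to coeff:

```python
import numpy as np

# Toy data: 7 observations of 5 variables, centered (pca() centers internally)
rng = np.random.default_rng(0)
X = rng.standard_normal((7, 5))
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)          # covariance matrix, like cov(X)
eigvals, V = np.linalg.eigh(C)       # ascending order, like MATLAB's eig

order = np.argsort(eigvals)[::-1]    # re-sort into descending order
latent = eigvals[order]              # matches pca()'s latent
coeff = V[:, order]                  # matches pca()'s coeff, up to sign

score = X @ coeff                    # data in principal-component space

# Each score column's variance equals the corresponding eigenvalue
print(np.allclose(score.var(axis=0, ddof=1), latent))   # True
```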
The first figure on the Wikipedia page for PCA is really helpful in understanding what is going on. There is variation along the original (x,y) axes. The superimposed arrows show the principal axes: the long arrow is the axis with the most variation, and the short arrow captures the rest. Before thinking about dimension reduction, the first step is to redefine a coordinate system (x',y') such that x' lies along the first principal component and y' along the second (and so on, if there are more variables).
In my code above, those new variables are dataInPrincipalComponentSpace. As in the original data, each row is an observation, and each column is a dimension.
These data are just like your original data, except measured in a different coordinate system: the principal axes.
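To see that this is only a change of coordinates and nothing has been lost yet, note that coeff is an orthogonal matrix, so the rotation can be undone exactly. A minimal NumPy sketch (toy data assumed, as the real X is not shown):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((7, 5))
X = X - X.mean(axis=0)               # centered, as pca() does internally

_, coeff = np.linalg.eigh(np.cov(X, rowvar=False))
score = X @ coeff                    # rotate into the principal axes

# coeff is orthogonal (coeff' * coeff = I), so rotating back is exact:
X_recovered = score @ coeff.T
print(np.allclose(X_recovered, X))   # True
```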
Now you can think about dimension reduction. Take a look at the variable explained. It tells you how much of the variation is captured by each column of dataInPrincipalComponentSpace. Here you have to make a judgment call: how much of the total variation are you willing to ignore? One guideline is that if you plot explained, there will often be an "elbow" in the plot, where each additional component explains very little additional variation. Keep the components before the elbow, and discard the rest.
In my code, notice that the first 3 components together explain 87% of the variation; suppose you decide that that's good enough. Then, for your later analysis, you would only keep those 3 dimensions -- the first three columns of dataInPrincipalComponentSpace. You will have 7 observations in 3 dimensions (variables) instead of 5.
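The "keep components up to a chosen fraction of variance" step can be sketched in NumPy as follows; the 85% threshold and the toy data here are illustrative assumptions, not a rule:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((7, 5))
X = X - X.mean(axis=0)

eigvals, V = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
latent = eigvals[order]

explained = 100 * latent / latent.sum()   # like pca()'s 'explained' output
cum = np.cumsum(explained)

# Smallest number of components whose cumulative explained variance
# reaches the chosen threshold (85% here, purely for illustration)
k = int(np.searchsorted(cum, 85.0)) + 1
reduced = (X @ V[:, order])[:, :k]        # keep only the first k score columns
print(reduced.shape)
```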
I hope that helps!