MATLAB: PCA function outputting scores different from those expected, am I missing something?

pca

I have run into an issue with the pca function whereby the output PC1 scores are the negative of those expected. To confirm this I tried to recreate an example I found online (<http://setosa.io/ev/principal-component-analysis/>). I have attached a MATLAB file for ease of use. What I have done is:
>> [coeff,score,~,~,~] = pca(Example'); % I use the transpose of the data because "Variables1" (17x1) are the variables I want to analyse.*
>> scatter(score(:,1),score(:,2));
>> text(score(:,1)+dx,score(:,2)+dy,Variables2)
*Pretty sure this is worded terribly (sorry)
Above is the output that I am expecting to find, and below is the output that I am getting from the pca function. As you can see, the PC2 values are the same but the PC1 values are the negative of those expected (Fig above).
Why does this happen? (Not necessary, but if you can word this part * better it would be much appreciated for when I have to explain it.)
Thanks in advance.

Best Answer

Hi Matteo,
I don't believe there is really a problem here. Let Vt denote the transpose of the data matrix Values, so that Vt is 4x17 like you want. With
[coef, score, latent] = pca(Vt)
pca computes** the eigenvalue decomposition of the covariance matrix of Vt. The resulting eigenvectors are the columns of coef. Then score is computed with
score = Vt0*coef
where Vt0 is the centered data, shown in the code below. Each eigenvector is real and normalized to unit length, but it is still arbitrary to within an overall factor of +-1: flipping the sign of a column of coef leaves it a perfectly good unit eigenvector, and the columns stay orthogonal to each other, so there is no foolproof way to assign those signs uniquely. It looks like the example you are using disagrees with MATLAB on the overall sign of the first column of coef, so the score matrix comes up with different signs as well. Nothing wrong with that.
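Here is a minimal sketch of that claim (mine, not part of the original answer), using a small random matrix as a stand-in for the real data: pca's coefficient columns match the eigenvectors of the covariance matrix up to exactly that per-column sign.
X = randn(17,4);                      % random stand-in data: 17 observations, 4 variables
[coef, ~, latent] = pca(X);           % principal directions and their variances
[V, D] = eig(cov(X));                 % eigen-decomposition of the covariance matrix
[d, idx] = sort(diag(D), 'descend');  % eig returns ascending order, pca descending
V = V(:, idx);
max(max(abs(abs(coef) - abs(V))))     % ~0: same directions, signs possibly flipped
max(abs(latent - d))                  % ~0: same variances along those directions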
What matters is that you can still relate the scores to the data. No matter what the overall signs of the columns of coef are, after you calculate scores that way it should still be true that
Vt0 = score*coef'
You can change the overall signs of the coef columns and make your own coef, as in the code below. The resulting scores then agree with the online example.
Forget about salad. After looking at the png file, this all makes me want to fly to Belfast and eat fish and chips.
load('Example.mat')
Vt = Values';
[coef, score, lat] = pca(Vt);
% create new coef matrix and a new score matrix
coefnew = coef;
coefnew(:,1) = -coefnew(:,1);
Vt0 = Values' - mean(Values'); % covariance matrix calculation does this anyway
scorenew = Vt0*coefnew;
figure(1);scatter(score(:,1), score(:,2))
figure(2);scatter(scorenew(:,1), scorenew(:,2)) % same as example
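% check the reconstruction identity Vt0 = score*coef' for both sign choices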
Vt0_check = score*coef'
Vt0_check_new = scorenew*coefnew'
max(max(abs(Vt0-Vt0_check)))
max(max(abs(Vt0-Vt0_check_new)))
** It accomplishes this more accurately using svd instead of eig, but with the same intent.
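For completeness, a minimal sketch of that svd route (again mine, on a random stand-in matrix): the right singular vectors of the centered data are the same directions pca returns, up to the same per-column +-1 sign, and the singular values give the same variances.
X  = randn(17,4);                            % random stand-in data matrix
Xc = X - mean(X);                            % center the data, as pca does internally
[~, S, W] = svd(Xc, 'econ');                 % right singular vectors = principal directions
[coef, ~, latent] = pca(X);
max(max(abs(abs(W) - abs(coef))))            % ~0: same directions up to column signs
max(abs(latent - diag(S).^2/(size(X,1)-1)))  % ~0: squared singular values / (n-1) = variances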