MATLAB: Determining variables that contribute to principal components

princomp pca

Hi,
I am trying to run PCA on a 24×3333 matrix, where 24 is the number of observations and 3333 is the number of variables. I am using:
[coeff,score,eigval] = princomp(zscore(aggregate));
23 PCs are needed to explain 95% of the variance in the data. My question is: how do I know which variables are contributing to each component? I believe I need to make a spreadsheet naming all 3333 variables, but it is not clear to me how I would then identify the variables contributing to each component.
I am also computing a variable, percent variation explained (PVE): the variation in an original variable that is explained by a principal component. Ultimately I want to quantify how much a variable contributes to its respective principal component.
for i = 1:3333
    pve(:,i) = 100*coeff(i,i)*sqrt(var(score(:,i)))/(var(aggregate(:,i)));
end
Any insight would be a big help. I've been trying to figure this out for weeks with no luck.
Thanks,
Eric

Best Answer

The first paragraph in the doc description for princomp says "COEFF is a p-by-p matrix, each column containing coefficients for one principal component." So row j of column k in coeff is the weight of variable j in component k. For example, to project your data onto the 1st principal axis, do zscore(aggregate)*coeff(:,1). Why not measure the contribution of a variable to a component by the size (absolute value) of the respective coefficient, especially since you have already standardized your data with zscore?
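As a rough illustration of ranking variables by coefficient size, here is a minimal sketch. It makes several assumptions that are not in the thread: the data are a small random stand-in (24-by-50 rather than 24-by-3333), the standardization is done by hand instead of with zscore, and base svd is swapped in for princomp so that it runs without the Statistics Toolbox (the columns of V play the role of coeff). Reading squared coefficients as percent contributions is one common convention, not the only one.

```matlab
% Synthetic stand-in for the thread's data (hypothetical sizes and values;
% the real matrix is 24-by-3333).
n = 24; p = 50;
aggregate = randn(n, p);

% Standardize each column by hand; matches zscore's default (n-1) scaling.
% Implicit expansion needs MATLAB R2016b or later.
Z = (aggregate - mean(aggregate, 1)) ./ std(aggregate, 0, 1);

% Economy-size SVD: columns of V are the principal axes, i.e. the
% counterpart of coeff in the thread.
[~, ~, coeff] = svd(Z, 'econ');
score1 = Z * coeff(:, 1);        % projection onto the 1st principal axis

% Rank variables by the size of their coefficient on component 1.
[~, order] = sort(abs(coeff(:, 1)), 'descend');
top5 = order(1:5);               % indices of the 5 largest contributors

% Each column of coeff has unit length, so its squared entries sum to 1
% and can be read as fractional contributions to that component.
contrib1 = 100 * coeff(:, 1).^2; % percent contribution per variable

fprintf('Top variables for PC1: %s\n', mat2str(top5'));
```

With the real data, a list of the 3333 variable names indexed by top5 would give the names of the strongest contributors; repeating the sort for coeff(:,k) does the same for component k.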
Since you have 24 observations, centering the data leaves at most 23 components with nonzero variance, so the columns in score past 23 are filled with zeros. If you need the principal component variances, take the 3rd output from princomp (eigval in your call).
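The point about the trailing score columns and the component variances can be checked with a short sketch. The same assumptions apply as before and none of them come from the thread: random 24-by-50 stand-in data, hand-rolled standardization, and a full svd used in place of princomp, with the variances computed from the singular values as the counterpart of princomp's 3rd output.

```matlab
% Hypothetical small example: 24 observations, 50 variables.
n = 24; p = 50;
aggregate = randn(n, p);
Z = (aggregate - mean(aggregate, 1)) ./ std(aggregate, 0, 1);

% Full SVD so that coeff is p-by-p, like princomp's COEFF.
[~, S, coeff] = svd(Z);
score = Z * coeff;               % n-by-p scores

% Component variances from the singular values, padded with zeros to
% length p (the counterpart of princomp's 3rd output).
s = diag(S);                     % min(n,p) singular values
eigval = zeros(p, 1);
eigval(1:numel(s)) = s.^2 / (n - 1);

% Centering removes one degree of freedom, so at most n-1 = 23 components
% carry variance; the remaining score columns are zero up to round-off.
maxTrailing = max(max(abs(score(:, n:end))));

% Percent of total variance per component, and how many components are
% needed to reach 95%.
explained = 100 * eigval / sum(eigval);
k95 = find(cumsum(explained) >= 95, 1);

fprintf('Largest |score| past column %d: %.2e\n', n - 1, maxTrailing);
fprintf('Components needed for 95%%: %d\n', k95);
```

The variance of each score column equals the matching entry of eigval, so var(score(:,k)) and eigval(k) can be used interchangeably.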