Solved – How to reverse factor analysis (FA) and reconstruct original variables

Tags: factor analysis, MATLAB, neuroscience, pca

I saw this interesting topic: "How to reverse PCA and reconstruct original variables from several principal components?", which has a nice answer with a very useful example using the Iris data in MATLAB. I would like to do the same using factor analysis instead of PCA. I tried to do this with MATLAB's factoran, with the help of @ttnphns and @amoeba, but I don't obtain a good correlation between my reconstructed data and the original ones.

input_data (the data are EMG measurements from 6 arm muscles, recorded in order to identify synergies)

PCA method:

X = input_data;
mu = mean(X);
[eigenvectors, scores] = pca(X);
nComp = 2;
Xpca = scores(:,1:nComp) * eigenvectors(:,1:nComp)';
Xpca = bsxfun(@plus, Xpca, mu);

With PCA, I obtain a good correlation between the reconstructed and the original data.
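One way to put a number on how well the PCA reconstruction matches the original is to compare each column of X with the corresponding column of Xpca. The sketch below is only an illustration: it uses synthetic data as a hypothetical stand-in for input_data (which is not available here), and the names latent, mixing, r and rmse are assumptions, not part of the original post. It relies on pca and corr from the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for input_data: 6 correlated channels driven by 2 latent signals.
rng(0);                                    % fixed seed for reproducibility
n = 500;
latent = randn(n, 2);
mixing = [0.9 0.0; 0.8 0.1; 0.7 0.2; 0.1 0.8; 0.0 0.9; 0.2 0.7];  % 6x2, made-up weights
X = latent * mixing' + 0.3 * randn(n, 6);

% Same reconstruction as in the question; pca centers X by default,
% so the column means have to be added back afterwards.
[eigenvectors, scores] = pca(X);
nComp = 2;
Xpca = scores(:, 1:nComp) * eigenvectors(:, 1:nComp)';
Xpca = bsxfun(@plus, Xpca, mean(X));

% Per-variable agreement between original and reconstructed columns.
r = diag(corr(X, Xpca))';                  % Pearson correlation, one value per column
rmse = sqrt(mean((X - Xpca).^2, 1));       % root-mean-square error, one value per column
disp([r; rmse]);                           % row 1: correlations, row 2: RMSE
```

If nComp is set to the number of variables, the reconstruction becomes exact and rmse drops to numerical zero, which is a quick sanity check that the centering has been undone correctly.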

FA method:

X = input_data;
mu = mean(X);
[LoadingsPM,specVarPM,rotationPM,stats, scores] = ...
                factoran(X,2,'rotate','promax');
Xfa = scores*LoadingsPM'; 
Xfa = bsxfun(@plus, Xfa, mu);

But in this case the correlations are bad. Am I forgetting something? (I divided the FA reconstruction by 3 so that the 3 curves are easier to compare.)

@ttnphns note: the word "reverse" in the title should be taken in the technical sense of computing the variables as they are returned by the computed factors (their scores), not in the theoretical sense (in which the FA model is nothing but predicting variables by factors, so that there is no "reverse" direction). In PCA, this prediction/direction could indeed be called "reverse" in a theoretical sense, too.

Best Answer

@amoeba and @ttnphns solved my problem in the comments. I am posting the solution in case someone is interested.

@amoeba:

Turns out, factoran implicitly standardizes all input variables and hence conducts FA on the correlation matrix (the Help says: "factoran standardizes the observed data X to zero mean and unit variance"). I could not find any input option that would turn off this behaviour. Hence, to do the "reconstruction", you need to compute stds = std(X); at the beginning and then do Xfa = bsxfun(@times, Xfa, stds); after multiplying the scores by the loadings and before adding the mean.

So the corrected FA method is:

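X = input_data;
% factoran standardizes X internally, so scores and loadings are in z-score units
[LoadingsPM,specVarPM,rotationPM,stats, scores] = ...
                factoran(X,2,'rotate','promax');
Xfa = scores*LoadingsPM';             % reconstruction in standardized units
Xfa = bsxfun(@times, Xfa, std(X));    % undo the unit-variance scaling
Xfa = bsxfun(@plus, Xfa, mean(X));    % undo the centering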
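The effect of the rescaling step can be checked numerically. Note that the per-column Pearson correlation does not change when a column is multiplied by a positive constant or shifted by an offset, so an error measure such as RMSE shows the difference more clearly than a correlation does. Here is a rough sketch on synthetic data (a hypothetical stand-in for input_data; the names latent, mixing, Z, Xshift and colRmse are illustrative assumptions), again assuming the Statistics and Machine Learning Toolbox is available:

```matlab
% Hypothetical data: 6 variables generated from 2 latent factors plus noise.
rng(0);                                        % fixed seed for reproducibility
n = 500;
latent = randn(n, 2);
mixing = [0.9 0.0; 0.8 0.1; 0.7 0.2; 0.1 0.8; 0.0 0.9; 0.2 0.7];  % made-up weights
X = latent * mixing' + 0.3 * randn(n, 6);
X = bsxfun(@times, X, [1 5 10 0.5 2 20]);      % give each column its own scale
X = bsxfun(@plus,  X, [10 -3 0 100 7 50]);     % and its own offset

[LoadingsPM, ~, ~, ~, scores] = factoran(X, 2, 'rotate', 'promax');
Z = scores * LoadingsPM';                      % reconstruction in standardized (z-score) units

Xshift = bsxfun(@plus, Z, mean(X));                        % mean added back, no rescaling
Xfa    = bsxfun(@plus, bsxfun(@times, Z, std(X)), mean(X)); % rescaled, then mean added back

% Column-wise root-mean-square error against the original data.
colRmse = @(A, B) sqrt(mean((A - B).^2, 1));
disp([colRmse(X, Xshift); colRmse(X, Xfa)]);   % row 1: without rescaling, row 2: with rescaling
```

For columns whose standard deviation is far from 1, the second row would be expected to be noticeably smaller than the first; for a column whose standard deviation happens to be close to 1, the two versions should be nearly identical.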

To complete this post, I recommend this nice explanation by @ttnphns: What are the differences between Factor Analysis and Principal Component Analysis?

Related Solutions

PCA-Factor-Analysis – Differences Between Exploratory Factor Analysis, Confirmatory Factor Analysis, and Principal Component Analysis

I will just address question 2. I have some doubts about how well the author knows his subject if he really said it the way you have presented it. PCA is applied to the sample, just like EFA and CFA. It simply takes a list of n possibly related factors, looks at how the points (samples) scatter in n-dimensional space, and then gets the first principal component as the linear combination that explains more of the variability in the data than any other linear combination. Then the second looks at the directions orthogonal to the first to find the one that explains the most of the remaining variability, and so on with the 3rd and 4th. So sometimes one can take just 1-3 components to describe most of the variation in the data. That is why factor analysis and principal component analysis are described according to 1 and 2 in your statement.
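The description above (successive orthogonal directions, each explaining as much of the remaining variability as possible) can be illustrated with an eigendecomposition of the sample covariance matrix. This is a minimal sketch using base MATLAB functions only; the data are synthetic and the variable names (V, lambda, explained, scores) are assumptions chosen for illustration.

```matlab
% Hypothetical correlated data: 300 samples, 4 variables.
rng(1);                                       % fixed seed for reproducibility
X = randn(300, 4) * [2 0 0 0; 1 1 0 0; 0.5 0.2 0.5 0; 0 0 0 0.1];

% Principal directions are the eigenvectors of the covariance matrix.
[V, D] = eig(cov(X));                         % eig does not guarantee descending order
[lambda, order] = sort(diag(D), 'descend');   % largest variance first
V = V(:, order);

% Cumulative share of the total variance explained by the first k components.
explained = cumsum(lambda) / sum(lambda);
disp(explained');

% The directions are mutually orthogonal (and of unit length).
disp(norm(V' * V - eye(4)));                  % should be close to zero

% The variance of the data projected on each direction equals its eigenvalue.
scores = bsxfun(@minus, X, mean(X)) * V;
disp([var(scores); lambda']);                 % the two rows should agree up to rounding
```

When the first one or two entries of explained are already close to 1, keeping just those components describes most of the variation, which is the situation the answer refers to.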

Solved – How to express Principal Components in their original scale

I can answer at least the first part of your questions, which is easy. Principal component scores in the scale of the original variables are computed by using the decomposition of the covariance (and not correlation) matrix. The scores are computed by postmultiplying the data matrix (X) by the unscaled eigenvectors (E), or Scores = XE. Often the eigenvectors are called "loadings" but a better usage for that term is the correlation between the scores and the original data.

But, based on the 2nd part of your question, I think you are asking about location and not about scale (scaling shrinks or blows up stuff while translation rigidly moves stuff around to different locations). Your scores are on the same scale as the original data but they have simply been centered (rigidly translated so the multivariate mean is located at the origin). If you want the uncentered scores, don't center the X or the raw scores. Or if you want to re-center to any location, simply add the vector to each case.

This is R code to do this:

library(data.table)
X <- as.matrix(fread('pca.txt'))
S <- cov(X)
res <- eigen(S)

E <- res$vectors
head(E)

L <- res$values
# percent
L/sum(L)
# wow, essentially a single vector

# these are the uncentered scores
Scores <- X%*%E
head(Scores)
apply(Scores, 2, mean)

# here are the scores you posted, which are centered at zero
head(scale(Scores, scale=FALSE))


# show that sum of variances in Scores = sum of variances in X
sum(diag(S))
sum(diag(cov(Scores)))
# so, these are on same scale
