Solved – Factor Analysis: Principal Components vs Maximum Likelihood

factor-analysis, maximum-likelihood, pca

I am using "Applied Multivariate Statistical Analysis," 6th ed., by Johnson and Wichern, and in Ch. 10 they explain that factor analysis models can be estimated by either a Principal Components or a Maximum Likelihood method. After an example, they show that the residual matrix for the Maximum Likelihood solution is smaller than that for the Principal Components solution, and conclude that it is therefore better. But they never state why it works better, only that it does in most examples.

This might be a broad question, but what are the advantages of using a Maximum Likelihood method, and why does it work better than Principal Components for a factor analysis?

Best Answer

The best treatment of this question that I have seen is a 1979 book chapter by Karl Jöreskog, "Basic Ideas of Factor and Component Analysis." Sadly, I can't locate a PDF online--it is a classic for readability and succinctness.

Maximum Likelihood is just an estimation method. The real distinction is between principal components analysis (PCA) and common factor analysis (FA). PCA aims to turn p observed variables into p or fewer weighted composites, choosing each additional composite so as to explain the greatest share of variance not explained by the previous composites. Covariance is explained almost by coincidence--the focus is on the p variances.
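To make PCA's variance bookkeeping concrete, here is a minimal numerical sketch (assuming scikit-learn is available; the simulated data are made up for illustration). Each successive component has the largest variance among directions orthogonal to the previous ones, and the component variances together add up to the total variance of the p observed variables:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# p = 4 correlated variables (arbitrary mixing matrix, purely illustrative)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))

pca = PCA().fit(X)

# Component variances come out in decreasing order...
print(pca.explained_variance_)
# ...and sum to the total variance of the original variables.
print(pca.explained_variance_.sum(), np.var(X, axis=0, ddof=1).sum())
```

Nothing in this decomposition targets the covariances between variables; the off-diagonal structure is reproduced only insofar as it rides along with the high-variance directions.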

By contrast, FA accounts for the covariance among a set of p observed variables using k &lt; p common factors PLUS p unique factors (or "error terms"). The p unique factors fit the diagonal elements of the observed variable covariance matrix trivially, leaving zero residuals there. The common factors are chosen so as to best account for covariance among the observed variables, thus minimizing residuals for the off-diagonal elements of the covariance matrix (which are more numerous than the diagonal elements for p > 3). Combining these two features, it should not be surprising that, for the same number of common factors or principal components, the average squared residual, across the entire covariance matrix, will be smaller for an FA model than for a PCA solution--assuming that the covariance matrix is consistent with a common factor model.
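This residual comparison can be sketched numerically. Below is a minimal, illustrative simulation (assuming scikit-learn; the loadings and uniquenesses are made up) that generates data from a 2-common-factor model, fits an FA model and a PCA with the same k, and compares the mean squared off-diagonal residuals of each model-implied covariance matrix against the sample covariance:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n, p, k = 500, 6, 2

# Simulate data from a k-common-factor model (arbitrary loadings/uniquenesses).
loadings = rng.normal(size=(p, k))
uniquenesses = rng.uniform(0.5, 1.5, size=p)
X = rng.normal(size=(n, k)) @ loadings.T + rng.normal(size=(n, p)) * np.sqrt(uniquenesses)

S = np.cov(X, rowvar=False)  # sample covariance matrix

fa = FactorAnalysis(n_components=k).fit(X)
pca = PCA(n_components=k).fit(X)

def mean_sq_offdiag(resid):
    """Average squared residual over the off-diagonal elements only."""
    return np.mean(resid[~np.eye(p, dtype=bool)] ** 2)

# Both estimators expose a model-implied covariance via get_covariance().
msr_fa = mean_sq_offdiag(S - fa.get_covariance())
msr_pca = mean_sq_offdiag(S - pca.get_covariance())
print(f"FA  mean squared off-diagonal residual: {msr_fa:.5f}")
print(f"PCA mean squared off-diagonal residual: {msr_pca:.5f}")
```

Because the data here really do follow a common factor model, the FA residuals on the off-diagonal elements come out smaller, mirroring the pattern Johnson and Wichern report.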

It is the fundamental difference in technique, not the difference in estimation method, that is primarily responsible for this difference in performance. Maximum likelihood and generalized least squares estimation of a factor model, for example, are asymptotically equivalent, given assumptions.

Jöreskog, K. G. (1979). Basic ideas of factor and component analysis. In Advances in Factor Analysis and Structural Equation Models (pp. 5-20). Abt Books.