Chi-Squared-Test – How to Perform Chi-Square Test in SPSS Exploratory Factor Analysis?

chi-squared-test, factor-analysis, maximum-likelihood

I ran an Exploratory Factor Analysis in SPSS recently with ML as the extraction method, and got the following table in my output:

[Image: Goodness-of-fit test table from SPSS EFA output]

I was not used to seeing goodness-of-fit tests in the context of EFA (as opposed to CFA), and wondered what the point of it was. The SPSS documentation seems to suggest that it's a way of deciding how many factors to select (number of factors in factor analysis problem).

If you choose maximum likelihood (ML) or generalized least squares
(GLS) as your extraction method, you would get a chi-square measure of
goodness of fit, which is a test of the null hypothesis that 3 factors
were adequate to explain the covariances among your variables. You
would not get a test of whether the factor loading matrix conformed to
your model.

I also found the formula used, which is as follows:

[Image: Formula for the ML and GLS goodness-of-fit tests in SPSS EFA]

Is it true that what they 'want' me to do is to run this test with increasing numbers of factors selected for extraction, and then to select the number of factors when the test is no longer statistically significant?

If so, is such a method any good? I've heard of all sorts of other ways of deciding the number of factors (scree plots, Kaiser-Guttman rule, MAP test, parallel test) but had never heard of this one before and it seems very problematic.

The chi-square test is very sensitive to sample size. Is it legitimate/useful to convert this particular chi-square statistic to RMSEA?

Why is the test only available with ML and GLS, and not with the other methods also offered, e.g. ULS?

Best Answer

This chi-square goodness-of-fit test, which SPSS outputs under the Maximum likelihood or Generalized least squares methods of factor extraction, is one of many methods for estimating the "best" number of factors to extract from the data. The test assumes that the data come from a multivariate normal population.

This chi-square tests the null hypothesis that the observed $p \times p$ data correlation matrix $\bf R$ is a random sample realization from a population whose correlation matrix equals the one implied by the extracted $m$ factors, i.e. $\bf \hat{R}= AA'+U^2$ (where $\bf A$ is the matrix of extracted loadings and $\bf U^2$ is the diagonal matrix of uniquenesses). That is, it tests that the residuals $\bf R-\hat{R}$ are random noise, shrinking to $0$ as the sample size $n$ grows to infinity. Roughly, this means that all positive eigenvalues of $\bf R-U^2$ except the first $m$ are close to zero if the $m$-factor model fits.
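As a minimal numeric illustration of this null hypothesis (the loadings below are made up, not SPSS output), one can build the model-implied matrix $\bf \hat{R}= AA'+U^2$, simulate data that truly follow the model, and watch the residuals $\bf R-\hat{R}$ shrink:

```python
import numpy as np

# Hypothetical 1-factor model for p = 4 variables (loadings are made up).
A = np.array([[0.8], [0.7], [0.6], [0.5]])     # p x m loading matrix
U2 = np.diag(1.0 - (A**2).sum(axis=1))         # uniquenesses, so diag(R_hat) = 1

R_hat = A @ A.T + U2                           # model-implied correlation matrix

# Simulate a large sample that actually follows this 1-factor model,
# then compare the observed correlation matrix R with R_hat.
rng = np.random.default_rng(0)
n = 100_000
f = rng.standard_normal((n, 1))                         # common factor scores
e = rng.standard_normal((n, 4)) * np.sqrt(np.diag(U2))  # unique parts
X = f @ A.T + e

R = np.corrcoef(X, rowvar=False)
print(np.abs(R - R_hat).max())   # residuals are small and shrink as n grows
```

With a misspecified $m$ (say, forcing $m=0$ here), the residuals would not vanish no matter how large $n$ gets, which is what the test picks up.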

For sufficiently large $n$ the test statistic has approximately a chi-square distribution with df $[(p-m)^2-(p+m)]/2$, from which you can obtain a p-value ($m$ must therefore be small enough for the formula to give positive df). If the test is significant, $m$ factors are not enough, and you should try at least an $(m+1)$-factor extraction and test again. Note that this is not a factor-by-factor test telling you that the $i$-th factor is "significant" while the $(i+1)$-th is not; it is a test of the overall $m$-factor model fit, as in CFA (though CFA has more options for testing, such as, for example, freezing some loadings as fixed parameters).
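The df formula and the sequential "increase $m$ until non-significant" procedure can be sketched as follows; the chi-square statistics here are placeholder numbers for illustration, not values SPSS would actually compute:

```python
from scipy.stats import chi2

def efa_df(p, m):
    """df of the goodness-of-fit test for an m-factor model on p variables."""
    return ((p - m) ** 2 - (p + m)) // 2

# Hypothetical chi-square statistics for m = 1, 2, 3 factors on p = 10 variables.
p = 10
stats = {1: 120.4, 2: 41.7, 3: 19.5}   # made-up numbers for illustration

for m, stat in stats.items():
    df = efa_df(p, m)
    pval = chi2.sf(stat, df)            # upper-tail p-value
    verdict = "reject (try more factors)" if pval < 0.05 else "adequate fit"
    print(f"m={m}: df={df}, chi2={stat}, p={pval:.3f} -> {verdict}")
```

In this made-up run the test stays significant up to $m=2$ and first becomes non-significant at $m=3$, so the procedure would stop at three factors.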

The test statistic depends on $n$, so the test is sensitive to sample size (as often in statistics, no wonder): for large $n$ it becomes impractically sensitive to small departures from the true model, and so may suggest raising $m$ even when that is not warranted from any other criterion's perspective (including interpretability of the factors).
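On the RMSEA question: one common conversion, used in SEM software rather than reported by SPSS here, is $\mathrm{RMSEA}=\sqrt{\max\big((\chi^2-df)/(df\,(n-1)),\,0\big)}$ (definitions vary slightly; some use $n$ instead of $n-1$). Because it divides the df-relative excess by the sample size, it partly discounts the sensitivity described above:

```python
import math

def rmsea(chi2_stat, df, n):
    """One common RMSEA approximation from a chi-square fit statistic.
    This helper is a sketch of the usual SEM formula, not an SPSS feature."""
    return math.sqrt(max((chi2_stat - df) / (df * (n - 1)), 0.0))

# The same chi-square excess over df translates into a smaller RMSEA
# at larger n; conventional cutoffs are around 0.05-0.08.
print(rmsea(50, 25, 101))    # larger RMSEA at small n
print(rmsea(50, 25, 1001))   # smaller RMSEA at large n
```

Whether that conversion is "legitimate" for this particular statistic is a judgment call, but the arithmetic is the same as for any other chi-square fit statistic with known df and n.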

Besides, departure from normality in the sample can also shrink the p-value, falsely suggesting an extra factor to extract.

The test could, theoretically, be computed and applied independently of the factor extraction method (still under the normality assumption). However, it is logically most apt with the maximum likelihood method: first, because the test is ML in nature; second, because ML extraction also requires normality; and third, because it is easiest to compute under ML as a by-product of the extraction algorithm. GLS extraction is algorithmically very similar to ML, so SPSS outputs the test there as well.

The test is only one of many competing ways to estimate the best number of factors.
