Factor Analysis – Insight into Indeterminacy of Factor Values on Plot

factor analysis, geometry, pca

It is well known that in principal component analysis (PCA) we can obtain the true values of components, but in factor analysis (FA) we cannot obtain the true values of common factors. We can compute factor scores, for example by the regression method, but these are only reasonable surrogates for the ever-unknown factor values. The same regression method in PCA gives component scores that are exactly the true component values. Methods to compute component and factor scores are described here.

From a mathematical point of view, the indeterminacy of factor values (= the inexactness of factor scores) is quite clear. The reason is that we don't know the values of the unique factors (aka factor noise) at case level. The common factor model (in toto, the system of the p variables' equations) differs from the PCA model in that it is a data-expansion rather than a data-reduction technique. It assumes m common + p unique latent factors, and m+p > p (whereas PCA has just m latent axes + an observed "residual" or "discrepancy"). In this overparameterized situation it is impossible to compute the values of the m common factors unequivocally from the values of the p variables, because the information is insufficient. We can, however, derive unequivocal statistics for the factors: the covariances (correlations) between the factors and the variables, i.e. the loadings.
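To make the counting argument concrete, the model can be written out per case. The notation here is assumed for illustration only: $x_{ij}$ is the value of variable $j$ for case $i$, $a_{jk}$ are the loadings, and $f_{ik}$ and $u_{ij}$ are the unknown common and unique factor values.

$$x_{ij} = \sum_{k=1}^{m} a_{jk} f_{ik} + u_{ij}, \qquad i = 1,\dots,N, \quad j = 1,\dots,p.$$

For each case $i$ this gives $p$ equations in $m+p$ unknowns ($f_{i1},\dots,f_{im}$ and $u_{i1},\dots,u_{ip}$), so even with the loadings known the system is underdetermined at case level.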

OK. But why do we not see the factor indeterminacy in a picture which otherwise explains FA geometrically correctly? Below is a "vector representation in subject space", copied from here. It explains the gist of FA (the explanation is here). Subject space is briefly explained at the beginning of this answer. Subject space is simply an "inside-down" scatter plot, with variables as points and subjects as axes, in which all the many redundant dimensions are then concealed.

[Figure: vector representation of PCA and FA in subject space]

Just briefly, here is what's going on. The 1st principal component (thin red vector) lies in the space spanned by the variables (two blue vectors), the white "plane X". The factor (fat red vector) overruns that space. The factor's orthogonal projection onto the plane (thin grey vector) is the regressionally estimated factor scores. By the definition of linear regression, these factor scores are the best least-squares approximation of the factor obtainable from the variables.

Now back to our question. Configuration is fixed in the space of (hidden) N axes-individuals; indeed:

  • Coordinates of $X_1$ and $X_2$ endpoints would be the N values of the two (centered) variables.
  • Coordinates of the component endpoint would be the N values of the component.
  • Coordinates of the factor scores endpoint would be the N scores themselves.

Likewise, coordinates of the factor $F$ endpoint should be the N true factor values. Why then would one say factor values are indeterminate? Where's your indeterminacy on the well-determined plot?

Best Answer

My answer is: you cannot see the condition of indeterminacy of factor $F$ on the above 3D plot because you will need 4D space to see it.

Let us, for a moment, reduce the whole picture by one dimension by dropping one of the two X variables, while keeping the prerequisites of factor analysis in place. (Please don't take this as redoing FA on a single variable, which is impossible. It is just an imaginary deletion of one of the variables in order to spare one dimension.) So, we have some variable $X$ (centered), in the subject space of, say, N=3 individuals. The values are the coordinates on the individuals' axes:

ID X
1  2
2 -2
3  0

As things go in FA, we must decompose $X$ into $F$, the common factor, and $U$, the unique factor. The two are orthogonal to each other, but neither of them coincides with or is orthogonal to $X$. $F$ and $U$ define a plane; call it "plane U". The angle between $X$ and $F$ is determined by the analysis, and it gives the loading $a$, the coordinate of $X$ on $F$.

[Figure: two positions of "plane U" spun about vector $X$ in the 3D space of individuals (left and right pictures)]

We soon discover that the solution is not unique relative to the axes-individuals 1, 2, 3. Look at the left picture. Here "plane U" (grey) is defined to coincide with the horizontal plane defined by axes 1-2 (beige). It may look a bit tilted, but that's an illusion: it is actually a bit rotated about axis 3, because angle FX is somewhat less than angle UX. Now look at the right picture. Here, clearly, "plane U" is rotated about vector $X$ to become almost perpendicular to the horizontal "plane 12". In both cases we did not alter the coordinates of $X$ on $U$ and $F$, including loading $a$; we only spun the same plane arbitrarily about the straight line $X$. Thereby we changed the coordinates of the endpoints of $F$ and $U$ on axes 1, 2, 3. Those coordinates are the values of factor $F$ and unique factor $U$.

Thus, we've just observed the indeterminacy of factor values. A factor can be determined in FA only up to its loadings and its variance; an infinite number of solutions exist with regard to factor values, so the true factor values will always remain in question.
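The spinning can also be reproduced numerically. The following is only a sketch under assumptions that are not in the original: the loading value `a = 2.2`, the choice to give $F$ and $U$ unit length, and the helper name `decompose` are all made up for illustration. Like the pictures, it treats the 3D space purely geometrically.

```python
import numpy as np

X = np.array([2.0, -2.0, 0.0])   # the centered variable from the table above
a = 2.2                          # hypothetical loading (coordinate of X on F)

# Orthonormal basis of the plane perpendicular to X (hard-coded for this X).
w1 = np.array([0.0, 0.0, 1.0])
w2 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)

def decompose(phi):
    """Return unit vectors F, U and coordinate b with X = a*F + b*U.

    phi is the angle by which "plane U" is spun about the line X.
    As in the pictures, F and U are not required to be centered.
    """
    norm_x = np.linalg.norm(X)
    cos_t = a / norm_x                 # cosine of the angle between X and F
    sin_t = np.sqrt(1.0 - cos_t**2)
    w = np.cos(phi) * w1 + np.sin(phi) * w2   # unit vector perpendicular to X
    F = cos_t * X / norm_x + sin_t * w
    b = norm_x * sin_t                 # coordinate of X on U
    U = (X - a * F) / b
    return F, U, b

for phi in (0.0, 1.2):
    F, U, b = decompose(phi)
    print(f"phi={phi}: F={np.round(F, 3)}, U={np.round(U, 3)}, "
          f"X.F={X @ F:.3f}, F.U={F @ U:.3f}")
```

For every `phi` the loading `X @ F` stays at `a` and `F @ U` stays at zero, while the coordinates of `F` and `U` on axes 1, 2, 3 (the would-be factor values) change.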

We've shown the indeterminacy by spinning "plane U" around axis of $X$, that is, a 2D space was revolving about 1D space in 3D space. A plane can revolve about some straight line in it, in a space; a line can revolve about some point in it, in a plane; in general: a q-dimensional space can freely spin about its q-1 dimensional subspace in a q+1 dimensional superspace.

Having grasped that, let's return to the initial picture posted with the question. Bringing back the temporarily removed second X variable, we now have the 2D "plane X" and consequently the 3D "space U" (it consists of the orthogonally intersecting planes U1 and U2). The latter may freely spin about "plane X". As it spins, without changing any of the loadings ($a$'s) or vector lengths (variances), the endpoint of $F$ moves about within the parent space, the N-dimensional space of subjects. But to be able to show it we need 3+1 = 4D space (the "q+1 dimensional superspace"), which we can't draw in our world. So, we can't see the indeterminacy of factor values on that (geometrically correct) 3D picture, but it is there.
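Although the 4D picture cannot be drawn, it can be computed. Below is a minimal sketch, assuming made-up loadings `a = (0.8, 0.7)`, N = 50 simulated individuals, and unit-length vectors so that dot products play the role of correlations. The function names `factor` and `uniques` are hypothetical. The construction adds to the regression scores a component of fixed length that is orthogonal to "plane X" but points in an arbitrary direction.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                        # number of individuals (assumed)
a = np.array([0.8, 0.7])      # hypothetical loadings of X1, X2 on F
r = a[0] * a[1]               # correlation implied by a one-factor model

# Four orthonormal, centered directions in the N-dimensional subject space.
Z = rng.standard_normal((N, 4))
Z -= Z.mean(axis=0)
Q, _ = np.linalg.qr(Z)

# Two unit-length variables whose dot product (correlation) equals r.
X = np.column_stack([Q[:, 0], r * Q[:, 0] + np.sqrt(1 - r**2) * Q[:, 1]])

R = X.T @ X                   # correlation matrix of the variables
beta = np.linalg.solve(R, a)  # regression weights for the factor scores
F_hat = X @ beta              # factor scores: they lie in "plane X"
rho2 = a @ beta               # squared length of F_hat (less than 1)

def factor(phi):
    """One admissible unit-length factor; phi spins it about "plane X"."""
    e = np.cos(phi) * Q[:, 2] + np.sin(phi) * Q[:, 3]  # orthogonal to X1, X2
    return F_hat + np.sqrt(1 - rho2) * e

def uniques(F):
    """Unit-length unique factors U1, U2 that go with a given F."""
    return (X - np.outer(F, a)) / np.sqrt(1 - a**2)

for phi in (0.0, 2.0):
    F = factor(phi)
    U = uniques(F)
    print(f"phi={phi}: loadings={np.round(X.T @ F, 6)}, "
          f"|F|^2={F @ F:.6f}, first F values={np.round(F[:3], 3)}")
    print("  U'U =", np.round(U.T @ U, 6).tolist(),
          " U'F =", np.round(U.T @ F, 6).tolist())
```

Both values of `phi` reproduce the same loadings, the same factor variance, and mutually orthogonal unique factors, yet the N values of `F` differ.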

What about component values/scores and factor scores? Both are computed as linear combinations of the variables, and so their vectors lie in "plane X". Component scores are true component values. Factor scores are approximations of the unknown true factor values. Both component and factor scores can be fully determined in the analysis. If we turn once again to the pictures of this answer, showing the "reduced one-variable example", we'll find that the component or factor scores variate must lie within the 1D space X, that is, along $X$ itself. So, no revolving can occur. The length of the component/scores vector is defined in the analysis, and its endpoint is fixed in the 3D space of individuals. No indeterminacy.
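The same point can be checked on the reduced one-variable example. This is a sketch under assumed values: the loading `a = 2.2`, the unit length of $F$, and the helper names `factor` and `score` are illustrative and not from the original. However "plane U" is spun, the orthogonal projection of $F$ onto $X$ (the regression-type score vector) does not move.

```python
import numpy as np

X = np.array([2.0, -2.0, 0.0])   # the centered variable of the reduced example
a = 2.2                          # hypothetical loading (coordinate of X on F)

# Orthonormal basis of the plane perpendicular to X (hard-coded for this X).
w1 = np.array([0.0, 0.0, 1.0])
w2 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)

def factor(phi):
    """Unit-length factor F with X.F = a; phi spins it about the line X."""
    norm_x = np.linalg.norm(X)
    cos_t = a / norm_x
    w = np.cos(phi) * w1 + np.sin(phi) * w2
    return cos_t * X / norm_x + np.sqrt(1.0 - cos_t**2) * w

def score(F):
    """Orthogonal projection of F onto the 1D space spanned by X."""
    return (X @ F) / (X @ X) * X

for phi in (0.3, 2.5):
    F = factor(phi)
    print(f"phi={phi}: F={np.round(F, 3)}, score={np.round(score(F), 3)}")
```

The printed `F` changes with `phi`, while `score` stays at `(a / 8) * X = [0.55, -0.55, 0]`, because it depends on $F$ only through the loading.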

To conclude (staring again at the initial plot): what lies in the space X of the observed variables is fixed, right down to the case values. What transcends that space, namely the m-dimensional common factor space (m=1 in our situation) plus the p-dimensional unique factor space, which intersects the former orthogonally, can be turned freely, as one lump, about the space X within the grand space of N subjects. Therefore factor values are not fixed, while component values and estimated factor scores are.