My answer is: you cannot see the indeterminacy of factor $F$ on the above 3D plot because you would need a 4D space to see it.
Let us, for a moment, reduce the whole picture by one dimension by dropping one of the two X variables, while keeping the setting of factor analysis in place. (Please don't take this as re-doing FA on a single variable - that is impossible; it is just an imaginary deletion of one of the variables in order to spare one dimension.) So, we have some centred variable $X$ in the subject space of, say, N=3 individuals. The values are the coordinates of the vector on the axes of the individuals:
ID X
1 2
2 -2
3 0
As things go in FA, we must decompose $X$ into $F$, the common factor, and $U$, the unique factor, the two being mutually orthogonal, but neither of them coinciding with $X$ or orthogonal to it. $F$ and $U$ define a plane; call it "plane U". The angle between $X$ and $F$ is determined by the analysis, and it gives the loading $a$ - the coordinate of $X$ on $F$.
We soon discover that the solution is not unique relative to the individual axes 1, 2, 3. Look at the left picture. There "plane U" (grey) is defined to coincide with the horizontal plane spanned by axes 1 and 2 (beige). It may look a bit tilted back, but that is an illusion - it is actually rotated a little about axis 3, because angle FX is somewhat smaller than angle UX. Now look at the right picture. Here, clearly, "plane U" has been rotated about the vector $X$ to become almost perpendicular to the horizontal "plane 12". In both cases we did not alter the coordinates of $X$ on $U$ and $F$, including the loading $a$ - we only spun the same plane arbitrarily about the straight line $X$. Thereby we changed the coordinates of the endpoints of $F$ and $U$ on axes 1, 2, 3 - the coordinates which are the values of factor $F$ and unique factor $U$.
Thus, we've just observed the indeterminacy of factor values. A factor can be determined in FA only up to its loadings and its variance; an infinite number of solutions exist with regard to the factor values - the true factor values will always remain in question.
We've shown the indeterminacy by spinning "plane U" around the axis of $X$; that is, a 2D space was revolving about a 1D space inside a 3D space. A plane can revolve about a straight line lying in it, within a space; a line can revolve about a point lying in it, within a plane; in general, a q-dimensional space can freely spin about its (q-1)-dimensional subspace inside a (q+1)-dimensional superspace.
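Here is a minimal numeric sketch of that spinning (Octave/MATLAB). The variable is the $X$ from the little table above; the loading $a$ is a hypothetical value chosen only for illustration, not the output of an actual FA:

```
X = [2; -2; 0];                        % the centred variable in the 3D subject space
a = 2.5;                               % hypothetical loading: coordinate of X on F
u = sqrt(norm(X)^2 - a^2);             % coordinate of X on the unique factor U
x = X / norm(X);                       % unit vector along the line X
p0 = [0; 0; 1];                        % a unit vector orthogonal to X (here: axis 3)

for theta = [0, pi/3]                  % two positions of "plane U", spun about X
    p = cos(theta)*p0 + sin(theta)*cross(x, p0);   % rotate p0 about the line X
    F = (a*x + u*p) / norm(X);         % common factor (unit length)
    U = (u*x - a*p) / norm(X);         % unique factor (unit length)
    disp([F'; U'])                     % coordinates on axes 1,2,3 = factor values: they change
    disp([X'*F, X'*U, F'*U])           % loading a, coordinate u, orthogonality: they do not
end
```

Running it, the printed coordinates of $F$ and $U$ on axes 1, 2, 3 (i.e. the factor values) differ between the two angles, while $X'F = a$, $X'U$ and $F'U = 0$ stay the same.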
Having grasped that, let's return to the initial picture posted with the question. Bringing back the temporarily removed second X variable, we now have the 2D "plane X" and consequently the 3D "space U" (it consists of the orthogonally intersecting planes U1 and U2). That latter space may freely spin about "plane X". As it spins - without changing any of the loadings ($a$'s) or vector lengths (variances) - the endpoint of $F$ sweeps through the parent space, the N-dimensional space of subjects. But to be able to show that we would need a 3+1 = 4D space (the "q+1 dimensional superspace"), which we can't draw in our world. So we cannot see the indeterminacy of factor values on that (geometrically correct) 3D picture, but it is there.
What about component values/scores and factor scores? Both are computed as linear combinations of the variables, and so their vectors lie in "plane X". Component scores are the true component values. Factor scores are approximations of the unknown true factor values. Both component and factor scores are fully determined by the analysis. Turning once again to the pictures of this answer, with their "reduced", one-variable example, we find that the component-score or factor-score variate must lie within the 1D space X - along $X$ itself. So no revolving can occur: the length of the component/score vector is defined by the analysis, and its endpoint is fixed in the 3D space of individuals. No indeterminacy.
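To make that concrete, here is a hedged numeric sketch (hypothetical two-variable data and hypothetical loadings, not the data behind the original plot). Both kinds of scores come out as the data matrix times a weight matrix, i.e. as linear combinations of the observed variables, so their vectors cannot leave "plane X":

```
X = [ 2  1; -2 -1;  0  0;  1 -1; -1  1];   % centred data: N = 5 subjects, p = 2 variables
Z = X ./ std(X);                           % standardize (columns are already centred)
R = corrcoef(Z);                           % correlation matrix

[W, ~]    = eig(R);                        % eigenvectors = component weights
pc_scores = Z * W;                         % component scores: exact, fully determined

lambda   = [0.9; 0.7];                     % hypothetical common-factor loadings
f_scores = Z * (R \ lambda);               % regression-method ("Thurstone") factor-score estimates
```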
To conclude (staring again at the initial plot): what lies in the space X of the observed variables is fixed, up to the case values. What transcends that space - namely the m-dimensional common-factor space (m=1 in our situation) plus the p-dimensional unique-factors space, orthogonally intersecting with the former - can rotate freely, as a whole, about space X within the grand space of the N subjects. Therefore factor values are not fixed, while component values or estimated factor scores are.
This is a perfectly fine definition based on the resources they cite (e.g. Jolliffe, 2002); it is at no point wrong. To your particular questions:
By `score` they represent the projections $\Xi$ of the centred data onto the linear space defined by the eigenvectors $\Phi$. You can immediately check this in your script with something like `all( abs(abs(score) - abs(X_centered' * U)) < 2*eps )` (I use `abs` to avoid issues with the sign).
You can produce the $K$-dimensional approximation of your centred data by using the scores of the first $K$ principal components. That is: $\hat{X}^K_{c} = \sum_{i=1}^K \xi_i \phi_i$. Assuming $K=5$ in your script, this is plainly `coeff * score'`, which numerically equals the centred sample: `all(abs( X_centered - (coeff * score')) < 2*eps)`.
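If it helps, here is a self-contained version of those two checks on synthetic data. It follows the convention implied by your snippets (variables in the rows of `X_centered`) and uses the eigenvectors of the covariance matrix directly in place of `coeff`, so treat it as a sketch rather than a drop-in for your script:

```
n = 100; p = 5;
X          = randn(p, n);                 % p variables in rows, n observations
X_centered = X - mean(X, 2);              % centre each variable
C          = cov(X_centered');            % p x p covariance matrix
[U, ~]     = eig(C);                      % columns of U: eigenvectors Phi
score      = X_centered' * U;             % scores Xi: projections of the centred data
X_rec      = U * score';                  % reconstruction with all K = p components
max(abs(X_rec(:) - X_centered(:)))        % numerically zero (up to machine precision)
```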
I believe that some of your misconception stems from the fact that you say: "`score` should be the best linear combination of the principal components $U$", but unfortunately this is not the case. The scores dictate what the best linear combination of the principal components $U$ is for reconstructing the data (in terms of fraction of variance explained), but they are not the result of that combination. In terms of PCA, the SVD contains only the left singular vectors $U$ (the eigenvectors of the covariance matrix of $X$) and the singular values $S$ (the square roots of the eigenvalues of the covariance matrix of $X$; more information here); nothing about the scores $\Xi$. You need to project the centred sample $X_c$ using $U$ to get the scores $\Xi$. Conversely, if you now use $\Xi \Phi^T$ you can reconstruct the data back.
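As a sketch of that last point (synthetic data again, variables in rows; dividing by $\sqrt{n-1}$ is my assumption about how the decomposition is scaled): the SVD hands you the eigenvectors and the square-rooted eigenvalues, but the scores only appear once you project:

```
n = 100; p = 5;
X_centered = randn(p, n);
X_centered = X_centered - mean(X_centered, 2);      % centre each variable
[U, S, ~]  = svd(X_centered / sqrt(n - 1), 'econ'); % SVD of the scaled, centred data
% columns of U : eigenvectors of the covariance matrix (up to sign)
% diag(S)      : square roots of its eigenvalues -- no scores anywhere in here
max(abs(sort(diag(S).^2) - sort(eig(cov(X_centered')))))   % numerically zero
Xi = X_centered' * U;                               % the projection step is still needed
```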
To recap: `score` holds the projections of the centred data onto the linear space defined by the eigenvectors of the covariance matrix of $X$. This is exactly your final result: $\text{scores} = U^T (X - \bar{X})$.
A side comment: when I started reading about PCA, I first tried to get the covariance derivation right and then moved to the SVD. I believe the covariance methodology is a bit easier to follow and somewhat more intuitive in terms of statistics as well as physical interpretation. Maybe you want to nail that down first and then move on to the SVD methodology.
Best Answer
Let's start by looking at your equation.
As an example, consider a dataset with $4$ variables and $100$ data points, so that $X$ is of size $100\times 4$ (and centered). PCA constructs the $4\times 4$ covariance matrix and finds its eigenvectors. Suppose we selected $2$ eigenvectors to perform the dimensionality reduction. Then $W$ is of size $2 \times 4$, with the two chosen eigenvectors as its rows. Multiplying $X$ by $W^\top$ (note the transpose!), we get a $100\times 2$ matrix of scores: $$T=XW^\top,$$ or spelled out:
$$\underbrace{\left(\begin{array}{cc} |&|\\|&|\\t_1&t_2\\|&|\\|&|\end{array}\right)}_T=\underbrace{\left(\begin{array}{cccc} |&|&|&|\\|&|&|&|\\x_1&x_2&x_3&x_4\\|&|&|&|\\|&|&|&|\end{array}\right)}_X\cdot \underbrace{\left(\begin{array}{cc} |&|\\w_1&w_2\\|&|\end{array}\right)}_{W^\top}.$$
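For instance, here is a tiny numeric illustration of these shapes (random data standing in for the $100 \times 4$ example; everything in it is hypothetical scaffolding, not a prescribed workflow):

```
X = randn(100, 4);                    % 100 data points, 4 variables
X = X - mean(X);                      % centre
[V, D] = eig(cov(X));                 % eigen-decomposition of the 4 x 4 covariance matrix
[~, order] = sort(diag(D), 'descend');
W = V(:, order(1:2))';                % W is 2 x 4: the two leading eigenvectors as rows
T = X * W';                           % T = X * W' is the 100 x 2 matrix of scores
size(T)                               % [100 2]
```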
Absolutely not! In my example, there are four $x_i$ variables, but only two $w$ vectors. There is no correspondence between a particular $x_k$ and $w_k$ at all.
No! Scores $T$ don't tell you anything about the importance of the original variables.
In fact, nothing in PCA tells you about the "importance" of the original variables.
PCA is sometimes used for feature selection, see here: Using principal component analysis (PCA) for feature selection -- this is based on the assumption that the variables contributing most to PC1 are the most "important", i.e. that it is the elements of $w_1$ that reflect the "importance" of the original variables. However, there is no guarantee that this assumption is always reasonable.
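A sketch of that heuristic, continuing from the snippet above (and carrying the same caveat: ranking by $|w_1|$ is an assumption, not a guaranteed measure of importance):

```
[~, idx] = sort(abs(W(1, :)), 'descend');   % W from the snippet above; W(1,:) is w_1
idx                                          % variables ordered by |their element of w_1|
```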