Solved – Principal components analysis on nested data

Tags: multilevel-analysis, nested-data, pca, r

I'm working on a piece of analysis that requires identifying a small set of variables that summarize the variation found in a larger set of observations on teacher practice. Given the nature of the question, it seems natural to use PCA in my analysis. My data set, however, has a hierarchical structure (observation occasions at the first level, and teachers at the second). So, I have the following questions:

  1. What do I risk by doing PCA without taking this nested, repeated-measures structure into account?
  2. What methods are there for accounting for the nested structure of data in PCA? Are there any useful tools in R?

Best Answer

An important question is to evaluate how consistent the principal axes are across individual teachers. If they are mostly consistent, you can just run PCA on the whole dataset (ignoring the grouping by teacher) and be happy. If it turns out that different teachers have fundamentally different axes of variance, then you likely have to reconsider the entire approach.

So the question I will try to answer is how to test whether the principal axes are consistent across individuals. This is a solution I used some time ago; I make no guarantees that it is in any way optimal.

Step 1: Decide on the fraction of explained variance $\alpha$ (e.g. 90%) that your PCA should cover.

Step 2: For one teacher $i$, find the first $n$ principal components that together explain a fraction $\alpha$ of the variance. This gives you the eigenvalues $e_i$ and eigenvectors $V_i$. Importantly, the eigenvectors form the mapping that projects the original data $x_i$ onto the PCA space $\psi_{ii}$:

$$\psi_{ii} = V_i^T x_i$$
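Since the question asks about R, here is a minimal sketch of Steps 1–2 using base `prcomp`. The object names (`X_i` for one teacher's occasions-by-variables matrix, `pca_basis`, `psi_ii`) are illustrative, not part of the original procedure; data are assumed to be column-centred per teacher.

    # Return the loading matrix V_i whose components explain at least alpha of the variance
    pca_basis <- function(X_i, alpha = 0.9) {
      pc   <- prcomp(X_i, center = TRUE, scale. = FALSE)
      expl <- cumsum(pc$sdev^2) / sum(pc$sdev^2)   # cumulative fraction of variance explained
      n    <- which(expl >= alpha)[1]              # smallest n reaching alpha
      pc$rotation[, 1:n, drop = FALSE]             # V_i: p x n matrix of eigenvectors (loadings)
    }

    V_i    <- pca_basis(X_i, alpha = 0.9)
    # psi_ii = V_i^T x_i; with observations stored as rows this is X_i %*% V_i
    psi_ii <- scale(X_i, center = TRUE, scale = FALSE) %*% V_i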

Step 3: What we want to do is calculate, for another teacher $j$, how much of their variance would be explained by using the PCA space of teacher $i$. The reconstruction of the original data from its PCA projection is

$$\hat{x}_{ii} = V_i\psi_{ii} = V_iV_i^Tx_i$$

and the relative reconstruction error (equivalently, the fraction of unexplained variance) is then

$$l^2_{ii} = \frac{||\hat{x}_{ii} - x_i||^2}{||x_i||^2} $$

where $||\cdot||$ is the Euclidean (2-)norm. The same principle can be used to project the data of teacher $j$ onto the basis of teacher $i$:

$$\hat{x}_{ij} = V_i \psi_{ij} = V_i V_i^T x_j$$

with the corresponding relative reconstruction error

$$l^2_{ij} = \frac{||\hat{x}_{ij} - x_j||^2}{||x_j||^2} $$
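A sketch of Step 3 in R, continuing the objects from the block above. `X_j` stands for teacher $j$'s occasions-by-variables matrix; how to centre it (its own mean vs. a pooled mean) is a modelling choice the procedure leaves open, and its own mean is used here.

    # Fraction of teacher j's variance NOT captured by teacher i's basis V_i
    recon_error <- function(X_j, V_i) {
      Xc   <- scale(X_j, center = TRUE, scale = FALSE)  # centre teacher j's data
      Xhat <- Xc %*% V_i %*% t(V_i)                     # x_hat_ij = V_i V_i^T x_j, row-wise
      sum((Xhat - Xc)^2) / sum(Xc^2)                    # l^2_ij
    }

    l2_ii <- recon_error(X_i, V_i)   # at most 1 - alpha by construction
    l2_ij <- recon_error(X_j, V_i)   # larger values mean i's axes explain j's data poorly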

Step 4: Finally, we can repeat this procedure to obtain the matrix $l^2_{ij}$ for every pair of teachers $i, j$; small values mean that teacher $i$'s axes capture teacher $j$'s variance well. This matrix can be used to judge the extent to which a common PCA would capture individual variance, to identify clusters of teachers that share similar variance axes, and to spot outliers.
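And a sketch of Step 4, reusing the hypothetical `pca_basis` and `recon_error` helpers above. `teacher_data` is assumed to be a list of per-teacher matrices with the same variables in the same column order.

    pairwise_l2 <- function(teacher_data, alpha = 0.9) {
      bases <- lapply(teacher_data, pca_basis, alpha = alpha)   # one basis per teacher
      k  <- length(teacher_data)
      l2 <- matrix(NA_real_, k, k,
                   dimnames = list(names(teacher_data), names(teacher_data)))
      for (i in seq_len(k)) {
        for (j in seq_len(k)) {
          # row i, column j: error of projecting teacher j's data onto teacher i's basis
          l2[i, j] <- recon_error(teacher_data[[j]], bases[[i]])
        }
      }
      l2
    }

Uniformly small entries suggest a common PCA is defensible; rows or columns that stand out flag teachers whose variance axes differ from the rest, and clustering a symmetrised version of the matrix (e.g. with `hclust(as.dist(...))`) can reveal groups of teachers with similar axes.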
