Solved – do a PCA on repeated measures for data reduction

dimensionality reduction, pca, repeated measures

I have 3 trials each on 87 animals in each of 2 contexts (with some missing data; 64 animals have complete data). Within a context, I have many specific measures (time to enter, number of times returning to shelter, etc.), so I want to develop 2 to 3 composite behavior scores that describe the behavior in that context (call them C1, C2, C3). I want a C1 that means the same thing over all 3 trials and 87 animals, so that I can do a regression to examine the effect of age, sex, pedigree, and individual animal on the behavior. Then I want to examine how C1 relates to the behavior scores in the other context, within a particular age. (At age 1, does activity in context 1 strongly predict activity in context 2?)

If these were not repeated measures, a PCA would work well: do a PCA on the multiple measures of a context, then use PC1, PC2, etc. to examine relationships (Spearman correlations) between PC1 in one context and PC1 (or 2 or 3) in the other context. The problem is the repeated measures, which amount to pseudoreplication. I've had a reviewer categorically say no-go, but I can't find any clear references as to whether this is problematic when doing data reduction.

My reasoning goes like this: repeated measures is not a problem, because what I am doing in the PCA is purely descriptive vis-à-vis the original measures. If I declared by fiat that I was using time to enter the arena as my "boldness" measure in context 1, I would have a context 1 boldness measure that was comparable across all individuals at all ages and no one would bat an eye. If I declare by fiat that I will use $0.5\cdot$ time-to-enter $+\ 0.5\cdot$ time-to-far-end, the same goes. So if I am using PCA purely for reductive purposes, why can't it be PC1 (that might be $0.28\cdot$ enter $+\ 0.63\cdot$ finish $+\ 0.02\cdot$ total time…), which is at least informed by my multiple measures instead of my guessing that time to enter is a generally informative and representative trait?
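To make the "fixed weights" argument concrete, here is a minimal base-R sketch on simulated data. The column names (enter, finish, total), the data frame `long`, and the simulated values are all hypothetical. It only shows that a pooled PCA yields one weight vector and that C1 is then the same weighted sum for every animal-trial row; it does not address whether estimating those weights from non-independent rows is acceptable.

```r
set.seed(1)
n_animals <- 64
n_trials  <- 3

# Hypothetical long-format data: one row per animal-trial combination
long <- data.frame(
  animal = rep(seq_len(n_animals), times = n_trials),
  trial  = rep(seq_len(n_trials), each = n_animals)
)

# Simulated measures sharing an animal-level "activity" signal (made up)
activity    <- rnorm(n_animals)[long$animal] + rnorm(nrow(long), sd = 0.5)
long$enter  <- activity + rnorm(nrow(long), sd = 0.5)
long$finish <- activity + rnorm(nrow(long), sd = 0.5)
long$total  <- activity + rnorm(nrow(long), sd = 0.5)

measures <- c("enter", "finish", "total")

# Pooled PCA: a single set of loadings estimated from all 192 rows
pca <- prcomp(long[, measures], center = TRUE, scale. = TRUE)
w   <- pca$rotation[, 1]   # fixed weights that define the composite C1

# Apply the same centering, scaling and weights to every row
z       <- scale(long[, measures], center = pca$center, scale = pca$scale)
long$C1 <- as.numeric(z %*% w)
```

Because `w` is fixed, `long$C1` is comparable across trials and animals in the same sense as a composite declared by fiat; the weights simply come from the data rather than from a guess.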

(Note I am not interested in the underlying structure of the measures… my questions are about what we interpret the context-specific behaviors to be. "If I used context 1 and concluded that Harry is active compared to other animals, do I see Harry active in context 2? If he changes what we interpret as activity in context 1 as he gets older, does he also change his context 2 activity?")

I have looked at PARAFAC, and I have looked at SEM, and I am not convinced either of these approaches is better or more appropriate for my sample size.
Can anybody weigh in?
Thanks.

Best Answer

You could look into Multiple Factor Analysis. This can be implemented in R with FactoMineR.

UPDATE:

To elaborate, Leann was proposing (however long ago) to conduct a PCA on a dataset with repeated measures. If I understand the structure of her dataset correctly, for a given 'context' she had an animal x 'specific measure' (time to enter, number of times returning to shelter, etc.) matrix. Each of the 64 animals (those without missing obs.) was observed three times. Let's say she had 10 'specific measures', so she would then have three 64×10 matrices on the animals' behaviour (we can call the matrices X1, X2, X3). To run a PCA on the three matrices simultaneously, she would have to 'row bind' the three matrices (e.g. PCA(rbind(X1, X2, X3))). But this ignores the fact that the 1st and 65th rows are observations on the same animal. To circumvent this problem, she can 'column bind' the three matrices and run them through a Multiple Factor Analysis. MFA is a useful way of analyzing multiple sets of variables measured on the same individuals or objects at different points in time. She'll be able to extract the principal components from the MFA in the same way as in a PCA, but will have a single coordinate for each animal. The animals will then be placed in a multivariate compromise space defined by her three sets of observations.
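The difference between the two layouts can be sketched in base R. The matrices below are random stand-ins (the helper make_trial is hypothetical), used only to show the shapes and where each animal ends up.

```r
set.seed(2)
n_animals  <- 64
n_measures <- 10

# Hypothetical stand-in for one trial: an animals x measures matrix
make_trial <- function() {
  m <- matrix(rnorm(n_animals * n_measures), nrow = n_animals)
  colnames(m) <- paste0("m", seq_len(n_measures))
  m
}
X1 <- make_trial()
X2 <- make_trial()
X3 <- make_trial()

stacked <- rbind(X1, X2, X3)   # 192 x 10: each animal appears three times
wide    <- cbind(X1, X2, X3)   # 64 x 30: one row per animal

# In the stacked layout, rows i, i + 64 and i + 128 belong to the same animal
animal_id <- rep(seq_len(n_animals), times = 3)
dim(stacked)
dim(wide)
which(animal_id == 1)
```

A PCA on `stacked` treats all 192 rows as independent objects, whereas `wide` keeps one row per animal and leaves the three trials as separate groups of columns.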

She would be able to execute the analysis using the FactoMineR package in R. Example code would look something like:

library(FactoMineR)

# Column-bind the three trial matrices: one row per animal, 30 columns
df <- data.frame(X1, X2, X3)

# type = "s": the data are presumed quantitative and are scaled to unit variance
mfa1 <- MFA(df, group = c(10, 10, 10), type = c("s", "s", "s"),
            name.group = c("Observation 1", "Observation 2", "Observation 3"))
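For intuition about what the MFA is doing, its core idea can be approximated in base R: standardise each trial's block of measures, divide each block by the square root of its first eigenvalue so that no single trial dominates, then run one PCA on the column-bound blocks. This is a rough sketch on random data, not a substitute for FactoMineR; the helper weight_block is hypothetical, and FactoMineR's exact scaling conventions may differ in detail.

```r
set.seed(3)
n_animals  <- 64
n_measures <- 10

# Hypothetical stand-ins for the three trial matrices (pure noise)
trials <- lapply(1:3, function(k) {
  matrix(rnorm(n_animals * n_measures), nrow = n_animals)
})

# Step 1: standardise a block, then divide it by the square root of its
# first eigenvalue so that its leading axis has variance 1
weight_block <- function(X) {
  Z       <- scale(X)
  lambda1 <- prcomp(Z)$sdev[1]^2
  Z / sqrt(lambda1)
}
blocks <- lapply(trials, weight_block)

# Step 2: one global PCA on the column-bound, weighted blocks
global <- prcomp(do.call(cbind, blocks), center = FALSE)

# One compromise coordinate per animal on each retained axis
compromise <- global$x[, 1:3]
dim(compromise)
```

Each animal gets a single row in `compromise`, which is the property the answer relies on: the three trials contribute to one position per animal rather than three separate points.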

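Also, instead of extracting the first three components from the MFA and putting them through multiple regression, she might think about projecting her explanatory variables directly onto the MFA as supplementary groups (see the num.group.sup argument in ?MFA). Another approach would be to calculate a Euclidean distance matrix of the animal coordinates from the MFA (e.g. dist1 <- vegdist(mfa1$ind$coord, method = "euclidean")) and put it through a distance-based RDA with dist1 as a function of the animal-specific variables (e.g. dbrda(dist1 ~ age + sex + pedigree, data = animals), where animals is a hypothetical data frame with one row per animal). Both functions are in the vegan package; note that rda() itself expects a data matrix rather than a distance matrix on the left-hand side, so with Euclidean distances rda(mfa1$ind$coord ~ age + sex + pedigree, data = animals) is an equivalent alternative.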
