Solved – How to transform factor scores of a PCA for a regression, in SPSS


I am really a novice in statistics and really need some help! I could not find an answer to my problem here or in any book. If I missed it, sorry about that; could you please share the link with me?

Here is a brief description of the background of my study:

I have collected a set of 20 independent variables (dummy and numerical), let's call them A, B, C…T, which are different potential reasons for getting sick. I have a population of 60 families who, during a year, suffered from sickness 1, sickness 2 and sickness 3. I have three dependent variables (S1, S2, S3) giving the percentage of days that the different members of each family were sick during a year.

My 20 independent variables can be explained by fewer factors (diet, living environment, etc.). So I ran a Principal Component Analysis (rotated factors, varimax, etc.), which gave me 5 factors. Everything works perfectly so far: even though some variables load on several factors at the same time, their strongest loading (.6 and higher) is clearly identified on one particular factor. So I have:

Factor 1 = B, D, E, F, K, N, O and S
Factor 2 = C, G and T
Factor 3 = H, J and P
Factor 4 = A, I and M
Factor 5 = L, Q and R

B, H, M, O, P, S and T load on more than one factor, but in a much less significant way. The total variance explained is 63%, and I have good eigenvalues (respectively 5.1, 2.1, 2.0, 1.9 and 1.6).

From this PCA, I saved the factor scores for regression. (In SPSS, during my Factor Analysis I used: Scores –» Save as variables –» Regression.) I understand that the regression factor scores in SPSS are standardized, with mean = 0 and standard deviation = 1. A score of 0 on a factor therefore means that a case's values on the relevant variables are close to the average for my sample. I cannot use them directly in my regression.

Here is where I need help…

I have been told to multiply the factor loadings by my original variables, and then sum them up to obtain the new variable to use in the regression. So I did the following in SPSS:

[(factor scores 1) x B] + [(factor scores 1) x D] + [(factor scores 1) x E] + [(factor scores 1) x F] + [(factor scores 1) x K] + [(factor scores 1) x N] + [(factor scores 1) x O] + [(factor scores 1) x S] = N-Var1 (new variable 1)

[(factor scores 2) x C] + [(factor scores 2) x G] + [(factor scores 2) x T] = N-Var2
[(factor scores 3) x H] + [(factor scores 3) x J] + [(factor scores 3) x P] = N-Var3
[(factor scores 4) x A] + [(factor scores 4) x I] + [(factor scores 4) x M] = N-Var4
[(factor scores 5) x L] + [(factor scores 5) x Q] + [(factor scores 5) x R] = N-Var5

Then I used N-Var1, N-Var2, N-Var3, N-Var4 and N-Var5 as independent variables to model the degree of sickness S1 in a first regression, and then ran separate regressions for S2 and S3, still using the same independent variables (N-Var1, …). I want to identify the worst factor (= strongest influence on the dependent variable): the one that makes my population most easily sick with S1 (the worst) and encourages the development of S2 and S3. (I am expecting to see a strong influence on S1 from many factors.) However, while my PCA is working perfectly (I'm sure about it), the results of my regressions do not make sense at all (I'm also sure about it).

Does it mean I did something wrong when I transformed my factor scores? Or just that no model can work to explain S1, S2 and S3?

Best Answer

Simply put:

$\text{factor score} = \text{loading}_1 \cdot X_1 + \text{loading}_2 \cdot X_2 + \ldots + \text{loading}_k \cdot X_k$

You may need to standardize your variables beforehand if they do not share the same metric. If you do this with your data, your self-computed factor score should correlate above .9 with SPSS' factor score (at least if you do not have cross-loadings). Therefore I do not really see the point of computing them by hand. Moreover, if you have reasonable PCA results, you may want to compute something like a sum score instead of factor scores for your subsequent regression analyses.
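
To make the comparison concrete, here is a minimal Python sketch (not SPSS syntax) of the three quantities mentioned above: a standardized component score, a hand-computed loading-weighted sum, and a simple sum score. Everything in it is an assumption for illustration: the data are simulated (60 cases, 6 variables, two latent factors), and the PCA is unrotated. Note that the weights are the loadings, not the saved factor scores. In the unrotated case the loading-weighted sum is just a rescaled component score; with a varimax rotation, as in the question, the agreement would be close but not exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: 60 cases, 6 variables.
# Variables 0-3 follow latent factor 1, variables 4-5 follow latent factor 2.
n = 60
latent = rng.normal(size=(n, 2))
signal = np.column_stack([latent[:, 0]] * 4 + [latent[:, 1]] * 2)
X = signal + 0.5 * rng.normal(size=(n, 6))

# Standardize each variable (mean 0, SD 1), as the answer recommends.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Unrotated PCA on the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]          # reorder to descending
eigval, eigvec = eigval[order], eigvec[:, order]

# The sign of an eigenvector is arbitrary; orient component 1 positively.
if eigvec[:, 0].sum() < 0:
    eigvec[:, 0] *= -1

# Loadings: correlations between variables and components.
loadings = eigvec * np.sqrt(eigval)

# Standardized component scores (mean 0, SD 1).
pca_scores = Z @ eigvec / np.sqrt(eigval)

# Hand-computed score for component 1: loading_1*X_1 + ... + loading_k*X_k.
weighted = Z @ loadings[:, 0]

# Sum score: unweighted sum of the variables loading .6 or higher on component 1.
strong = loadings[:, 0] >= 0.6
sum_score = Z[:, strong].sum(axis=1)

corr_weighted = np.corrcoef(weighted, pca_scores[:, 0])[0, 1]
corr_sum = np.corrcoef(sum_score, pca_scores[:, 0])[0, 1]
print(f"loading-weighted sum vs component score: r = {corr_weighted:.3f}")
print(f"sum score vs component score:            r = {corr_sum:.3f}")
```

If all three versions correlate this highly, they carry essentially the same information as predictors, which is why computing the scores by hand gains little over the saved scores or a plain sum score.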
