Next steps after performing a principal component analysis

factor-analysis, pca

A few months ago, I developed a questionnaire using a principal component analysis (PCA) and tested the questionnaire for split-half reliability (using a sample which I will call sample #1).

I am in the process of writing a manuscript to submit for publication, which utilizes the questionnaire and its relationship to depression, anxiety, and stress. I have never developed a questionnaire before, and I don't know what further analyses to conduct with the new sample (sample #2). It should be noted that sample #2 is different from sample #1 but not terribly so (both samples are a composite of undergraduate students and individuals who attend exercise classes).

I already did the split-half reliability for the questionnaire with sample #2 and it was above .8, but is there anything else that should be done? Do I need to somehow confirm the factor structure of the questionnaire?

I did a PCA with sample #2 just to see what would happen, but the items are not lining up with the factors that were specified when I first did the PCA. Frankly, I want to know if I made a mistake in analyzing the questionnaire data for sample #2 with a PCA.

Should I have done the second principal components analysis, or is another analysis "better"? Or, should I just stick with the split-half reliability?

Best Answer

Principal component analysis simply finds the orthogonal linear combinations of your variables that explain the most variability in your data. If you apply it to two multivariate data sets, even similar ones, it will give different answers purely because of random variability, unless the sample size is very large and both samples really do come from nearly identical multivariate populations. How large "very large" has to be depends somewhat on the dimensionality: the larger the number of variables, the more observations you need. Part of the problem is the curse of dimensionality, which states that the higher the number of dimensions, the more the data spread out away from the center.
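To make the sample-size point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the six-variable covariance matrix, the sample sizes of 50 and 5000, and the helper names `first_axis` and `mean_angle_between_first_axes` are all made up and have nothing to do with your questionnaire. It repeatedly draws two independent samples from the same population and measures the angle between their first principal components.

```python
import numpy as np

def first_axis(X):
    """First principal axis (unit vector) of the rows of X."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]                                 # rows of Vt are the principal axes

def mean_angle_between_first_axes(n, cov, rng, repeats=20):
    """Average angle (degrees) between the first PCs of two same-population samples."""
    mean = np.zeros(cov.shape[0])
    angles = []
    for _ in range(repeats):
        a = first_axis(rng.multivariate_normal(mean, cov, size=n))
        b = first_axis(rng.multivariate_normal(mean, cov, size=n))
        # The sign of a principal axis is arbitrary, so compare |cosine|.
        cos = min(abs(float(a @ b)), 1.0)
        angles.append(np.degrees(np.arccos(cos)))
    return float(np.mean(angles))

rng = np.random.default_rng(0)
# Hypothetical population: six variables with fairly close leading variances.
cov = np.diag([2.0, 1.5, 1.0, 1.0, 1.0, 1.0])

angle_small = mean_angle_between_first_axes(50, cov, rng)
angle_large = mean_angle_between_first_axes(5000, cov, rng)
print(f"mean angle, n=50:   {angle_small:.1f} degrees")
print(f"mean angle, n=5000: {angle_large:.1f} degrees")
```

The two samples in each pair come from exactly the same distribution, so any disagreement between their first components is sampling noise. In this setup the disagreement should shrink as n grows.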

Principal component analysis can be used as a tool in regression, clustering, or classification problems because it is basically a dimension reduction technique: it often shows that most of the variability in the data can be explained by the first few principal components, so a good characterization of the data can be seen in a lower-dimensional space. Projecting the data onto the first two principal components can be very useful for identifying structure in the data. It is basically an exploratory data analysis tool.
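As a rough illustration of the dimension reduction idea, the sketch below builds a made-up data set in which six observed variables are driven by two latent variables plus noise. The loadings, noise level, and sample size are assumptions chosen for the example. It then computes the share of variance carried by each component and the two-dimensional projection.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical structure: two latent variables drive six observed variables.
latent = rng.normal(size=(n, 2))
loadings = np.array([[1.0, 0.9, 0.8, 0.0, 0.1, 0.0],
                     [0.0, 0.1, 0.0, 1.0, 0.9, 0.8]])
X = latent @ loadings + 0.3 * rng.normal(size=(n, 6))

# PCA via the singular value decomposition of the centered data.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Share of total variance explained by each principal component.
explained_ratio = s**2 / np.sum(s**2)

# Coordinates of every observation on the first two principal components.
scores_2d = Xc @ Vt[:2].T

print("explained variance ratio:", np.round(explained_ratio, 3))
print("share explained by first two components:",
      round(float(explained_ratio[:2].sum()), 3))
```

The array `scores_2d` is what you would hand to a plotting library for the usual two-component scatter plot.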

I am not familiar with split-half reliability, so I don't know how you are trying to use principal components there. What you should do next really depends on what your objective is. One option is to look at scatter plots of both samples on the first two principal components found from the first sample. If the samples spread out differently, that at least tells you something about how they differ, and perhaps why the second sample had different principal components. If they look very similar in the scatter plot, that suggests the two samples do come from the same population and that randomness in the data is all that is causing some shifting of the principal components. Remember that the first principal component is obtained by searching for the direction in the k-dimensional space that explains the highest percentage of variation in the data set, and each later principal component is obtained by maximizing the variance in that data set in a direction orthogonal to the principal components that came before it.
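The comparison suggested above could be sketched as follows. The data here are simulated stand-ins for sample #1 and sample #2, and `fit_pca` and `project` are hypothetical helper names, so treat this as an outline to adapt, not a finished analysis. The key step is that the axes and the centering both come from the first sample only, and the second sample is projected onto them unchanged.

```python
import numpy as np

def fit_pca(X, k=2):
    """Return the variable means and the first k principal axes of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, axes):
    """Project X onto previously fitted axes, using the fitted means."""
    return (X - mean) @ axes.T

rng = np.random.default_rng(2)
# Simulated stand-ins for the two questionnaire samples (same population here).
cov = np.diag([3.0, 2.0, 1.0, 1.0, 1.0, 1.0])
sample1 = rng.multivariate_normal(np.zeros(6), cov, size=300)
sample2 = rng.multivariate_normal(np.zeros(6), cov, size=300)

# Fit on sample 1 only, then reuse those axes for both samples.
mean1, axes1 = fit_pca(sample1, k=2)
scores1 = project(sample1, mean1, axes1)
scores2 = project(sample2, mean1, axes1)

# Numerical summary of what the scatter plot would show: location and spread.
spread_ratio = scores2.var(axis=0, ddof=1) / scores1.var(axis=0, ddof=1)
center_shift = scores2.mean(axis=0)   # scores1 is centered at zero by construction

print("variance ratio (sample2 / sample1) per component:", np.round(spread_ratio, 2))
print("shift of sample2 center per component:", np.round(center_shift, 2))
```

Variance ratios near 1 and shifts near 0 correspond to the "look very similar" case. Plotting `scores1` and `scores2` on the same axes gives the scatter plot itself.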

Given that all this optimization is carried out relative to a particular data set, it should not be surprising that the two data sets produce different principal components. In some situations the solution may be to combine the two samples and get the principal components for the combined data set to use in future analysis. In other cases the differences can be so great that you would want to rethink whether the data really are similar.
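If you do go the route of combining the samples, a minimal sketch might look like this. The data are again simulated and the helper name `principal_axes` is made up for the example. One caveat worth keeping in mind: if the two samples have different means, the pooled analysis will treat that between-sample shift as variance, so it is worth checking the means before pooling.

```python
import numpy as np

def principal_axes(X, k):
    """First k principal axes (rows) of X after centering."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

rng = np.random.default_rng(3)
# Simulated stand-ins for the two samples; four variables, same population.
cov = np.diag([3.0, 2.0, 1.0, 1.0])
sample1 = rng.multivariate_normal(np.zeros(4), cov, size=150)
sample2 = rng.multivariate_normal(np.zeros(4), cov, size=150)

# Check how far apart the sample means are before pooling.
mean_gap = np.abs(sample1.mean(axis=0) - sample2.mean(axis=0))

# Stack the observations and fit one set of components for future use.
pooled = np.vstack([sample1, sample2])
pooled_axes = principal_axes(pooled, k=2)

# How closely does each sample's own first axis agree with the pooled one?
agreement = {
    name: abs(float(principal_axes(s, k=1)[0] @ pooled_axes[0]))
    for name, s in [("sample1", sample1), ("sample2", sample2)]
}

print("absolute gap between sample means:", np.round(mean_gap, 2))
print("|cosine| with pooled first axis:", {k: round(v, 3) for k, v in agreement.items()})
```

An absolute cosine close to 1 means a sample's own first component points in nearly the same direction as the pooled one. Values well below 1 for either sample would be a reason to rethink pooling.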