PCA – Understanding the Usage of CATPCA in SPSS

pca, spss

I have some questions regarding the usage of CATPCA in SPSS. I hope the answers will also be helpful for others who are planning to work with it.

1) Some of the variables in the data on which I want to run CATPCA are proportions. But CATPCA treats values less than 1 as missing. So, I was planning to recode these scale variables into 10 categories (i.e. 0.00-0.10, 0.10-0.20, …, 0.90-1.00). If I do this, which optimal scaling level should I choose? Ordinal?

2) Apart from the proportion variables, the variables are categorical (both yes-no type and ordinal) with at most 4 categories. So, if I split the proportion-type variables into 10 categories, will that cause any problem? Or do you think I should split them into 5 categories (i.e. 0.00-0.20, 0.20-0.40, …, 0.80-1.00)?

3) My yes-no type variables are coded as 0-1. To avoid treating 0 as missing, I am planning to recode them as 1-2. Now, for my study, 'yes' means good and 'no' means bad/not good. So, which optimal scaling level should I choose for them? Ordinal or Nominal?

4) What should I do if different categories of an ordinal variable receive the same quantification? Should I merge the categories or use a different optimal scaling level?

5) My questionnaire actually has 4 factors associated with it. So, I guess I should use 4 dimensions in the solution. But how can I visualize the grouping of the variables (loadings plot) with these 4 dimensions? I need to know which dimension corresponds to which factor. How do I check this?

6) When do we use 'multiple nominal' as the optimal scaling level? Any example?

Best Answer

1) The optimal scaling level is the level, or type, of transformation applied, not the assumed measurement level of the input variable. The Ordinal or Spline ordinal level of transformation simply means that the transformation will be monotonic (potentially nonlinear). The Numeric level means that it will be linear. And so on (read the Help and Case studies for more). See a good reply by Jeromy Anglim on the implications of the different levels for the problem of overfitting. The Numeric level is more constraining and is equivalent to classic (linear) PCA performed on the discretized data. Ordinal is less constraining but at the same time less parsimonious. Spline ordinal is probably the best tradeoff.
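To make the difference between a linear (Numeric) and a merely monotonic (Ordinal) transformation concrete, here is a minimal Python sketch. The quantification values and the helper names `is_monotonic` and `is_linear` are made up for illustration; they are not CATPCA output or part of SPSS.

```python
def is_monotonic(quantifications, tol=1e-9):
    """True if the quantifications never decrease as the category code increases."""
    return all(b >= a - tol for a, b in zip(quantifications, quantifications[1:]))


def is_linear(codes, quantifications, tol=1e-9):
    """True if the quantifications are an exact linear function of the codes."""
    x0, y0 = codes[0], quantifications[0]
    x1, y1 = codes[-1], quantifications[-1]
    slope = (y1 - y0) / (x1 - x0)
    return all(abs(y0 + slope * (x - x0) - y) <= tol
               for x, y in zip(codes, quantifications))


codes = [1, 2, 3, 4]
numeric_like = [-1.5, -0.5, 0.5, 1.5]  # hypothetical: equal steps, linear in the codes
ordinal_like = [-1.2, -0.9, 0.1, 2.0]  # hypothetical: unequal steps, still non-decreasing

print(is_linear(codes, numeric_like), is_monotonic(numeric_like))
print(is_linear(codes, ordinal_like), is_monotonic(ordinal_like))
```

Every linear transformation with a positive slope is also monotonic, which is why the Numeric level is the more constrained special case.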

2) My advice would be "try both" and compare the amount of variance explained and the Cronbach's alpha values.
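If you prepare the data outside SPSS, both recodings from the question can be produced with one small function. This is only a sketch: the name `bin_proportion` is hypothetical, and it assumes that a value exactly on a boundary (e.g. 0.10) goes into the upper bin, since the intervals in the question overlap at their endpoints. Codes start at 1 because, as noted in the question, CATPCA treats values less than 1 as missing.

```python
import math


def bin_proportion(p, n_bins=10):
    """Map a proportion in [0, 1] to an integer category code 1..n_bins."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("expected a proportion between 0 and 1")
    # floor(p * n_bins) is 0..n_bins; add 1 so codes start at 1,
    # and cap at n_bins so that p == 1.0 falls into the last category.
    return min(int(math.floor(p * n_bins)) + 1, n_bins)


proportions = [0.00, 0.05, 0.10, 0.37, 0.99, 1.00]  # made-up example values
codes_10 = [bin_proportion(p, n_bins=10) for p in proportions]
codes_5 = [bin_proportion(p, n_bins=5) for p in proportions]
print(codes_10)
print(codes_5)
```

Running CATPCA once on each recoded version then gives the two solutions to compare.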

3) For binary variables it makes no difference, because with only two categories any monotonic transformation is necessarily linear.
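The reason is simply that two points always lie on a straight line. A minimal sketch, with made-up quantification pairs and a hypothetical helper `line_through`:

```python
def line_through(codes, quantifications):
    """Return (slope, intercept) of the line through the two (code, quantification) points."""
    x0, x1 = codes
    y0, y1 = quantifications
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


codes = (1, 2)  # the recoded no/yes categories
# Hypothetical quantification pairs; none of these come from real CATPCA output.
for quantifications in [(-0.8, 1.25), (3.0, -2.0), (0.0, 0.0)]:
    slope, intercept = line_through(codes, quantifications)
    fitted = [slope * x + intercept for x in codes]
    # The line reproduces both quantifications exactly, whatever they are.
    assert all(abs(f - q) < 1e-12 for f, q in zip(fitted, quantifications))
    print(quantifications, "slope:", slope)
```

A negative slope only reverses the direction of the variable; the transformation is still linear.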

4) This is a usual result; it means that some adjacent categories are indistinguishable. You could merge those if you wish to work with the original variables rather than with the scaled (transformed) ones in the future.
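If you decide to merge, the recoding can be derived mechanically from the quantifications. The sketch below assumes you have copied the category quantifications out of the CATPCA output by hand; the values shown and the function name `merge_tied_categories` are hypothetical.

```python
def merge_tied_categories(quantifications, tol=1e-6):
    """Give adjacent categories with (nearly) equal quantifications the same new code.

    `quantifications` must be ordered by the original category code.
    New codes start at 1.
    """
    new_codes = []
    current = 0
    previous = None
    for q in quantifications:
        # Start a new merged category whenever the quantification changes.
        if previous is None or abs(q - previous) > tol:
            current += 1
        new_codes.append(current)
        previous = q
    return new_codes


# Made-up example: categories 2 and 3 received the same quantification.
quantifications = [-1.4, -0.3, -0.3, 1.1]
old_codes = [1, 2, 3, 4]
recode_map = dict(zip(old_codes, merge_tied_categories(quantifications)))
print(recode_map)
```

The resulting mapping can then be applied as an ordinary recode of the original variable.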

5) You can request all 2D loading plots for the 4 dimensions via syntax. Or you can plot the loading matrix entries yourself.
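If you go the "do it yourself" route, the bookkeeping is small: 4 dimensions give 6 pairs of dimensions to plot, and a rough first look at which variable goes with which dimension is the dimension on which it has the largest absolute loading. The sketch below uses an invented loading matrix and invented item names; the "largest absolute loading" rule is a simple heuristic for illustration, not something CATPCA reports.

```python
from itertools import combinations

# Hypothetical loadings (variable -> loadings on dimensions 1..4), not real output.
loadings = {
    "item_a": [0.81, 0.10, -0.05, 0.12],
    "item_b": [0.77, -0.20, 0.15, 0.02],
    "item_c": [0.05, 0.69, 0.22, -0.11],
    "item_d": [-0.12, 0.08, -0.74, 0.30],
    "item_e": [0.20, 0.15, 0.10, 0.66],
}
n_dims = 4

# Every pair of dimensions that would get its own 2D loading plot.
pairs = list(combinations(range(1, n_dims + 1), 2))
print(pairs)

# For each variable, the dimension with the largest absolute loading.
dominant = {
    name: max(range(n_dims), key=lambda d: abs(row[d])) + 1
    for name, row in loadings.items()
}
print(dominant)

# For one pair, the (x, y) coordinates of each variable in that loading plot.
x_dim, y_dim = pairs[0]
points = {name: (row[x_dim - 1], row[y_dim - 1]) for name, row in loadings.items()}
print(points)
```

Variables that share a dominant dimension are the candidates for belonging to the same factor; the 2D plots then show how cleanly they separate.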

6) The Multiple nominal level of transformation is equivalent to recoding a categorical variable into a set of dummy 0-1 variables and using this set as one whole: each category creates its own point on a plot. If all your variables are multiple nominal, then CATPCA becomes identical to Multiple Correspondence Analysis, or HOMALS.
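The dummy recoding mentioned here looks like this in a minimal Python sketch. The variable and its category labels are invented for the example, and `dummy_code` is a hypothetical helper, not an SPSS function.

```python
def dummy_code(values):
    """Recode a categorical variable into one 0-1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Made-up nominal variable with three unordered categories.
transport = ["bus", "car", "bike", "car", "bus"]
categories, indicators = dummy_code(transport)
print(categories)
for row in indicators:
    print(row)
```

Each row contains exactly one 1, and each column corresponds to one category, which is what lets every category get its own point on the plot.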
