Solved – Analyzing Ranked Data: Correlation and Factor Analysis

Tags: factor-analysis, r, ranking, spearman-rho

I had survey respondents rank a series of items in order of importance, from 1 to 7. So, if a respondent assigned a score of 1 to one variable, then none of the other variables could get a 1 as well. The dataset is replicated in test.data below. I have four questions:

  1. Should I interpret this as ranked data rather than interval data? I think so, but I am not sure.
  2. If they are ranked data, is it sensible to use either the Wilcoxon-Mann-Whitney U or Kruskal-Wallis test to test for differences in median (?) rankings depending on the levels of the predictor variable?
  3. If they are ranked data, could I construct a correlation matrix using Spearman's Rho?
  4. If that is possible, could I use a factor analysis on that correlation matrix to possibly reduce the dataset and measure some hypothesized underlying constructs?

I have tried these steps using the code below, although I now see that there is a paper and an R package suggesting that factor analysis can be performed on ranked data.

R Package For Ranking Data
Yu, Lam and Lo 2005

Thank you for your suggestions.

library(psych)
#Create data frame: 10 respondents, each ranking 6 items (a permutation of 1:6)
test.data<-replicate(10, sample(1:6, replace=FALSE))

#transpose so that rows are respondents and columns are items
test.data<-t(test.data)

#data frame
test.data<-data.frame(test.data)

#Provide names
names(test.data)<-c('item1', 'item2', 'item3', 'item4', 'item5', 'item6')

#Some predictors
test.data$gender<-factor(sample(c('M', 'F'), replace=TRUE, size=10))
test.data$position<-factor(sample(c('Journ', 'Pol'), replace=TRUE, size=10))

#Correlation matrix
cor.matrix<-cor(test.data[,1:6], method='spearman')

#Factor analysis: scree plot of eigenvalues, then principal-axis factoring
plot(eigen(cor.matrix)$values, type='o')

#psych::fa takes the factoring method as fm= and the sample size as n.obs=
fa(cor.matrix, fm='pa', rotate='none', n.obs=nrow(test.data))

#Kruskal-Wallis or Mann-Whitney depending on predictor levels
lapply(test.data[,1:6], function(x) kruskal.test(x ~ position, data=test.data))
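Because `position` in this example has only two levels, the Kruskal-Wallis test and the Wilcoxon-Mann-Whitney test are interchangeable here: for two groups, the Kruskal-Wallis statistic equals the square of the normal-approximation Mann-Whitney z statistic (with tie correction, without continuity correction), so the p-values coincide. Below is a minimal base-R sketch of that check. It assumes a hypothetical balanced `position` factor (5 + 5 respondents, so that both levels are guaranteed to appear) rather than the randomly sampled one above; `ranks` is a stand-in for `test.data`.

```r
set.seed(1)

# Hypothetical data shaped like test.data:
# 10 respondents, each row a random ranking (permutation) of 6 items
ranks <- as.data.frame(t(replicate(10, sample(1:6))))
names(ranks) <- paste0("item", 1:6)

# Hypothetical balanced two-level predictor (an assumption for this sketch)
position <- factor(rep(c("Journ", "Pol"), each = 5))

# Kruskal-Wallis p-value for each item
p_kw <- sapply(ranks, function(x) kruskal.test(x, g = position)$p.value)

# Wilcoxon-Mann-Whitney p-value for each item: normal approximation
# (exact = FALSE) with tie correction and no continuity correction
p_mw <- sapply(ranks, function(x)
  wilcox.test(x ~ position, exact = FALSE, correct = FALSE)$p.value)

# With two groups the two columns should agree
print(round(cbind(p_kw, p_mw), 4))
```

With the default `correct = TRUE`, `wilcox.test` applies a continuity correction and its p-values will differ slightly from `kruskal.test`; the substantive conclusion is usually the same.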

Best Answer

  1. Yes. However, rankings are quite often treated as interval data and analyzed by parametric procedures (such as ANOVA); this is customary in classic conjoint analysis, for example. Conceptually, psychometrically, rankings are ordinal while ratings are potentially interval (albeit often cautiously treated as ordinal as well). Statistically, though, both are just numbers on a quantitative scale, so unless specific statistical assumptions are strongly violated they can be processed in the same manner, especially in a univariate analysis. Note that if somebody considers each of the 6 variables that you generated in your example separately, he will not be able to say whether they are “rankings” or “ratings”.
  2. Yes. However, those tests are sensible with interval data as well.
  3. Yes. However, Spearman is sensible with interval data as well, to capture a nonlinear monotonic relationship.
  4. No. Classic (linear) factor analysis is only for Pearson correlation (and similar SSCP-type measures). Spearman is based on nonlinearly transformed values, ranks (not your original "ranks", the rankings, but the ones the procedure internally produces). Linear FA will erroneously "think" that those transformed values are the original values or linearly transformed original values, and will "uncover" linear underlying constructs which actually are not linear or do not even exist in your actual data. So use Pearson, if you dare to see the rankings data as ratings data. Alternatively, there exist special factor analytic procedures for categorical data, such as factor analysis on polychoric correlations, IRT factor analysis, and PCA with optimal scaling (CATPCA).
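The points in 3 and 4 can be seen directly in base R: Spearman's rho is Pearson's r computed on the ranks that `cor()` produces internally, which is a nonlinear transformation of the values. The sketch below uses made-up data: a monotonic but strongly nonlinear relationship (where Spearman is 1 and Pearson is below 1), followed by a check that, for a matrix of rankings like `test.data`, `cor(..., method = "spearman")` equals Pearson on column-wise `rank()` values.

```r
set.seed(2)

# Made-up monotonic but nonlinear relationship
x <- runif(30, 0, 5)
y <- exp(x)

rho_spearman <- cor(x, y, method = "spearman")  # 1: perfectly monotonic
r_pearson    <- cor(x, y)                       # below 1: not linear
print(c(spearman = rho_spearman, pearson = r_pearson))

# Spearman's rho is Pearson's r on internally produced ranks
r_on_ranks <- cor(rank(x), rank(y))
print(all.equal(rho_spearman, r_on_ranks))

# Same column-wise for a hypothetical 10 x 6 matrix of rankings
m <- t(replicate(10, sample(1:6)))
print(all.equal(cor(m, method = "spearman"), cor(apply(m, 2, rank))))
```

A linear factor model fitted to the Spearman matrix is therefore a model for those rank-transformed columns, not for the values as they were recorded.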

Whatever factor analysis or other multivariate analysis you do on the rankings data, you should be aware that the ordered multinomial (no ties) nature of the ranking task induces negative correlations in the data. In your code, for example, you generate 6 variables which are random rankings from 1 to 6. The expected correlations between the variables will all be -1/(6-1) = -0.2. (See also the topic of compositional data in general.) If these were ratings, the expected correlations would all be 0. I’m speaking of the baseline, or background, correlations: those expected in the absence of substantive factors of interest. The presence of a background general factor does not, generally, preclude doing factor analysis, but how best to deal with it may be a complex issue.
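The -1/(k-1) baseline follows from every respondent's ranks summing to the same constant (1 + 2 + ... + 6 = 21), so the variance of the row sum is zero: with k exchangeable items of common variance σ² and common correlation ρ, 0 = kσ² + k(k-1)ρσ², hence ρ = -1/(k-1). The base-R sketch below checks this two ways, under the assumption that "random ranking" means all k! orderings are equally likely. First it enumerates all 6! = 720 rankings and computes their correlation matrix (on a full enumeration the (n - 1) factors cancel, so this is the exact value). Then it simulates respondents as in the question's code and compares them with independent random ratings on the same 1 to 6 scale. `all_rankings` and `mean_offdiag` are hypothetical helpers written for this sketch.

```r
# Hypothetical helper: all permutations of v, one per row
all_rankings <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], all_rankings(v[-i]))))
}

k <- 6
perms <- all_rankings(1:k)                  # 720 x 6, each row one ranking
stopifnot(all(rowSums(perms) == sum(1:k)))  # every row sums to 21

# Exact correlations under uniformly random ranking
exact_cor <- cor(perms)
off_diag  <- exact_cor[upper.tri(exact_cor)]
print(range(off_diag))                      # both ends equal -1/(k - 1) = -0.2

# Hypothetical helper: mean of the off-diagonal correlations
mean_offdiag <- function(m) { r <- cor(m); mean(r[upper.tri(r)]) }

# Simulated respondents: random rankings vs. independent random ratings
set.seed(3)
n <- 5000
rankings <- t(replicate(n, sample(1:k)))
ratings  <- matrix(sample(1:k, n * k, replace = TRUE), nrow = n)

print(c(rankings = mean_offdiag(rankings),  # close to -0.2
        ratings  = mean_offdiag(ratings)))  # close to 0
```

The same zero-variance argument explains why the Pearson correlation matrix of complete rankings is singular: the k columns are linearly dependent, which is one practical face of the background factor described above.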