I had survey respondents rank a series of items in order of importance, from 1 to 7. So, if a respondent assigned a rank of 1 to one item, then none of the other items could get a 1 as well. The dataset is replicated in test.data below. I have four questions:
- Should I interpret this as ranked data rather than interval data? I think so, but I am not sure.
- If they are ranked data, is it sensible to use either the Wilcoxon-Mann-Whitney U or Kruskal-Wallis test to test for differences in median (?) rankings depending on the levels of the predictor variable?
- If they are ranked data, could I construct a correlation matrix using Spearman's Rho?
- If that is possible, could I use a factor analysis on that correlation matrix to possibly reduce the dataset and measure some hypothesized underlying constructs?
I have tried to do this, basically following the steps below, although I see now that there is a paper and an R package suggesting that factor analysis can be performed on ranked data.
R Package For Ranking Data
Yu, Lam and Lo 2005
Thank you for your suggestions.
library(psych)
#Create Data frame
test.data<-replicate(10, sample(seq(1,6,1), replace=F))
#transpose
test.data<-t(test.data)
#data frame
test.data<-data.frame(test.data)
#Provide names
names(test.data)<-c('item1', 'item2', 'item3', 'item4', 'item5', 'item6')
#Some predictors
test.data$gender<-factor(sample(c('M', 'F'), replace=T,size=10))
test.data$position<-factor(sample(c('Journ', 'Pol'), replace=T, size=10))
#Correlation Matrix
cor.matrix<-cor(test.data[,1:6], method=c('spearman'))
#Factor Analysis
plot(eigen(cor.matrix)$values, type='o')
#psych::fa takes the factoring method as 'fm' and the sample size as 'n.obs'
fa(cor.matrix, fm='pa', rotate='none', n.obs=nrow(test.data))
#Kruskal-Wallis or Mann-Whitney depending on predictor levels
lapply(test.data[,1:6], function(x) kruskal.test(x~position, data=test.data))
Best Answer
Whatever factor analysis or other multivariate analysis you do on the rankings data, you should be aware that the ordered multinomial (no ties) nature of the ranking task induces negative correlations in the data. In your code, for example, you generate 6 variables which are random rankings from 1 to 6. The expected correlations between the variables will all be -1/(6-1) = -0.2. (See also what is said about compositional data in general.) If these were ratings, the expected correlations would all be 0. I'm speaking of the baseline or background correlations, those expected in the absence of substantive factors of interest. The presence of a background general factor does not preclude doing factor analysis, generally, but how best to deal with it may be a complex issue.
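The -1/(n-1) baseline follows from the fact that each respondent's ranks always sum to the same constant. The variance of that sum is therefore zero, and under purely random ranking (equal variances and equal pairwise correlations by symmetry) this gives n·σ² + n(n-1)·ρ·σ² = 0, hence ρ = -1/(n-1). A small simulation can illustrate this. It is only a sketch: the seed, the number of simulated respondents and the object names are arbitrary choices for illustration, not part of your data.

```r
# Simulate respondents who rank n items completely at random (no ties).
set.seed(1)            # arbitrary seed, only for reproducibility
n_items <- 6
n_resp  <- 20000       # hypothetical sample size, large so the estimates are stable

# Each row is one respondent's random permutation of 1..n_items.
ranks <- t(replicate(n_resp, sample(n_items)))

# Rankings: every row sums to the same constant, which forces negative correlations.
cor_ranks <- cor(ranks, method = "spearman")
mean_offdiag_ranks <- mean(cor_ranks[upper.tri(cor_ranks)])

# Ratings, for comparison: each item is scored independently on 1..n_items (ties allowed).
ratings <- matrix(sample(n_items, n_resp * n_items, replace = TRUE), nrow = n_resp)
cor_ratings <- cor(ratings, method = "spearman")
mean_offdiag_ratings <- mean(cor_ratings[upper.tri(cor_ratings)])

# Compare the average off-diagonal correlations with the theoretical baseline.
round(c(rankings = mean_offdiag_ranks,
        ratings  = mean_offdiag_ratings,
        theory   = -1 / (n_items - 1)), 3)
```

The average off-diagonal correlation for the rankings should sit close to the theoretical -0.2, while the one for the independent ratings should sit close to 0. With only 10 respondents, as in test.data, individual correlations will scatter widely around these baselines.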