Solved – How to measure the stability / consistency of a ranking

I am looking for the correct statistical method to answer the problem below. I have a method, but I am convinced there is a better way.

The scenario:

I have $N_x$ categorical independent variables. For each categorical variable $x$, there are $N_r$ repetitions, resulting in $N_r \times N_x$ measurements of a continuous variable $y$.
My goal is to establish a ranking of the $x$ variables according to $y$.

I would like to demonstrate that such a ranking is stable / consistent. That is, in the ideal case the ranking would remain identical even with infinitely many repetitions. The aim of this question is to find a measure that lets me compare the consistency of different scenarios.

The brute force solution:

Pick one $y$ for each $x$, in every possible combination, to get every possible ranking. This gives $N_{r}^{N_{x}}$ possible rankings. For each pair of rankings, compute a rank correlation coefficient (Spearman's rho, Kendall's tau, …). Then find the mean and standard deviation of these correlation coefficients.

This is obviously completely impractical with e.g. 10 repeats and 50 categorical variables. It is possible to sample the possible rankings, and then estimate the mean and standard deviation just of that sample, but that seems like throwing data away.
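The sampling variant described above can be sketched as follows. This is only an illustration using the Python standard library: the names `kendall_tau` and `sampled_rank_consistency`, the parameter `n_samples`, and the layout of `y` (one list of repetitions per category) are assumptions made for the example, not part of the question.

```python
import random
import statistics
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    n = len(a)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        if prod > 0:
            score += 1  # pair ordered the same way in both lists
        elif prod < 0:
            score -= 1  # pair ordered in opposite ways
    return score / (n * (n - 1) / 2)


def sampled_rank_consistency(y, n_samples=200, seed=0):
    """Estimate mean and standard deviation of pairwise Kendall's tau.

    y[i] is the list of N_r repeated measurements for category i.
    Each sampled "ranking" picks one repetition per category at random.
    """
    rng = random.Random(seed)
    draws = [[rng.choice(reps) for reps in y] for _ in range(n_samples)]
    # Tau depends only on the ordering, so raw values can be compared directly.
    taus = [kendall_tau(d1, d2) for d1, d2 in combinations(draws, 2)]
    return statistics.mean(taus), statistics.stdev(taus)
```

With well-separated categories every sampled ranking is identical, so the mean tau is 1 with zero spread; overlapping categories pull the mean towards 0. The cost grows with the square of `n_samples`, not with $N_{r}^{N_{x}}$.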

Alternatives

I've found "How to measure the reliability of a consensus ranking (problem from Kemeny-Snell book)", but after reading the response I'm not really any wiser.

Best Answer

For this purpose, you can use the Kruskal-Wallis test to check whether the groups differ at all, i.e. whether there is any ranking worth considering: https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance

If that is the case, you can use Dunn's test for the pairwise comparisons.

Info: https://stats.stackexchange.com/tags/dunn-test/info
R implementation: https://cran.r-project.org/web/packages/dunn.test/dunn.test.pdf
Python implementation: http://codegist.net/snippet/python/dunnpy_farfan92_python
Stata implementation: https://alexisdinno.com/stata/dunntest.html

These comparisons can form the ranking/ordering you expect.
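As a rough illustration of the two steps, here is a standard-library sketch that computes the Kruskal-Wallis $H$ statistic and Dunn's pairwise $z$ statistics from the pooled ranks. The function names (`chi2_sf`, `average_ranks`, `kruskal_dunn`), the dictionary layout of `groups`, and the choice of a Bonferroni adjustment are assumptions made for this example; the packages linked above are the safer choice for real analyses.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper tail of a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    """
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def average_ranks(values):
    """Return ranks 1..N (ties get their average rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_dunn(groups):
    """groups maps each category label to its list of repeated y values."""
    labels = list(groups)
    pooled = [v for lab in labels for v in groups[lab]]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    tie_sum = sum(t ** 3 - t for t in tie_sizes)

    # Mean pooled rank per category; sorting these gives the candidate ranking.
    mean_rank, size, start = {}, {}, 0
    for lab in labels:
        size[lab] = len(groups[lab])
        mean_rank[lab] = sum(ranks[start:start + size[lab]]) / size[lab]
        start += size[lab]

    # Kruskal-Wallis H with tie correction, compared to chi-square(k - 1).
    h = 12.0 / (n * (n + 1)) * sum(
        size[lab] * (mean_rank[lab] - (n + 1) / 2.0) ** 2 for lab in labels)
    h /= 1.0 - tie_sum / (n ** 3 - n)
    p_overall = chi2_sf(h, len(labels) - 1)

    # Dunn's z for every pair, two-sided p-values with Bonferroni adjustment.
    var = n * (n + 1) / 12.0 - tie_sum / (12.0 * (n - 1))
    pairs = list(combinations(labels, 2))
    pairwise = {}
    for a, b in pairs:
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(
            var * (1.0 / size[a] + 1.0 / size[b]))
        p = math.erfc(abs(z) / math.sqrt(2.0))
        pairwise[(a, b)] = (z, min(1.0, p * len(pairs)))
    return h, p_overall, mean_rank, pairwise
```

Calling `kruskal_dunn({"a": [...], "b": [...], "c": [...]})` returns the overall statistic and p-value, the mean rank per category, and an adjusted p-value for each pair. Pairs whose adjusted p-value stays above the chosen significance level cannot be ordered with confidence.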

If you assume your $y$ to be normal with the same variance in each group, then ANOVA and t-tests can be used.

Note that the ranking/ordering may not be a strict linear order, because some pairwise comparisons may show differences that are not statistically significant, e.g. Bob and Jane in the following example: http://www.quality-assurance-solutions.com/images/ANOVA-3.jpg
