Solved – Is there a table to interpret "how good" Kendall's coefficient of concordance (W) is?

correlation, effect-size, interpretation, ordinal-data

As is well known, Kendall's coefficient of concordance (W) indicates the degree of association among ordinal assessments made by multiple appraisers when assessing the same samples.

Kendall's W ranges from 0 to 1. The higher the value of W, the stronger the association; usually values of 0.9 or higher are considered very good. A high or significant W means that the appraisers are applying essentially the same standard when assessing the samples.
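
For reference, W can be computed directly from its definition. Here is a minimal NumPy sketch, assuming each appraiser provides a complete ranking with no ties (the rank matrix below is purely illustrative):

    import numpy as np

    def kendalls_w(ranks):
        """Kendall's W for an (m appraisers x n samples) matrix of ranks.
        Assumes each row is a complete ranking 1..n with no ties."""
        m, n = ranks.shape
        rank_sums = ranks.sum(axis=0)        # total rank R_j for each sample
        mean_rank_sum = m * (n + 1) / 2      # expected total under no association
        s = ((rank_sums - mean_rank_sum) ** 2).sum()
        return 12 * s / (m ** 2 * (n ** 3 - n))

    # Three appraisers ranking the same five samples (made-up data)
    ranks = np.array([[1, 2, 3, 4, 5],
                      [2, 1, 3, 4, 5],
                      [1, 3, 2, 4, 5]])
    print(kendalls_w(ranks))  # ~0.89 here: strong concordance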

Kappa, another statistic, measures the degree of agreement of nominal or ordinal assessments made by multiple appraisers when assessing the same samples. Kappa values range from -1 to +1; the higher the value, the stronger the agreement. Not everyone would agree about whether, e.g., 0.57 constitutes "good" agreement.

Here is one possible interpretation of kappa (1):

Poor agreement = Less than 0.20
Fair agreement = 0.21 to 0.40
Moderate agreement = 0.41 to 0.60
Good agreement = 0.61 to 0.80
Very good agreement = 0.81 to 1.00

It turns out that, using this scale, a kappa of 0.57 indicates "moderate" agreement between two observers.
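
Computing kappa itself is straightforward; as a sketch, scikit-learn's cohen_kappa_score handles the two-rater case, and the scale above can then be applied as a simple lookup (the ratings below are made up for illustration):

    from sklearn.metrics import cohen_kappa_score

    # Two raters' assessments of the same ten samples (made-up data)
    rater1 = ["good", "bad", "good", "good", "bad", "good", "bad", "bad", "good", "good"]
    rater2 = ["good", "bad", "good", "bad", "bad", "good", "bad", "good", "good", "good"]

    kappa = cohen_kappa_score(rater1, rater2)

    def agreement_label(k):
        """Map a kappa value onto the bands quoted above."""
        if k <= 0.20:
            return "poor"
        elif k <= 0.40:
            return "fair"
        elif k <= 0.60:
            return "moderate"
        elif k <= 0.80:
            return "good"
        return "very good"

    print(round(kappa, 2), agreement_label(kappa))  # ~0.58 -> "moderate"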

My question is: similar to the kappa scale above, is there some "scientifically recognized" table to help visualize/interpret Kendall's W?

References

(1) Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.

Best Answer

The paper you cite itself calls those divisions "clearly arbitrary," so they are no more scientific than any other partitioning you might make; they are offered merely for discussion. If you want to follow the model of that paper, just make a partitioning of your own for discussion.

A scientific partitioning properly applicable to every situation could not exist anyway. If I were measuring agreement in rankings of wines, I would expect a much lower value than for rankings of observed lengths. A value that is remarkably high among wine tasters could therefore be quite low among length raters.

What counts as large or small is going to be domain specific, and it's up to you to know your domain. If no one within your domain has proposed what constitute large and small degrees of agreement, then propose some of your own. This kind of qualitative assessment has to start somewhere, and it must start within your domain. There is no generally applicable answer.