Solved – Power analysis for inter-rater reliability study (Kappa) with multiple raters

I've spent some time looking through the literature on sample size calculation for Cohen's kappa and found several studies stating that increasing the number of raters reduces the number of subjects required to achieve the same power. This seems logical when assessing inter-rater reliability with kappa statistics. However, as far as I can see, none of them gives a specific calculation or reference for the statement. In this link there is a calculation for 2 raters.

Is anyone familiar with a similar calculation for several raters?
Are there other factors that would affect the number of subjects required?

I will (probably) have 5 categories of nominal data. There might be combined findings. There will be 3 raters.

I found this article, which says something about sample size with several raters:
Sim, J. and Wright, C. C. (2005) The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements, Journal of the American Physical Therapy Association, 85, pp. 257–268.

When seeking to optimize sample size, the investigator needs to choose
the appropriate balance between the number of raters examining each
subject and the number of subjects. In some instances, it is more
practical to increase the number of raters rather than increase the
number of subjects. However, according to Shoukri, when seeking to
detect a kappa of .40 or greater on a dichotomous variable, it is not
advantageous to use more than 3 raters per subject—it can be shown
that for a fixed number of observations, increasing the number of
raters beyond 3 has little effect on the power of hypothesis tests or
the width of confidence intervals. Therefore, increasing the number of
subjects is the more effective strategy for maximizing power.
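
One rough way to explore this trade-off without a closed-form formula is Monte Carlo simulation. The sketch below is only an illustration under stated assumptions, not the method from Sim and Wright or Shoukri: it uses Fleiss' kappa (a common multi-rater generalization, swapped in here because Cohen's kappa is defined for two raters), and a hypothetical rating model in which every rater reports the subject's true category with probability `accuracy` and otherwise picks one of the other categories uniformly at random. The function names, the `accuracy` value, and the design sizes are all made up for the example.

```python
import random
import statistics


def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters who put subject i
    in category j. Assumes every subject has the same number of raters (>= 2)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_cats)]
    # Observed agreement per subject, averaged over subjects.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (p_bar - p_e) / (1.0 - p_e)


def simulate_study(n_subjects, n_raters, n_cats, accuracy, rng):
    """Simulate one study under the assumed (hypothetical) rating model."""
    counts = []
    for _ in range(n_subjects):
        truth = rng.randrange(n_cats)
        row = [0] * n_cats
        for _ in range(n_raters):
            if rng.random() < accuracy:
                rating = truth
            else:
                # Pick one of the other categories uniformly at random.
                rating = rng.randrange(n_cats - 1)
                if rating >= truth:
                    rating += 1
            row[rating] += 1
        counts.append(row)
    return counts


def kappa_spread(n_subjects, n_raters, n_cats=5, accuracy=0.7,
                 n_sims=500, seed=1):
    """Mean and standard deviation of the kappa estimate over simulated studies."""
    rng = random.Random(seed)
    estimates = [
        fleiss_kappa(simulate_study(n_subjects, n_raters, n_cats, accuracy, rng))
        for _ in range(n_sims)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Three designs with the same total number of ratings (300).
    # With accuracy=0.7 and 5 equally likely categories, the model implies
    # pairwise agreement 0.5125 and chance agreement 0.2, i.e. kappa near 0.39.
    for n_subjects, n_raters in [(150, 2), (100, 3), (60, 5)]:
        mean_k, sd_k = kappa_spread(n_subjects, n_raters)
        print(f"{n_subjects} subjects x {n_raters} raters: "
              f"mean kappa = {mean_k:.3f}, SD = {sd_k:.3f}")
```

A smaller SD for a given design means narrower confidence intervals and more power, so comparing the printed SDs across designs shows how raters trade off against subjects under these particular assumptions. The conclusions depend on the assumed rating model, so the category prevalences and error pattern would need to be adapted to the actual study.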
