Solved – Judgement score regularization problem

normalization, regularization

Consider a scenario where M performances (e.g., a singing contest) are judged by N judges. Each judge awards a score S(m,n) to each performance on a scale of 1 to 100.

The problem is that each judge uses the scale in an individual way. For example, Judge A might give the worst performances a minimum of 30 and the best a maximum of 90, while Judge B might hand out a minimum of 10 and a maximum of 80.

How do we regularize the scores so as to get the correct overall score? Some justification (a link to an article or paper) for the answer would be appreciated.

This is similar to https://stats.stackexchange.com/questions/138973/normalizing-interview-scores, but that question has received no responses.

Best Answer

It seems like your goal is to ensure that the center and spread of the scores match up, so that all centers are the same (i.e. Judge A's average = Judge B's average) and all spreads are the same. However, you want each judge's scores to retain their original shape: if one score was very high and all the others were very low, you want to keep that "ranking" and make sure that the top score is still far higher than the very low scores.

This seems to be the perfect situation for standardization (z-scores). For each performance $m$ and judge $n$, you should calculate:

$z_{mn} = \frac{S(m,n) - \bar{S}_n}{\sigma_n}$

where $z_{mn}$ is the standardized score, $S(m,n)$ is the raw score that judge $n$ gave to performance $m$, $\bar{S}_n$ is the average of judge $n$'s scores across all $M$ performances, and $\sigma_n$ is the standard deviation of judge $n$'s scores across those performances. This sets the mean for each judge to 0 and the standard deviation for each judge to 1. Because it is a linear rescaling, the shape of each judge's distribution of scores is retained, which accounts for situations like the "one very high, all others very low" case described above. You would then report the $z_{mn}$ as the standardized scores and use these for your analysis.
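
As a rough illustration of the per-judge standardization described above, here is a minimal Python sketch. The judge labels, the raw scores, and the function names `standardize_by_judge` and `combine` are all made up for the example. Averaging the z-scores across judges to get one overall score per performance is an assumption on my part, not something the answer prescribes. The sketch also uses the population standard deviation; swapping in the sample standard deviation would be equally defensible.

```python
from statistics import mean, pstdev


def standardize_by_judge(scores):
    """Convert each judge's raw scores to z-scores.

    `scores` maps a judge label to a list of raw scores, one per
    performance, with performances in the same order for every judge.
    """
    z = {}
    for judge, raw in scores.items():
        mu = mean(raw)
        sigma = pstdev(raw)  # population SD; statistics.stdev gives the sample SD
        if sigma == 0:
            # A judge who gave every performance the same score carries
            # no ranking information, so map all of their scores to 0.
            z[judge] = [0.0 for _ in raw]
        else:
            z[judge] = [(s - mu) / sigma for s in raw]
    return z


def combine(z):
    """Average the z-scores across judges for each performance (an assumption)."""
    return [mean(column) for column in zip(*z.values())]


# Hypothetical data: four performances, two judges who use the scale differently.
raw_scores = {
    "A": [30, 45, 90, 60],
    "B": [10, 20, 80, 55],
}

z_scores = standardize_by_judge(raw_scores)
overall = combine(z_scores)

for judge, values in z_scores.items():
    print(judge, [round(v, 3) for v in values])
print("overall", [round(v, 3) for v in overall])
```

After this step each judge's z-scores have mean 0 and standard deviation 1, while the ordering and relative gaps within each judge's scores are unchanged.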
