Scales – How to Weight a Rating System to Favor Popular Items?

Thanks in advance for bearing with me, I am not a statistician of any kind and don't know how to describe what I'm imagining, so Google isn't helping me here…

I'm including a rating system in a web application I'm working on. Each user can rate each item exactly once.

I was imagining a scale with 4 values: "strongly dislike", "dislike", "like", and "strongly like", and I had planned on assigning these values of -5, -2, +2, and +5 respectively.

Now, if every item was going to have the same number of ratings, then I would be quite comfortable with this scoring system as clearly differentiating the most liked and least liked items. However, the items will not have the same number of ratings, and the disparity between the number of votes on different photos may be quite dramatic.

In that case, comparing the cumulative scores on two items means that an old item with a lot of mediocre ratings is going to have a much higher score than an exceptional new item with many fewer votes.

So, the first obvious thing I thought of is to take an average… but then an item with only one rating of "+5" has a better average than an item with 99 "+5" ratings and 1 "+2" rating. Intuitively, that isn't an accurate representation of the popularity of an item.
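To make the two failure modes concrete, here is a minimal Python sketch. The function names and the vote counts for the "old" and "new" items are made up for illustration; only the single "+5" item and the 99-plus-1 item come from the example above.

```python
# Ratings use the scale from the question: -5, -2, +2, +5.

def total_score(ratings):
    """Cumulative score: favors items that simply have more votes."""
    return sum(ratings)

def mean_score(ratings):
    """Plain average: favors items with very few (lucky) votes."""
    return sum(ratings) / len(ratings)

# Hypothetical data: an old item with many lukewarm votes
# versus a newer item with fewer but enthusiastic votes.
old_mediocre = [2] * 200
new_great = [5] * 20

# The two items from the example in the text.
single_vote = [5]
well_tested = [5] * 99 + [2]

print(total_score(old_mediocre), total_score(new_great))
print(mean_score(single_vote), mean_score(well_tested))
```

The cumulative score ranks the lukewarm old item first, while the plain average ranks the single-vote item above the one with 100 votes.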

I imagine this problem is common and you guys don't need me to belabor it with more examples, so I'll stop at this point and elaborate in comments if needed.

My questions are:

What is this kind of problem called, and is there a term for the techniques used to solve it? I'd like to know this so I can read up on it.
If you happen to know of any lay-friendly resources on the subject, I'd very much appreciate a link.
Finally, I'd appreciate any other suggestions about how to effectively collect and analyze this kind of data.

Best Answer

One way to combat this is to use the proportion of ratings in each category, which does not require you to assign a number to each category (you can simply report that 80% rated an item "strongly like"). However, proportions also suffer from the small-number-of-ratings issue. This shows up in your example: the photo with one +5 rating would get a higher average score (and proportion) than the one with 99 +5 ratings and one +2 rating. This doesn't fit well with my intuition (and, I suspect, most people's).

One way to get around this small sample size issue is to use a Bayesian technique known as "Laplace's rule of succession" (searching this term may be useful). It simply involves adding 1 "observation" to each category before calculating the probabilities. If you wanted to take an average for a numerical value, I would suggest a weighted average where the weights are the probabilities calculated by the rule of succession.

For the mathematical form, let $n_{sd},n_{d},n_{l},n_{sl}$ denote the number of responses of "strongly dislike", "dislike", "like", and "strongly like" respectively (in the two examples, $n_{sl}=1,n_{sd}=n_{d}=n_{l}=0$ and $n_{sl}=99,n_{l}=1,n_{sd}=n_{d}=0$). You then calculate the probability (or weight) for strongly like as

$$Pr(\text{"Strongly Like"}) = \frac{n_{sl}+1}{n_{sd}+n_{d}+n_{l}+n_{sl}+4}$$

For the two examples you give, the probabilities of "strongly like" are $\frac{1+1}{1+0+0+0+4}=\frac{2}{5}$ and $\frac{99+1}{99+1+0+0+4}=\frac{100}{104}$, which I think agree more closely with "common sense". Removing the added constants gives $\frac{1}{1}$ and $\frac{99}{100}$, which makes the first outcome seem higher than it should be (at least to me anyway).
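As a rough sketch of the rule of succession for the four categories, the probabilities could be computed like this in Python. The names `CATEGORIES` and `smoothed_probabilities` and the dict-of-counts input are assumptions made for illustration, not part of any library.

```python
from fractions import Fraction

# Category labels from the question (hypothetical constant name).
CATEGORIES = ("strongly dislike", "dislike", "like", "strongly like")

def smoothed_probabilities(counts):
    """Rule of succession: add one pseudo-observation to every category.

    counts maps a category name to its number of votes;
    categories missing from the dict are treated as 0 votes.
    """
    total = sum(counts.get(c, 0) for c in CATEGORIES) + len(CATEGORIES)
    return {c: Fraction(counts.get(c, 0) + 1, total) for c in CATEGORIES}

one_vote = smoothed_probabilities({"strongly like": 1})
many_votes = smoothed_probabilities({"strongly like": 99, "like": 1})

print(one_vote["strongly like"])    # 2/5
print(many_votes["strongly like"])  # 25/26, i.e. 100/104 in lowest terms
```

Exact fractions are used here only so the results can be compared directly with the hand calculation; floats would work just as well in practice.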

The respective scores are just given by the weighted average, which I have written below as:

$$Score=\begin{array}{l} 5\frac{n_{sl}+1}{n_{sd}+n_{d}+n_{l}+n_{sl}+4}+2\frac{n_{l}+1}{n_{sd}+n_{d}+n_{l}+n_{sl}+4} \\ - 2\frac{n_{d}+1}{n_{sd}+n_{d}+n_{l}+n_{sl}+4} -5\frac{n_{sd}+1}{n_{sd}+n_{d}+n_{l}+n_{sl}+4}\end{array}$$

Or more succinctly, because the four added "+1" terms cancel ($5+2-2-5=0$), as

$$Score=\frac{5 n_{sl}+ 2 n_{l} - 2 n_{d} - 5 n_{sd}}{n_{sd}+n_{d}+n_{l}+n_{sl}+4}$$

This gives scores in the two examples of $\frac{5}{5}=1$ and $\frac{497}{104}\approx 4.8$. I think this shows an appropriate difference between the two cases.
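If it helps, the score formula can be sketched in a few lines of Python. The names `WEIGHTS` and `smoothed_score` and the dict-of-counts input are hypothetical choices for this example; the weights themselves are the -5, -2, +2, +5 values from the question.

```python
# Numeric value assigned to each category in the question.
WEIGHTS = {
    "strongly dislike": -5,
    "dislike": -2,
    "like": 2,
    "strongly like": 5,
}

def smoothed_score(counts):
    """Weighted average of the category values, using rule-of-succession weights.

    counts maps a category name to its number of votes;
    categories missing from the dict are treated as 0 votes.
    """
    n_votes = sum(counts.get(c, 0) for c in WEIGHTS)
    denominator = n_votes + len(WEIGHTS)  # one pseudo-vote per category
    # The +1 terms cancel because the weights sum to zero, but they are
    # kept here so the code mirrors the long form of the formula.
    numerator = sum(w * (counts.get(c, 0) + 1) for c, w in WEIGHTS.items())
    return numerator / denominator

print(smoothed_score({"strongly like": 1}))               # 5/5 = 1.0
print(smoothed_score({"strongly like": 99, "like": 1}))   # 497/104, about 4.78
```

An item with no votes at all scores 0 under this sketch, and sorting items by `smoothed_score` in descending order would then rank the 100-vote item well above the single-vote one.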

This may have been a bit "mathsy" so let me know if you need more explanation.
