How to adjust average rating for sample size on rating systems with more than two categories

confidence-interval, mean, sample-size

After reading "How Not To Sort By Average Rating", which deals with confidence intervals for a Bernoulli parameter, how would you extend this to more than two levels?

For example, items are scored from 1 to 5 (1 is worst, 5 is best). What is the best way to adjust each item's average score to take into account the number of ratings it received? (A single 5 rating should not give it an average of 5!)

Shame on you, Amazon!

Best Answer

One way to cast your problem would be to treat it as a Bayesian estimation problem.

Basically, this means putting a prior on your mean and updating the mean with each new observation over time.
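For ratings with more than two levels, one concrete way to "have a prior and update it" is to put a Dirichlet prior on the probabilities of the K star levels. This particular prior is our assumption; the answer does not name one. Because the Dirichlet is conjugate to the multinomial, the posterior is again Dirichlet with pseudo-counts `prior + counts`, so the expected rating has a closed form. The function name and the pseudo-count values are hypothetical; a minimal sketch, assuming ratings are the integers 1 to K:

```python
from typing import Sequence


def dirichlet_posterior_mean(counts: Sequence[int], prior: Sequence[float]) -> float:
    """Expected star rating under a Dirichlet prior over the K rating levels.

    counts[k] -- number of observed ratings equal to k + 1
    prior[k]  -- assumed Dirichlet pseudo-count for level k + 1
                 (e.g. proportional to the category's rating histogram)
    """
    if len(counts) != len(prior):
        raise ValueError("counts and prior need one entry per rating level")
    # Conjugate update: posterior parameters are prior pseudo-counts + observed counts.
    alpha = [a + c for a, c in zip(prior, counts)]
    total = sum(alpha)
    # Posterior mean probability of level k+1 is alpha[k] / total;
    # the expected rating weights each level value by that probability.
    return sum((k + 1) * a for k, a in enumerate(alpha)) / total
```

With a uniform prior of one pseudo-count per level, a single 5 rating gives (1 + 2 + 3 + 4 + 2·5) / 6 ≈ 3.33 rather than 5, while 100 ratings of 5 push the estimate above 4.9. Larger pseudo-counts make the estimate move more slowly away from the prior.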

A practical, if theoretically debatable, way to achieve this is to compute the mean as a function of the mean found in the corpus and the actual observations you have for this item. More precisely, in a recommender-system setting, you could initialize the item's mean to the mean of the item's category (in your example, probably "statistics books") and then update it each time a user rates this particular item.

You can design a principled update rule with statistical foundations, or rely on common sense to quickly produce a basic rule like this one:

X : item
r_X^i : i-th rating for item X
n : number of ratings for item X
C : all items in the same category as X (items with no ratings contribute nothing)
N_C : total number of ratings over all items in C
mean_C = (1/N_C) * sum_{c in C} sum_{i} (r_c^i)
# with no ratings => use the category mean
mean_X^0 = mean_C
# with n ratings => weight the category mean as one extra rating
mean_X^n = (1/(n+1)) * (mean_C + sum_{i=1..n} (r_X^i))
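The category-mean rule translates directly into a few lines of Python. This is a sketch under our own assumptions: ratings are plain numbers grouped by item id, mean_C pools every rating in the category, and the `prior_weight` parameter is a hypothetical generalization (mean_C counted as that many pseudo-ratings); with its default of 1 it is exactly the rule above.

```python
from statistics import fmean
from typing import Mapping, Sequence


def category_mean(ratings_by_item: Mapping[str, Sequence[float]]) -> float:
    """mean_C: pooled mean of every rating given to any item in the category."""
    # Items with no ratings add nothing to the pooled list.
    all_ratings = [r for ratings in ratings_by_item.values() for r in ratings]
    if not all_ratings:
        raise ValueError("the category has no ratings yet")
    return fmean(all_ratings)


def adjusted_mean(item_ratings: Sequence[float], mean_c: float,
                  prior_weight: float = 1.0) -> float:
    """Shrink an item's mean toward the category mean.

    prior_weight=1 gives mean_X^n = (mean_C + sum of the n ratings) / (n + 1);
    with no ratings the result is mean_C itself (mean_X^0).
    prior_weight (hypothetical) counts mean_C as that many pseudo-ratings.
    """
    n = len(item_ratings)
    return (prior_weight * mean_c + sum(item_ratings)) / (prior_weight + n)
```

For example, if the rated items in a category received 4, 3, 5, 2 and 3, then mean_C = 3.4, and a new item with a single 5 rating is scored (3.4 + 5) / 2 = 4.2 instead of 5. Raising `prior_weight` makes each item need more ratings before it can drift far from its category.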

For this kind of problem in general, I recommend reading the work of Koren et al. on the Netflix Prize. They gained quite a bit of performance by using unsupervised learning on user and content variables; using the category mean is a similar, if naive, cousin of that idea.
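A baseline in the spirit of that work (our simplified reading, not Koren et al.'s full models) is the global mean mu plus damped item and user biases. Each item bias is the item's summed deviation from mu divided by (lam_item + number of ratings); each user bias is computed the same way from what remains after removing mu and the item bias. The shrinkage constants and their default values are arbitrary placeholders to be tuned, and the function name is hypothetical. With lam_item = 1 and mu taken as the category mean, mu + b_item reduces to the category-mean rule above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Rating = Tuple[str, str, float]  # (user_id, item_id, stars)


def baseline_biases(ratings: Iterable[Rating], lam_item: float = 5.0,
                    lam_user: float = 5.0) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Return (mu, b_item, b_user) for a damped baseline mu + b_user + b_item.

    lam_item and lam_user are assumed shrinkage constants (placeholders).
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: summed deviation from mu, damped toward 0 by lam_item.
    item_dev: Dict[str, float] = defaultdict(float)
    item_n: Dict[str, int] = defaultdict(int)
    for _, item, r in ratings:
        item_dev[item] += r - mu
        item_n[item] += 1
    b_item = {i: item_dev[i] / (lam_item + item_n[i]) for i in item_dev}

    # User bias: residual after removing mu and the item bias, damped by lam_user.
    user_dev: Dict[str, float] = defaultdict(float)
    user_n: Dict[str, int] = defaultdict(int)
    for user, item, r in ratings:
        user_dev[user] += r - mu - b_item[item]
        user_n[user] += 1
    b_user = {u: user_dev[u] / (lam_user + user_n[u]) for u in user_dev}

    return mu, b_item, b_user
```

Sorting items by mu + b_item[item] gives a sample-size-aware ranking; adding b_user[user] personalizes the estimate for a given user.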
