Statistics – How to Calculate ‘Most Popular’ More Accurately

average, standard deviation, statistics

I'm developing a website at the moment.

The website allows users to "rate" a post from 0 to 5.

Posts can then be displayed in order of popularity.

At the moment, my method of calculation is pretty primitive:

average_rating = total_rating/ratings

The problem is that a story with a single rating of 5 ranks as more popular than a story with 99 ratings of 5 and 1 rating of 4:

(5/1) > (499/100)
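The comparison above can be reproduced with a minimal sketch. The function name `average_rating` and the two sample lists are assumptions made for illustration, not code from the site:

```python
def average_rating(ratings):
    """Plain mean of a list of ratings (the 'primitive' method)."""
    return sum(ratings) / len(ratings)

# Hypothetical sample data: one post with a single vote, one with 100 votes.
single_vote = [5]
hundred_votes = [5] * 99 + [4]

# The single-vote post wins even though it has far less evidence behind it.
naive_winner = (
    "single_vote"
    if average_rating(single_vote) > average_rating(hundred_votes)
    else "hundred_votes"
)
```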

Could someone suggest a more accurate way to calculate popularity, based on both the number of votes and the quality of each vote?

Best Answer

A standard procedure (frequently, and loosely, called a 'Bayesian average') is to take a weighted average of the individual rating and the 'a priori' rating:

$R_a = W \; R + (1 - W ) \; R_0$

where

$R_a = $ averaged ('bayesian') rating

$R = $ individual rating: average rating for this item.

$R_0 = $ a priori rating: global average rating, for all items in your database.

$W = $ weight factor: it should tend to $0$ if this item has few votes, and it should tend to $1$ if it has many.

Some choices: $W = \frac{n}{N_{max}}$, or $W = \min( \alpha \frac{n}{N_{av}},1)$, etc. ($n=$ number of votes for this item, $N_{max}=$ maximum number of votes over all items, $N_{av}=$ average number of votes per item, $\alpha=$ some number between 0.5 and 1...). The second choice is capped at $1$ so that $W$ stays between $0$ and $1$. It is also common to discard items with very low or very high values when computing these statistics.
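As a rough illustration of the formula $R_a = W \; R + (1 - W) \; R_0$, here is a minimal Python sketch. All function names, the sample `posts` dictionary, and the default $\alpha = 0.75$ are assumptions chosen for the example. The second weight is capped at 1 so that it stays in the range $[0, 1]$.

```python
def weight_by_max(n, n_max):
    """W = n / N_max: 1 for the most-voted item, near 0 for rarely voted ones."""
    return n / n_max if n_max > 0 else 0.0

def weight_capped(n, n_avg, alpha=0.75):
    """W = min(alpha * n / N_av, 1); alpha = 0.75 is an arbitrary pick in [0.5, 1]."""
    if n_avg <= 0:
        return 0.0
    return min(alpha * n / n_avg, 1.0)

def bayesian_rating(item_avg, global_avg, weight):
    """R_a = W * R + (1 - W) * R_0."""
    return weight * item_avg + (1 - weight) * global_avg

# Hypothetical data: post name -> list of ratings from 0 to 5.
posts = {
    "one_vote": [5],
    "hundred_votes": [5] * 99 + [4],
    "mixed": [3, 4, 2, 5, 3],
}

# R_0: global average over every rating in the (toy) database.
all_ratings = [r for ratings in posts.values() for r in ratings]
global_avg = sum(all_ratings) / len(all_ratings)
n_max = max(len(ratings) for ratings in posts.values())

scores = {
    name: bayesian_rating(
        sum(ratings) / len(ratings),            # R: this item's own average
        global_avg,                             # R_0
        weight_by_max(len(ratings), n_max),     # W
    )
    for name, ratings in posts.items()
}

# Posts ordered from most to least popular under the weighted rating.
ranked = sorted(scores, key=scores.get, reverse=True)
```

With this toy data the post with 100 votes ranks above the post with a single 5, because the single vote is pulled almost entirely toward the global average. Swapping in `weight_capped` changes how quickly an item's own average takes over from the prior.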

See some examples

Added: for another approach, especially for yes/no or like/dislike votes, see here.
