Help creating a formula for a rating system with weighted ranks.

average, means, statistics

I am attempting to create a 5 star rating system with a twist.

Instead of calculating how many people rated something 5 stars then adding that to the total 4 stars etc…, I want to do something different.

I want to use a ranking system that returns a rank between 0 and 5. When someone with a higher rank rates someone else, their rating carries more weight than the same rating given by someone with a lower rank.


For example.

  • User A is ranked 4.375
  • User B is ranked 4.0
  • User C is ranked 2.0

User A gives User B 5 stars.

That should dramatically increase User B's rank (e.g. +0.5%).

If User C gives User B 5 stars, User B's rank would go up only a little (e.g. +0.1%).

The percentages are just examples.

Second example

User A gives User B a 1 star rating.

User B's rank will drop significantly (e.g. -0.5%).

User C gives User B a 1 star rating.

User B's rank will drop a little (e.g. -0.1%).

Again, the percentages are purely fictional and only meant to represent the idea that someone with a lower rank doesn't really affect someone with a higher rank than them.


I tried something like this, which is the weighted rank, but instead of whole numbers (how many people rated someone 5 stars), I used the rater's rank in their place:

((My Rank  * Raters Rank) + new rating) / (Raters Rank + 1)

I would like to keep at least 3 significant figures and stay clamped between 0 and 5.

I will only be storing the user's rank and a count of how many people have rated a user over time.

No historical data will be kept. As in, you don't know User A rated User B 1.0

Best Answer

You can use a weighted mean $\mu_k$ for individual $k$: $$ \mu_k = \frac{\sum_j r_j \cdot s_{jk}}{\sum_j r_j} $$ where $r_j$ is the rank of individual $j$ and $s_{jk}$ is the number of stars that individual $j$ gave to individual $k$.
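As a rough illustration of this formula, here is a minimal Python sketch. The function name `weighted_mean` and the sample ratings are my own assumptions for the example, not part of the question.

```python
def weighted_mean(ratings):
    """Rank-weighted mean of star ratings.

    `ratings` is a list of (rater_rank, stars) pairs, i.e. (r_j, s_jk).
    """
    total_weight = sum(rank for rank, _ in ratings)
    if total_weight == 0:
        raise ValueError("at least one rater must have a non-zero rank")
    return sum(rank * stars for rank, stars in ratings) / total_weight


# Illustrative data (assumed): a rater ranked 4.375 gives 5 stars,
# a rater ranked 2.0 gives 1 star.
ratings_for_b = [(4.375, 5), (2.0, 1)]
print(round(weighted_mean(ratings_for_b), 4))
```

The higher-ranked rater pulls the result toward their own rating, which is the behaviour asked for in the question.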

If you want the past to be "static", you can store two variables for each individual, $stars_k(t)$ and $rankers_k(t)$. Here $stars_k(t) = \sum_{\tau \le t} r_{j(\tau)} \cdot s_{j(\tau),k}$ is the rank-weighted sum of the stars given by the individuals $j(\tau)$ who made their evaluations up to time $t$, and $rankers_k(t) = \sum_{\tau \le t} r_{j(\tau)}$ is the sum of the ranks of all those individuals.

Then the weighted mean $\mu_k(t)$ for individual $k$ at time $t$ will be: $$ \mu_k(t) = \frac{stars_k(t)}{rankers_k(t)} $$ If I understood correctly, $\mu_k(t)$ will be used as their rank $r_k(t)$ in the computations for the rankings of the other individuals.
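A possible way to store the two running sums in code is sketched below. The class name `RunningRank`, the choice of 0 as the rank of an unrated user, and the explicit clamp to the 0 to 5 range are assumptions made for this example; with non-negative weights and ratings already within 0 to 5 the clamp should never actually trigger.

```python
from dataclasses import dataclass


@dataclass
class RunningRank:
    stars: float = 0.0    # running sum of rater_rank * stars_given
    rankers: float = 0.0  # running sum of rater_rank

    def add_rating(self, rater_rank: float, stars_given: float) -> None:
        """Fold one new rating into the two stored sums."""
        self.stars += rater_rank * stars_given
        self.rankers += rater_rank

    @property
    def rank(self) -> float:
        """Current weighted mean, clamped to [0, 5]."""
        if self.rankers == 0:
            return 0.0  # assumption: a user nobody has rated starts at 0
        return min(5.0, max(0.0, self.stars / self.rankers))


user_b = RunningRank(stars=28, rankers=7)  # current rank 28 / 7 = 4.0
user_b.add_rating(rater_rank=4.375, stars_given=5)
print(round(user_b.rank, 4))
```

Only two numbers per user are kept, so no history of who rated whom is needed.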

Alternatively, you may use a decay rate on the sums $stars_k(t)$ and $rankers_k(t)$, perhaps something like:

$$stars_k(t) = (1-\delta) \cdot r_{j(t)} \cdot s_{j(t),k} + \delta \cdot stars_k(t-1)$$ and $$rankers_k(t) = (1-\delta) \cdot r_{j(t)} + \delta \cdot rankers_k(t-1)$$ for a value of the decay rate $\delta$ in $(0;1)$, with larger values of $\delta$ meaning that the past is more important in the computation.
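The decayed update can be sketched in the same way. The function name `decayed_update` and its argument names are assumptions for illustration; the two assignments follow the formulas above term by term.

```python
def decayed_update(stars_prev, rankers_prev, rater_rank, stars_given, delta=0.8):
    """Apply one rating with decay rate `delta`.

    Returns the new (stars, rankers, rank) triple.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    stars_new = (1 - delta) * rater_rank * stars_given + delta * stars_prev
    rankers_new = (1 - delta) * rater_rank + delta * rankers_prev
    if rankers_new == 0:
        raise ValueError("rank is undefined when all weights are zero")
    return stars_new, rankers_new, stars_new / rankers_new


stars, rankers, rank = decayed_update(16, 4, rater_rank=4.375, stars_given=5)
print(round(stars, 3), round(rankers, 3), round(rank, 4))
```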


Edit: Following the first of your examples:

-- In my initial approach, assume that at time $t$

User B has $stars_B(t) = 28$ and $rankers_B(t) = 7$, thus is ranked $r_B(t)=\mu_B(t)=28/7 = 4.0$.

If, at time $t+1$ User A gives User B $5$ stars, the ranking of User B becomes:

$stars_B(t+1) = 4.375 \times 5 + 28 = 49.875$; $rankers_B(t+1) = 4.375 + 7 = 11.375$ and $r_B(t+1) = \mu_B (t+1) = 49.875 / 11.375 \approx 4.3846$.

However, if we had instead $stars_B(t) = 2800$ and $rankers_B(t) = 700$, we would have a much smaller impact from $A$: $r_B(t+1) = \mu_B (t+1) = 2821.875 / 704.375 \approx 4.0062$.
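To see how the impact of a single rating shrinks as the accumulated weight grows, here is a small sketch. The helper name `rank_after_rating` is an assumption, and the middle case (an accumulated weight of 70) is an extra illustrative point that does not appear in the examples above.

```python
def rank_after_rating(stars, rankers, rater_rank, stars_given):
    """Weighted-mean rank after adding one rating to the stored sums."""
    return (stars + rater_rank * stars_given) / (rankers + rater_rank)


# User B starts at rank 4.0 in every case; only the accumulated weight differs.
for rankers in (7, 70, 700):
    stars = 4.0 * rankers
    new_rank = rank_after_rating(stars, rankers, rater_rank=4.375, stars_given=5)
    print(f"rankers={rankers:>3}: rank 4.0 -> {new_rank:.4f}")
```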

-- In my alternative approach, by construction, $stars_k(t)$ has to be between $1 \times 1 = 1$ and $5 \times 5 = 25$ and $rankers_k(t)$ has to be between $1$ and $5$. Let $\delta = 0.8$.

If we assume that $stars_B(t) = 16$ and $rankers_B(t) = 4$, then after User A's evaluation we get $stars_B(t+1) = (1-0.8) \times 4.375 \times 5 + 0.8 \times 16 = 17.175$; $rankers_B(t+1) = (1-0.8) \times 4.375 + 0.8 \times 4 = 4.075$ and $r_B(t+1) = \mu_B (t+1) = 17.175 / 4.075 \approx 4.2147$.

Another example: if we assume that $stars_B(t) = 8$ and $rankers_B(t) = 2$, then after User A's evaluation we get $stars_B(t+1) = (1-0.8) \times 4.375 \times 5 + 0.8 \times 8 = 10.775$; $rankers_B(t+1) = (1-0.8) \times 4.375 + 0.8 \times 2 = 2.475$ and $r_B(t+1) = \mu_B (t+1) = 10.775 / 2.475 \approx 4.3535$.

Note that $\delta = 0.8$ can be interpreted as the past having a weight of $80\%$ and the new evaluation having a weight of $20\%$. If you want to give more importance to the past (and less importance to the more recent evaluations), then $\delta$ should be closer to $1$.
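The effect of $\delta$ can be checked numerically with a short sketch. The function name `rank_with_decay` and the particular values of $\delta$ tried here are assumptions chosen for illustration; the update itself is the decayed formula given earlier, applied to a 5-star rating from a rater ranked 4.375.

```python
def rank_with_decay(stars_prev, rankers_prev, rater_rank, stars_given, delta):
    """Rank after one decayed update of the two stored sums."""
    stars_new = (1 - delta) * rater_rank * stars_given + delta * stars_prev
    rankers_new = (1 - delta) * rater_rank + delta * rankers_prev
    return stars_new / rankers_new


# Same starting point (rank 16 / 4 = 4.0) and same new rating for each delta.
for delta in (0.5, 0.8, 0.95):
    new_rank = rank_with_decay(16, 4, rater_rank=4.375, stars_given=5, delta=delta)
    print(f"delta={delta}: new rank = {new_rank:.4f}")
```

The closer $\delta$ is to $1$, the less the new rating moves the rank away from its previous value.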
