Mean Percentage Ranking in implicit feedback ALS

recommender-system

What is Mean Percentage Ranking (MPR) in implicit feedback recommendation systems? Why should it be less than 50%? Many forums give vague definitions, but no clear-cut examples. Can someone explain the concept?

My understanding:

[Image of the asker's calculations; not available]

Are the above calculations right? Is this how you calculate MPR?

Best Answer

Say you are using an implicit feedback recommender system, where $r_{ui}$ is the observed number of visits by the $u$-th user to the $i$-th item's page. Define an indicator that tells us whether the user visited the page at least once:

$$d_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{if } r_{ui} = 0 \end{cases}$$

Then you use your recommender system to make ranked predictions, where $\text{rank}_{ui} = 0\%$ is the most preferred item and $\text{rank}_{ui} = 100\%$ is the least preferred item. To do this, take any recommender system that predicts some kind of score $\hat r_{ui}$, sort each user's items by score from highest to lowest, and assign them the ordering-based ranks $1/n\times 100\%, 2/n\times 100\%, \dots, n/n\times 100\%$, where $n$ is the number of items. Then MPR is defined as

$$ \text{MPR} = \frac{\sum_{ui} d_{ui} \times \text{rank}_{ui} }{\sum_{ui} d_{ui}} $$

so this is the average rank given to the items that were actually visited by the users. You want your recommender system to predict low ranks for the items that users visit and high ranks for the ones they do not (why would you recommend something that does not interest the users?). It follows that if you assigned the ranks uniformly at random, then for any given user-item pair the rank could take any value in the $0\%$ to $100\%$ interval, with mean $50\%$, so the MPR would be about $50\%$. Your recommender therefore needs to do better than assigning the ranks at random, that is, achieve an MPR below $50\%$.
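To make the definition concrete, here is a minimal NumPy sketch of the calculation. It assumes a small dense matrix `r` of visit counts and a matrix `scores` of predicted $\hat r_{ui}$ (users in rows, items in columns), and it ranks items separately for each user. The function names `percentile_ranks` and `mean_percentage_ranking` and the toy numbers are made up for illustration; they do not come from any particular library.

```python
import numpy as np


def percentile_ranks(scores):
    """Per-user ranks in percent: the k-th best of n items gets k/n * 100."""
    n_items = scores.shape[1]
    # Indices of each user's items in descending-score order.
    order = np.argsort(-scores, axis=1, kind="stable")
    # argsort of that order gives each item's 0-based position in the list.
    positions = np.argsort(order, axis=1, kind="stable")
    return (positions + 1) / n_items * 100.0


def mean_percentage_ranking(r, scores):
    """MPR = sum(d_ui * rank_ui) / sum(d_ui), with d_ui = 1 if r_ui > 0."""
    d = (r > 0).astype(float)
    ranks = percentile_ranks(scores)
    return (d * ranks).sum() / d.sum()


# Toy example (hypothetical numbers): 2 users, 4 items.
r = np.array([[3, 0, 1, 0],
              [0, 2, 0, 0]])
scores = np.array([[0.9, 0.1, 0.4, 0.2],
                   [0.3, 0.8, 0.5, 0.1]])
mpr = mean_percentage_ranking(r, scores)
print(f"toy MPR: {mpr:.2f}%")

# Random baseline: scores that carry no information about the visits.
rng = np.random.default_rng(0)
r_random = (rng.random((500, 100)) < 0.1).astype(int)
scores_random = rng.random((500, 100))
mpr_random = mean_percentage_ranking(r_random, scores_random)
print(f"random-score MPR: {mpr_random:.2f}%")
```

In the toy example the visited items receive ranks of 25% and 50% for the first user and 25% for the second, so the MPR is $(25 + 50 + 25)/3 \approx 33.3\%$, which is better than chance. With random scores the MPR lands near 50%. Under the $k/n$ convention used here, the expected value is $(n+1)/(2n) \times 100\%$, so it sits slightly above 50% for finite $n$.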

For more details, see Li et al. (2010), "Improving One-Class Collaborative Filtering by Incorporating Rich User Information", and Hu et al. (2008), "Collaborative Filtering for Implicit Feedback Datasets".
