Solved – Collaborative filtering and implicit ratings; normalization

I would like to use the time a user spends viewing an article as an implicit rating of how much the user likes the article.

My question is how to normalize this information across all users.

At the moment, I'm subtracting the user-specific mean from the time spent and dividing by the user-specific standard deviation.

Is this the right way to go about it? It doesn't seem so, as the ratings can still take any values.

Maybe I should scale the ratings into some interval (like $[1, 10]$) afterwards?
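For reference, the per-user standardization described in the question could be sketched roughly as below. The function name `standardize_per_user` and the nested-dict layout (user, then article, then seconds) are assumptions made for illustration only, and the population standard deviation is an arbitrary choice here. The sketch also shows the concern raised above: the resulting scores are centred on zero but not confined to any fixed interval.

```python
from statistics import mean, pstdev


def standardize_per_user(dwell_times):
    """Z-score each user's dwell times against that user's own mean and std.

    dwell_times: dict mapping user -> dict mapping article -> seconds
    (a hypothetical layout, chosen only for this sketch).
    """
    scores = {}
    for user, times in dwell_times.items():
        values = list(times.values())
        mu = mean(values)
        sigma = pstdev(values)  # population std; an assumption of this sketch
        scores[user] = {
            # A user with identical times everywhere carries no preference signal.
            article: (t - mu) / sigma if sigma > 0 else 0.0
            for article, t in times.items()
        }
    return scores


example = {
    "alice": {"a1": 10.0, "a2": 20.0, "a3": 30.0},
    "bob": {"a1": 100.0, "a2": 100.0},
}
z = standardize_per_user(example)
print(z["alice"])  # centred on zero, contains negative values
print(z["bob"])    # all zeros, since bob's times do not vary
```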

Step #1: Filling in the blanks

In my experience with user dwell time, the number of users who spend $t$ seconds viewing a site decreases sharply as $t$ increases.

I found that modelling user dwell time with an exponential distribution is a good approximation.

Taking a Bayesian approach, with a Gamma distribution as the prior on the mean of each site's dwell time, we get a familiar formula:

$$\frac{n+m}{\frac{m}{b}+\frac{1}{t_1}+\dots+\frac{1}{t_n}}$$

Where $t_i$ is the $i$-th observed dwell time on the site, $n$ is the number of observations, $b$ is the bias you introduce and $m$ is its strength.

For example, setting $b=3, m=2$ is like assuming that two fictional users each viewed the site for 3 seconds when we have no data for that user×article combination.

Note that this formula is much more robust to outliers, since it assumes an exponential distribution (and not a Gaussian distribution, as the arithmetic mean does).
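As a rough illustration of Step #1, here is a minimal Python sketch. It reads the formula as $(n+m)$ divided by $\frac{m}{b}$ plus the sum of reciprocals of the $n$ observed dwell times, which is the harmonic mean of the observations together with $m$ pseudo-observations equal to $b$. The function name `smoothed_dwell_time` and the sample numbers are made up for the example.

```python
def smoothed_dwell_time(times, b=3.0, m=2.0):
    """Smoothed estimate (n + m) / (m / b + sum(1 / t_i)).

    times: observed dwell times in seconds, all strictly positive
    b: prior guess for the dwell time (the "bias")
    m: strength of the prior, i.e. number of fictional observations
    """
    if b <= 0 or m <= 0:
        raise ValueError("b and m must be positive")
    if any(t <= 0 for t in times):
        raise ValueError("dwell times must be positive")
    n = len(times)
    return (n + m) / (m / b + sum(1.0 / t for t in times))


# With no data the estimate falls back to the prior guess b.
no_data = smoothed_dwell_time([])

# Hypothetical sample: four short visits and one tab left open for a day.
sample = [5.0, 6.0, 4.0, 5.0, 86_400.0]
robust = smoothed_dwell_time(sample)
arithmetic = sum(sample) / len(sample)

print(no_data, robust, arithmetic)
```

In this made-up sample the single day-long visit dominates the arithmetic mean, while the smoothed estimate stays close to the typical visits of a few seconds.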

Step #2: Populating the matrix

Times are positive, and they have bounds that make sense (for example, a maximum of one day).

However, after matrix factorization, any numeric value can appear in the reconstructed matrix cells, including negative ones.

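A common practice is to populate the user×article matrix with $$\operatorname{logit}(t)$$ where logit is the inverse of the sigmoid function. Since logit is only defined on $(0, 1)$, $t$ has to be rescaled into that interval first, for example by dividing by the maximum dwell time that makes sense.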

Then, when predicting the dwell time for a user $i$ and an article $j$, we use:

$$\operatorname{sigmoid}(\langle \vec{u_i}, \vec{a_j} \rangle)$$

Instead of only using the dot product.

This way we can be certain that the end result is bounded to a range that makes sense.
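A minimal sketch of Step #2 in Python follows. It assumes dwell times are first divided by a cap of one day (`MAX_SECONDS`) and clipped slightly inside $(0, 1)$ so that logit is defined. The names `encode` and `predict`, the clipping constant `EPS`, and the example vectors are all hypothetical, and the factorization that would actually produce the user and article vectors is left out.

```python
import math

MAX_SECONDS = 86_400.0  # assumed upper bound: one day
EPS = 1e-6              # assumed clipping margin to keep logit finite


def logit(p):
    """Inverse of the sigmoid, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def encode(seconds):
    """Value stored in the user x article matrix before factorization."""
    p = min(max(seconds / MAX_SECONDS, EPS), 1.0 - EPS)
    return logit(p)


def predict(user_vec, article_vec):
    """Predicted dwell time in seconds from the two latent vectors."""
    score = sum(u * a for u, a in zip(user_vec, article_vec))
    return MAX_SECONDS * sigmoid(score)


# Encoding and then decoding a two-minute visit recovers it (up to rounding).
round_trip = MAX_SECONDS * sigmoid(encode(120.0))

# Whatever the dot product is, the prediction stays within [0, MAX_SECONDS].
extreme_low = predict([100.0, -50.0], [-10.0, 4.0])
extreme_high = predict([100.0, 50.0], [10.0, 4.0])
print(round_trip, extreme_low, extreme_high)
```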
