Solved – Matrix factorization model for recommender systems: how to determine the number of latent features

I am trying to design a matrix factorization technique for a simple user-item rating recommender system, and I have two questions about it.

First, in a simple implementation of matrix factorization for movie recommendation that I saw, the author just fixed the number of latent features, call it K, to a constant such as 2. With R the original N × M user-item rating matrix we are trying to approximate (N users, M items), the user and item latent feature matrices P and Q are then N × K and M × K, so that R ≈ P Q^T. So my question is: how do I determine the optimal K (number of latent features) instead of just setting it to some constant?
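To make the fixed-K setup concrete, here is a minimal sketch in plain Python of that kind of factorization. It assumes stochastic gradient descent on the squared error of the observed ratings with L2 regularization. The function names, the learning rate `lr`, the regularization strength `reg`, the epoch count and the toy matrix are illustrative assumptions, not the code of the implementation mentioned above.

```python
import random


def predict(P, Q, u, i):
    """Approximate rating r_ui as the dot product of user row P[u] and item row Q[i]."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) so that P[u] . Q[i] ≈ r_ui.

    `ratings` holds (user, item, rating) triples for the observed entries of R only;
    missing entries are skipped rather than treated as zeros.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on the L2-regularized squared error (constant factors folded into lr)
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical toy 5 x 4 rating matrix; 0 marks a missing rating
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r > 0]
P, Q = factorize(observed, n_users=len(R), n_items=len(R[0]), k=2)
print(round(predict(P, Q, 1, 2), 2))  # estimate for a missing entry of R
```

Here K is just the `k` argument; nothing in the training loop chooses it, which is exactly the gap the question is about.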

Also, is there any way to incorporate user or item information that I already have in my dataset, such as a user's average rating, sex or location, into the result of the matrix factorization when making my final recommendation? (I guess a blending model, combining a content-based filtering model built on the user and item information with my matrix factorization model, might work.)
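A minimal sketch of such a blend, assuming both models are already trained and have produced predictions on a held-out set of ratings that neither model was trained on (fitting the blend on training data would overfit it). It fits a linear blend r ≈ w0 + w1·a + w2·b by ordinary least squares. The function names and all numbers are hypothetical; the "content" predictions could come from, say, a model built on a user's average rating, sex and location.

```python
def fit_blend(preds_a, preds_b, targets):
    """Least-squares weights [w0, w1, w2] for r ≈ w0 + w1 * a + w2 * b."""
    X = [(1.0, a, b) for a, b in zip(preds_a, preds_b)]
    # Normal equations (X^T X) w = X^T y: a 3 x 3 linear system
    A = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    y = [sum(row[i] * t for row, t in zip(X, targets)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= factor * A[col][c]
            y[r] -= factor * y[col]
    # Back substitution
    w = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        w[r] = (y[r] - sum(A[r][c] * w[c] for c in range(r + 1, 3))) / A[r][r]
    return w


def blend(w, a, b):
    """Blended prediction from the two models' predictions a and b."""
    return w[0] + w[1] * a + w[2] * b


# Hypothetical held-out predictions from two already-trained models, plus the true ratings
mf_preds = [3.1, 4.2, 2.5, 3.8, 4.9, 1.7]
content_preds = [3.5, 4.0, 2.0, 3.0, 4.5, 2.5]
held_out_ratings = [3, 4, 2, 4, 5, 2]
w = fit_blend(mf_preds, content_preds, held_out_ratings)
print([round(v, 3) for v in w])
```

A linear blend is the simplest choice; the blending step itself can be any regression model, as long as it is fit on ratings the component models did not see.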

1> My first question is how to determine the optimal number of latent features, K.
2> Does anyone know of recent literature that implements a blending model of matrix factorization and content-based filtering? (I suspect that would be the only way to represent demographic information about users and items in a common feature space.)

Best Answer

In response to your first question, cross-validation is a widely used approach. One possible scheme is the following.

For each value of K within a pre-selected range, use cross-validation to estimate model performance on held-out ratings (e.g. prediction error such as RMSE). This gives one performance estimate per value of K. Then select the K with the best estimated performance (e.g. the lowest held-out RMSE).
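This scheme can be sketched as follows, assuming a k-fold split of the observed ratings, a plain SGD factorization as the trainer, and RMSE on the held-out fold as the performance measure (lower is better, so the best K minimizes it). All names, hyperparameters and the synthetic rank-2 data are assumptions for illustration. In practice the regularization strength interacts with K, so it is often tuned jointly.

```python
import random


def train_mf(ratings, n_users, n_items, k, epochs=150, lr=0.02, reg=0.05, seed=0):
    """Plain SGD matrix factorization on (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


def select_k(ratings, n_users, n_items, candidate_ks, n_folds=3, seed=0):
    """Return (best_k, {k: mean held-out RMSE}) using n-fold cross-validation."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::n_folds] for f in range(n_folds)]
    scores = {}
    for k in candidate_ks:
        fold_errors = []
        for f in range(n_folds):
            # Train on all folds except f, evaluate on fold f
            train = [x for g in range(n_folds) if g != f for x in folds[g]]
            P, Q = train_mf(train, n_users, n_items, k)
            fold_errors.append(rmse(P, Q, folds[f]))
        scores[k] = sum(fold_errors) / n_folds
    best_k = min(scores, key=scores.get)  # lowest held-out error = best performance
    return best_k, scores


# Hypothetical synthetic data: real-valued ratings from a true rank-2 model plus noise
rng = random.Random(42)
n_users, n_items, true_k = 20, 15, 2
U = [[rng.gauss(0, 1) for _ in range(true_k)] for _ in range(n_users)]
V = [[rng.gauss(0, 1) for _ in range(true_k)] for _ in range(n_items)]
data = [(u, i, sum(a * b for a, b in zip(U[u], V[i])) + rng.gauss(0, 0.1))
        for u in range(n_users) for i in range(n_items) if rng.random() < 0.6]
best_k, scores = select_k(data, n_users, n_items, candidate_ks=[1, 2, 3, 5])
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Every candidate K is trained and scored on the same folds, so the comparison between values of K is not confounded by different splits.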

In response to your second question, I would look at examples of 'hybrid' approaches, e.g. in http://www.stanford.edu/~abhijeet/papers/cs345areport.pdf
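One hybrid variant puts demographic information into the same latent space as the ratings (not necessarily the approach of the linked report). Each attribute value, such as a sex or location category, gets its own K-dimensional vector, and the vectors of a user's attributes are added to that user's free latent factors. The following is a minimal sketch, assuming SGD training as in a plain factorization; all names, attribute codes and data are hypothetical. A user with attributes but no ratings still gets a prediction, driven mostly by the attribute vectors, since that user's own factors never leave their small random initialization.

```python
import random


def user_vector(P, Y, user_attrs, u):
    """Latent vector of user u: its free factors P[u] plus one vector per attribute."""
    x = list(P[u])
    for a in user_attrs[u]:
        x = [xf + yf for xf, yf in zip(x, Y[a])]
    return x


def factorize_with_attributes(ratings, user_attrs, n_users, n_items, n_attrs,
                              k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """SGD for r_ui ≈ Q[i] . (P[u] + sum of Y[a] over the attributes a of user u)."""
    rng = random.Random(seed)

    def init(n):
        return [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]

    P, Q, Y = init(n_users), init(n_items), init(n_attrs)
    for _ in range(epochs):
        for u, i, r in ratings:
            x = user_vector(P, Y, user_attrs, u)
            err = r - sum(xf * qf for xf, qf in zip(x, Q[i]))
            for f in range(k):
                q = Q[i][f]  # pre-update value, used for the user-side gradients
                Q[i][f] += lr * (err * x[f] - reg * q)
                P[u][f] += lr * (err * q - reg * P[u][f])
                for a in user_attrs[u]:
                    Y[a][f] += lr * (err * q - reg * Y[a][f])
    return P, Q, Y


def predict(P, Q, Y, user_attrs, u, i):
    x = user_vector(P, Y, user_attrs, u)
    return sum(xf * qf for xf, qf in zip(x, Q[i]))


# Hypothetical toy ratings (0 = missing) and attribute codes: 0 = "group A", 1 = "group B"
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
user_attrs = [[0], [0], [1], [1], [1], [0]]  # user 5 has attributes but no ratings
observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r > 0]
P, Q, Y = factorize_with_attributes(observed, user_attrs, n_users=6, n_items=4, n_attrs=2)
print(round(predict(P, Q, Y, user_attrs, 5, 0), 2))  # cold-start user 5, item 0
```

Unlike a separate blending step, this keeps a single model: the attribute vectors are learned from the same rating error as the latent factors, so demographic information and ratings share one feature space.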