Solved – Alternating least squares formula

linear-algebra, machine-learning, matrix, recommender-system

So I was reading about the alternating least squares (ALS) algorithm used for movie recommendations.

Let's say that:

X – user ratings

Y – movie ratings

The result is computed by this formula:

$$A \approx X Y^T$$

where $A$ is the matrix of known ratings.

In the book I am reading, it is mentioned that at each step we compute:

$$X_i = A_i Y (Y^T Y)^{-1}$$

Why do we use this formula and not $X_i = A_i(Y^T)^{-1}$ where $(Y^T)^{-1}$ is the inverse of $Y$ transposed?

Best Answer

I think you are confusing the rating matrix with the loss function. The goal is to find $X$ and $Y$ such that they minimize this loss function*:

$$ \underset{X,Y}{\operatorname{Argmin}} \big \|R- X Y^T \big \|_F^2\tag{$1$}$$

One of the typical ways to optimize a loss function is to take its derivative, set it to $0$, and solve for the variable you are trying to minimize or maximize.

Taking the derivative of $(1)$ with respect to $X$ (holding $Y$ constant), setting it to zero, and solving yields: $$X = RY(Y^T Y)^{-1}\tag{$2$}$$

Taking the derivative of $(1)$ with respect to $Y$ (holding $X$ constant) yields: $$Y = R^T X(X^T X)^{-1}\tag{$3$}$$

This also answers your question directly: $Y^T$ is not square in general (there are far more movies than latent factors), so $(Y^T)^{-1}$ simply does not exist. $Y^T Y$, by contrast, is a small square matrix that is typically invertible.

I wish I could show you the full math behind this derivation, but it can get pretty hairy and is a question in and of itself. It would be a good exercise to work through it yourself rather than taking it at face value.
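That said, the key step for $(2)$ is short enough to sketch: differentiating the Frobenius norm with respect to $X$ while holding $Y$ fixed, and setting the result to zero, gives

$$\frac{\partial}{\partial X}\big\|R - XY^T\big\|_F^2 = -2\big(R - XY^T\big)Y = 0 \;\Longrightarrow\; XY^TY = RY \;\Longrightarrow\; X = RY\big(Y^TY\big)^{-1}$$

The derivation for $(3)$ is symmetric, with $R$ transposed and the roles of $X$ and $Y$ swapped.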

Now, alternating least squares works by first assigning random values to matrix $Y$ and solving for $X$ via $(2)$. Then, holding $X$ constant, you solve for $Y$ via $(3)$. The algorithm alternates between these two steps until $X$ and $Y$ converge to a local optimum, as sketched below.
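To make the alternation concrete, here is a minimal NumPy sketch (the function name, `k`, and the iteration count are my own choices, and I use `np.linalg.inv` naively so the code mirrors $(2)$ and $(3)$ exactly):

```python
import numpy as np

def als(R, k=10, n_iters=20, seed=0):
    """Plain ALS for R ~= X @ Y.T, following equations (2) and (3).

    R: (m, n) rating matrix, k: number of latent factors.
    No regularization, matching the simplified loss (1).
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    Y = rng.standard_normal((n, k))  # step 1: random init of Y
    for _ in range(n_iters):
        # Fix Y, solve for X:  X = R Y (Y^T Y)^{-1}   -- equation (2)
        X = R @ Y @ np.linalg.inv(Y.T @ Y)
        # Fix X, solve for Y:  Y = R^T X (X^T X)^{-1} -- equation (3)
        Y = R.T @ X @ np.linalg.inv(X.T @ X)
    return X, Y

# Tiny smoke test on a synthetic low-rank "ratings" matrix:
rng = np.random.default_rng(1)
R = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
X, Y = als(R, k=3)
print(np.linalg.norm(R - X @ Y.T))  # should be close to 0
```

In a real recommender, $R$ is mostly missing entries, so each row of $X$ would be solved against only the movies that user actually rated; the dense version above is purely to illustrate the alternation, and `np.linalg.solve` would be preferred over an explicit inverse for numerical stability.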

*Note that I left out the regularization terms accompanied by $\lambda$ for brevity. These terms are almost always part of formula $(1)$ to combat overfitting.
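For completeness, the regularized objective usually looks like

$$\underset{X,Y}{\operatorname{Argmin}} \big \|R- X Y^T \big \|_F^2 + \lambda\big(\|X\|_F^2 + \|Y\|_F^2\big)$$

and it only changes the closed-form updates by adding $\lambda I$ inside the inverses, e.g. $X = RY(Y^TY + \lambda I)^{-1}$.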
