Solved – Matrix factorization using stochastic gradient descent

gradient descent, matrix decomposition, stochastic gradient descent

I am trying to approximate a square matrix $A \in \mathbb{R}^{n \times n}$ using the following matrix factorization

$$A \approx \hat{A} = Z \Lambda^{-1}Z^T$$

where $\Lambda=\text{diag}(Z^T \mathbb{1})$ and $Z$ is sparse. I would like to use stochastic gradient descent (SGD) to solve

$$\underset{\hat{A}}{\arg\min} \| A - \hat{A} \|$$

So, basically, I would compute only some entries of $A$ (because $A$ is a huge matrix) and use those to fit the factorization. But I am not sure how to set this up.
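For concreteness, here is a minimal NumPy illustration of the factorization itself; the sizes and the sparsity pattern are made up for the example:

```python
import numpy as np

# Toy illustration of A_hat = Z Lambda^{-1} Z^T with Lambda = diag(Z^T 1);
# sizes and sparsity are arbitrary.
rng = np.random.default_rng(0)
n, r = 6, 2
Z = rng.random((n, r))
Z[Z < 0.5] = 0.0                      # make Z sparse
lam = Z.sum(axis=0) + 1e-12           # diag of Lambda = Z^T 1 (guard vs. empty columns)
A_hat = Z @ np.diag(1.0 / lam) @ Z.T  # A_hat = Z Lambda^{-1} Z^T
```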

Best Answer

The basic algorithm goes like this:

  1. Start with randomly initialized values of $Z$ and $\Lambda^{-1}$ (since $\Lambda = \text{diag}(Z^T \mathbb{1})$ is determined by $Z$, you can either recompute it from $Z$ or, more simply, treat its diagonal as a free parameter to be learned)
  2. Go through each sampled entry $A_{ij}$
  3. Compute the gradient of the per-entry loss $(A_{ij} - \hat{A}_{ij})^2$ with respect to the parameters
  4. Update the parameters $Z_{i\cdot}$, $Z_{j\cdot}$ and $\Lambda^{-1}$ in the direction opposite to the gradient, using a step size $\eta$. It's a hyper-parameter and needs to be tuned by cross-validation. For the row $Z_{i\cdot}$, $$Z_{i\cdot} \leftarrow Z_{i\cdot} - \eta \, \nabla_{Z_{i\cdot}} \left(A_{ij} - Z_{i\cdot} \Lambda^{-1} Z_{j\cdot}^T\right)^2$$ and analogously for $Z_{j\cdot}$ and the diagonal of $\Lambda^{-1}$; the explicit gradients are sketched below.
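Writing $e_{ij} = A_{ij} - \hat{A}_{ij}$ and $D = \Lambda^{-1}$, here is a sketch of those per-entry gradients, under the simplification from step 1 that $D$ is a free diagonal parameter and assuming $i \neq j$:

$$\begin{aligned} \nabla_{Z_{i\cdot}} \, e_{ij}^2 &= -2\, e_{ij}\, Z_{j\cdot} D \\ \nabla_{Z_{j\cdot}} \, e_{ij}^2 &= -2\, e_{ij}\, Z_{i\cdot} D \\ \nabla_{\operatorname{diag}(D)} \, e_{ij}^2 &= -2\, e_{ij} \left(Z_{i\cdot} \odot Z_{j\cdot}\right) \end{aligned}$$

where $\odot$ denotes the element-wise product; for $i = j$ the two row gradients combine into one.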

In step 3 you may want to add a regularization term (e.g. an $\ell_2$ penalty on the parameters) to the loss to prevent overfitting. Take a look at the pseudo code on Wikipedia.
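To make the loop concrete, here is a minimal NumPy sketch of the algorithm above; the function name `sgd_factorize`, the rank `r` and the regularization weight `lam` are illustrative choices rather than anything from the question, and the diagonal `d` of $\Lambda^{-1}$ is learned as a free parameter as in step 1:

```python
import numpy as np

def sgd_factorize(entries, n, r, eta=0.01, lam=0.1, epochs=50, seed=0):
    """Fit A ~ Z diag(d) Z^T by SGD over sampled entries (i, j, A_ij).

    d plays the role of the diagonal of Lambda^{-1}; it is learned as a
    free parameter instead of being recomputed from Z at every step.
    """
    rng = np.random.default_rng(seed)
    Z = rng.normal(scale=0.1, size=(n, r))
    d = np.ones(r)
    order = np.arange(len(entries))
    for _ in range(epochs):
        rng.shuffle(order)             # visit the sampled entries in random order
        for idx in order:
            i, j, a = entries[idx]
            e = a - Z[i] @ (d * Z[j])  # residual A_ij - Z_i diag(d) Z_j^T
            gi = -2 * e * d * Z[j] + lam * Z[i]   # grad wrt row Z_i, plus l2 penalty
            gj = -2 * e * d * Z[i] + lam * Z[j]   # grad wrt row Z_j, plus l2 penalty
            gd = -2 * e * Z[i] * Z[j]             # grad wrt diag of Lambda^{-1}
            Z[i] -= eta * gi
            Z[j] -= eta * gj           # note: for i == j both row updates hit the same row
            d -= eta * gd
    return Z, d
```

A projection step re-imposing $\Lambda = \text{diag}(Z^T \mathbb{1})$ and the sparsity of $Z$ (e.g. by soft-thresholding small entries) could be added after each epoch; plain SGD does not enforce either constraint by itself.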