I am trying to approximate a square matrix $A \in \mathbb{R}^{n \times n}$ using the following matrix factorization
$$A \approx \hat{A} = Z \Lambda^{-1}Z^T$$
where $\Lambda=\text{diag}(Z^T \mathbb{1})$ and $Z$ is sparse. I would like to use stochastic gradient descent (SGD) in order to achieve that
$$\arg \underset{\hat{A}}{\min} \| A – \hat{A} \|$$
So, basically, I would compute some entries of $A$ (because $A$ is a huge matrix) and use these to factorize the matrix. But I am not very sure about the way to do it.
Best Answer
The basic algorithm goes like this:
In step 3 you may want to add regularization to prevent overfitting. Take a look at the pseudo code on Wikipedia.