Why does the Frobenius norm with missing values set to zero correspond to the sum of squared errors over observed entries?

linear algebra, machine learning

I am following a Google crash course on recommendation systems. The subject is matrix factorization. Let $A\in\mathbb{R}^{m\times n}$ be the feedback matrix, with some entries equal to $1$ and the rest missing, and let $U\in\mathbb{R}^{m\times d}, V\in\mathbb{R}^{n\times d}$ be the factor matrices for some latent feature space of size $d$. The first objective function to minimize that comes to mind is:

$$\min_{U\in\mathbb{R}^{m\times d},\, V\in\mathbb{R}^{n\times d}}\sum_{(i,j)\in \text{obs}}(A_{ij}-\langle U_i,V_j\rangle)^2$$

Here, we omit the missing values from the sum. It is stated that with this objective, a factorization whose product $UV^T$ is the all-ones matrix produces an error of zero yet generalizes poorly. Hence, they state that it is possible to set the missing values to $0$, which corresponds to minimizing the squared Frobenius distance:

$$\min_{U\in\mathbb{R}^{m\times d},\, V\in\mathbb{R}^{n\times d}}\|A-UV^T\|_F^2$$

How do these two objectives correspond to each other?
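For concreteness, here is a small NumPy sketch of the two objectives on toy data (the matrix sizes, random factors, and observation pattern below are just made up for illustration, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 5, 2

# Toy feedback matrix: 1 where observed, NaN where missing.
A = np.full((m, n), np.nan)
A[rng.random((m, n)) < 0.4] = 1.0
obs = ~np.isnan(A)                    # mask of observed entries

# Random factor matrices and the predicted feedback.
U = rng.normal(size=(m, d))
V = rng.normal(size=(n, d))
pred = U @ V.T                        # pred[i, j] = <U_i, V_j>

# Objective 1: squared error over observed entries only.
sse_obs = np.sum((A[obs] - pred[obs]) ** 2)

# Objective 2: zero-fill the missing entries, squared Frobenius distance.
A0 = np.where(obs, A, 0.0)
frob_sq = np.linalg.norm(A0 - pred, 'fro') ** 2

print(sse_obs, frob_sq)
```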

Best Answer

They mention in the text before this that $[UV^T]_{ij}=\langle U_i,V_j\rangle$. The other observation is that $\|A\|_F^2=\sum_{i,j}A_{ij}^2$. In particular, they are minimizing $$\|A-UV^T\|_F^2=\sum_{i=1}^m\sum_{j=1}^n(A_{ij}-\langle U_i,V_j\rangle)^2,$$ which is just the first problem without the restriction to $(i,j)\in \text{obs}$. With the missing entries of $A$ set to $0$, the sum splits into the observed terms plus a penalty on the unobserved predictions: $$\|A-UV^T\|_F^2=\sum_{(i,j)\in \text{obs}}(A_{ij}-\langle U_i,V_j\rangle)^2+\sum_{(i,j)\notin \text{obs}}\langle U_i,V_j\rangle^2.$$
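A quick numerical check of this split, on the same kind of toy data as in the question's sketch (the names and sizes are again purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 5, 2
obs = rng.random((m, n)) < 0.4        # mask of observed entries
A0 = np.where(obs, 1.0, 0.0)          # feedback matrix with missing entries zero-filled
U, V = rng.normal(size=(m, d)), rng.normal(size=(n, d))
pred = U @ V.T                        # pred[i, j] = <U_i, V_j>

sse_obs   = np.sum((A0[obs] - pred[obs]) ** 2)   # observed terms
sse_unobs = np.sum(pred[~obs] ** 2)              # penalty on unobserved predictions
frob_sq   = np.linalg.norm(A0 - pred, 'fro') ** 2

# The squared Frobenius distance equals the observed error plus the unobserved penalty.
assert np.isclose(frob_sq, sse_obs + sse_unobs)
```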
