Solved – Calculating Cosine Similarity with Matrix Decomposition (matrix multiplication with normalized columns)

matrix, matrix decomposition, recommender-system

To calculate the column cosine similarity of $\mathbf{R} \in \mathbb{R}^{m \times n}$, each column of $\mathbf{R}$ is first scaled to unit 2-norm; the cosine similarity is then

$$\text{cosine similarity} = \mathbf{\bar{R}}^\top\mathbf{\bar{R}},$$

where $\mathbf{\bar{R}}$ is the column-normalized $\mathbf{R}$.
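
In NumPy terms, this is (a minimal sketch with an arbitrary small matrix):

import numpy as np

R = np.random.randn(6, 4)                    # any m x n matrix
Rbar = R / np.linalg.norm(R, ord=2, axis=0)  # scale columns to unit 2-norm
cosine_similarity = np.dot(Rbar.T, Rbar)     # n x n matrix of column cosines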

Now suppose I have $\mathbf{U} \in \mathbb{R}^{m \times l}$ and $\mathbf{P} \in \mathbb{R}^{n \times l}$ such that

$$\mathbf{R} = \mathbf{U}\mathbf{P}^\top,$$

where $l$ is the number of latent factors. To calculate the similarity, I can multiply them to reconstruct $\mathbf{R}$ and apply the equation above.

But if $m \gg n$ and $m, n \ggg l$, that is very inefficient. So I tried the following expansion:

$$\mathbf{R}^\top\mathbf{R} = \mathbf{P}\mathbf{U}^\top\mathbf{U}\mathbf{P}^\top,$$

$$\mathbf{U}^\top\mathbf{U} = \mathbf{L}\mathbf{L}^\top \in \mathbb{R}^{l \times l} \quad \text{(Cholesky decomposition)},$$

$$\mathbf{R}^\top\mathbf{R} = \mathbf{P}\mathbf{L}\mathbf{L}^\top\mathbf{P}^\top =: \mathbf{S}^\top\mathbf{S},$$

where $\mathbf{S}=(\mathbf{P}\mathbf{L})^\top \in \mathbb{R}^{l \times n}$.
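
To see why this should be cheaper, a rough flop count for producing the $n \times n$ Gram matrix: the direct route costs

$$\underbrace{O(mnl)}_{\mathbf{U}\mathbf{P}^\top} + \underbrace{O(mn^2)}_{\mathbf{R}^\top\mathbf{R}},$$

while the factored route costs only

$$\underbrace{O(ml^2)}_{\mathbf{U}^\top\mathbf{U}} + \underbrace{O(l^3)}_{\text{Cholesky}} + \underbrace{O(nl^2)}_{\mathbf{P}\mathbf{L}} + \underbrace{O(n^2l)}_{\mathbf{S}^\top\mathbf{S}},$$

so every term involving the large product $mn$ disappears.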

Now, analogously to the direct case, $\mathbf{S}$ is normalized so that each of its columns has unit 2-norm; the similarity would then be

$$\text{cosine similarity} = \mathbf{\bar{S}}^\top\mathbf{\bar{S}},$$

where $\mathbf{\bar{S}}$ is the column-normalized $\mathbf{S}$.

My question is: are the two results the same?

$$\mathbf{\bar{R}}^\top\mathbf{\bar{R}} \approx \mathbf{\bar{S}}^\top\mathbf{\bar{S}}$$

I expected them to differ, because the normalizing column norms come from different matrices. But when I tried a toy example (using Python/NumPy), they came out the same. Is that correct?

import numpy as np

u = 10000  # users
p = 1000   # products
l = 100    # latent factors

U = np.random.randn(u, l)  # user features
P = np.random.randn(p, l)  # product features
R = np.dot(U, P.T)         # user-product interaction

# Direct route: column-normalize R, then compute Rbar^T Rbar.
Rbar = R / np.linalg.norm(R, ord=2, axis=0)
direct_similarity = np.dot(Rbar.T, Rbar)

# Factored route: Cholesky factor of U^T U, then work with S = (P L)^T.
L = np.linalg.cholesky(np.dot(U.T, U))
PL = np.dot(P, L)

# The rows of PL are the columns of S, so normalize the rows of PL.
PLbar = PL / np.linalg.norm(PL, ord=2, axis=1, keepdims=True)
indirect_similarity = np.dot(PLbar, PLbar.T)

np.allclose(direct_similarity, indirect_similarity)
# True
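
To check the efficiency claim as well, here is a rough timing sketch (continuing from the snippet above, so it reuses np, U, and P; absolute numbers will vary with the machine and BLAS library):

import time

def direct_route(U, P):
    R = np.dot(U, P.T)                        # O(u*p*l) to form R
    Rbar = R / np.linalg.norm(R, ord=2, axis=0)
    return np.dot(Rbar.T, Rbar)               # O(u*p^2)

def factored_route(U, P):
    L = np.linalg.cholesky(np.dot(U.T, U))    # O(u*l^2) + O(l^3)
    PL = np.dot(P, L)                         # O(p*l^2)
    PLbar = PL / np.linalg.norm(PL, ord=2, axis=1, keepdims=True)
    return np.dot(PLbar, PLbar.T)             # O(p^2*l)

for f in (direct_route, factored_route):
    start = time.perf_counter()
    f(U, P)
    print(f.__name__, time.perf_counter() - start, "seconds")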

Best Answer

The normalizing factors are the same.

Let us take two matrices $R \in \mathbb{R}^{m\times n}$ and $S \in \mathbb{R}^{l \times n}$ such that $R^TR = S^TS$. Now, $\bar{R}$ can be obtained by multiplying $R$ from the right with an appropriate diagonal matrix $D_R$: $\bar{R} = RD_R$, where (assuming no column of $R$ is zero) $$D_R^{-2} = \sum_{k=1}^n I_n^{(k)} R^TR I_n^{(k)},$$ with $I_n^{(k)}$ denoting the $n\times n$ matrix that is zero everywhere except for a 1 at position $(k,k)$; in other words, $D_R^{-2}$ carries the squared column norms of $R$ on its diagonal. Since $R^TR = S^TS$, we have $D_R^{-2} = D_S^{-2}$, and thus (these being diagonal matrices with positive elements) $D_R = D_S$.

Hence, $$\bar{R}^T\bar{R} = (RD_R)^T(RD_R) = D_RR^TRD_R = D_SS^TSD_S = \bar{S}^T\bar{S}.$$
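
A quick standalone check of this argument in NumPy (the sizes are arbitrary): the normalizing factors are the square roots of the diagonals of $R^TR$ and $S^TS$, so the column norms of $R$ and $S$ must coincide.

import numpy as np

rng = np.random.default_rng(0)
m, n, l = 200, 50, 10
U = rng.standard_normal((m, l))
P = rng.standard_normal((n, l))

R = np.dot(U, P.T)
S = np.dot(P, np.linalg.cholesky(np.dot(U.T, U))).T  # S = (P L)^T

# diag(R^T R) == diag(S^T S), hence equal column 2-norms.
print(np.allclose(np.linalg.norm(R, axis=0), np.linalg.norm(S, axis=0)))
# True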