Cosine similarity of concatenated vectors

inequality, inner-products, matrices, vectors

I have 4 vectors:

$a_1 = [a_{1,0}, a_{1,1}, \ldots, a_{1,n}]$
$b_1 = [b_{1,0}, b_{1,1}, \ldots, b_{1,n}]$
$a_2 = [a_{2,0}, a_{2,1}, \ldots, a_{2,n}]$
$b_2 = [b_{2,0}, b_{2,1}, \ldots, b_{2,n}]$

Let $X$ be the cosine similarity between $a_1$ and $a_2$, $Y$ the cosine similarity between $b_1$ and $b_2$, and $Z$ the cosine similarity between $concat(a_1, b_1)$ and $concat(a_2, b_2)$.

To be clear, $concat(a_1, b_1) = [a_{1,0}, a_{1,1}, \ldots, a_{1,n}, b_{1,0}, b_{1,1}, \ldots, b_{1,n}]$ and $concat(a_2, b_2) = [a_{2,0}, a_{2,1}, \ldots, a_{2,n}, b_{2,0}, b_{2,1}, \ldots, b_{2,n}]$.

My question is: can one derive a relationship between $X$, $Y$, and $Z$? In particular, is there some function $f$ such that $f(X, Y) = Z$?

I know $X$ is $\frac{a_1 \cdot a_2}{\lVert a_1\rVert\lVert a_2\rVert}$ and $Y$ is $\frac{b_1 \cdot b_2}{\lVert b_1\rVert\lVert b_2\rVert}$, and the numerator of $Z$ is the sum of the numerators of $X$ and $Y$, but I'm not sure how to proceed.

Best Answer

Unfortunately you can say rather little about $Z$ knowing only $X$ and $Y$.

First, you can make $Z$ arbitrarily close to, say, $X$, by making the $b$ vectors arbitrarily small (with fixed $Y$) so they don’t contribute to $Z$.

Then one might hope that at least $Z$ must lie between $X$ and $Y$, but not even that is the case. You can make $Z$ arbitrarily close to $0$ for arbitrary $X$ and $Y$ by multiplying $b_1$ by $\lambda^{-1}$ and $b_2$ by $\lambda^2$ for $\lambda\to0^+$ (which doesn’t change $Y$, since cosine similarity is invariant under positive scaling), so that the $b$ vectors contribute arbitrarily little to the numerator of $Z$ but arbitrarily much to the denominator.
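To make the scaling argument concrete, here is a small numerical sketch in plain Python. The specific vectors (chosen so that $X = Y = 1$) and the helper names `cosine` and `scaled_similarities` are illustrative assumptions, not part of the original answer.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def scaled_similarities(lam):
    """Return (X, Y, Z) after scaling b1 by 1/lam and b2 by lam**2."""
    # Illustrative vectors (an assumption): both pairs are parallel, so X = Y = 1.
    a1, a2 = [1.0, 0.0], [1.0, 0.0]
    b1, b2 = [0.0, 1.0], [0.0, 1.0]
    b1 = [x / lam for x in b1]
    b2 = [x * lam**2 for x in b2]
    # List addition concatenates, giving concat(a, b).
    return cosine(a1, a2), cosine(b1, b2), cosine(a1 + b1, a2 + b2)

for lam in (1.0, 0.1, 0.001):
    X, Y, Z = scaled_similarities(lam)
    print(f"lambda={lam}: X={X:.3f}, Y={Y:.3f}, Z={Z:.6f}")
```

As $\lambda$ shrinks, $X$ and $Y$ stay at $1$ while $Z$ drops towards $0$, so $Z$ need not lie between $X$ and $Y$.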

The only thing you can say is that $Z$ can’t lie “beyond” both $X$ and $Y$ (as seen from $0$), i.e.

$$Z\in[\min(0,X,Y),\max(0,X,Y)]\;.$$

To show this, write $Z$ as a linear combination of $X$ and $Y$ (where I’ll write $ab$ for the concatenation):

\begin{eqnarray} X &=& \frac{a_1\cdot a_2}{\|a_1\|\|a_2\|}\;, \\ Y &=& \frac{b_1\cdot b_2}{\|b_1\|\|b_2\|}\;, \\ Z &=& \frac{ab_1\cdot ab_2}{\|ab_1\|\|ab_2\|} \\ &=& \frac{a_1\cdot a_2+b_1\cdot b_2}{\|ab_1\|\|ab_2\|} \\ &=&\frac{\|a_1\|\|a_2\|}{\|ab_1\|\|ab_2\|}\cdot X+\frac{\|b_1\|\|b_2\|}{\|ab_1\|\|ab_2\|}\cdot Y \;. \end{eqnarray}

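The two coefficients are positive (all four vectors must be nonzero for the cosine similarities to be defined), and their sum can’t be greater than $1$. To see this, suppose the sum were greater than $1$; then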

$$
\begin{aligned}
&\frac{\|a_1\|\|a_2\|}{\|ab_1\|\|ab_2\|}+\frac{\|b_1\|\|b_2\|}{\|ab_1\|\|ab_2\|}\gt1 \\
\implies{}& \|a_1\|\|a_2\|+\|b_1\|\|b_2\|\gt\|ab_1\|\|ab_2\| \\
\implies{}& (\|a_1\|\|a_2\|+\|b_1\|\|b_2\|)^2\gt\|ab_1\|^2\|ab_2\|^2=(\|a_1\|^2+\|b_1\|^2)(\|a_2\|^2+\|b_2\|^2) \\
\implies{}& 2\|a_1\|\|a_2\|\|b_1\|\|b_2\|\gt\|a_1\|^2\|b_2\|^2+\|a_2\|^2\|b_1\|^2\;,
\end{aligned}
$$

which is false due to the AM-GM inequality for $\|a_1\|^2\|b_2\|^2$ and $\|a_2\|^2\|b_1\|^2$; equivalently, it would contradict $(\|a_1\|\|b_2\|-\|a_2\|\|b_1\|)^2\ge0$.

Thus, $Z$ is a convex combination of $X$ and $Y$ scaled by a positive factor not greater than $1$, and thus can’t lie “beyond” $X$ and $Y$.
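The decomposition and the resulting bound can also be checked numerically. The following is a minimal sketch in plain Python; the helper names `norm`, `cosine` and `decompose`, the dimension, the seed and the tolerances are all assumptions made for illustration. A passing run is a sanity check on random Gaussian vectors, not a proof.

```python
import math
import random

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    return sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))

def decompose(a1, b1, a2, b2):
    """Return (X, Y, Z, alpha, beta), where Z should equal alpha*X + beta*Y."""
    ab1, ab2 = a1 + b1, a2 + b2  # list addition = concatenation
    denom = norm(ab1) * norm(ab2)
    alpha = norm(a1) * norm(a2) / denom  # coefficient of X
    beta = norm(b1) * norm(b2) / denom   # coefficient of Y
    return cosine(a1, a2), cosine(b1, b2), cosine(ab1, ab2), alpha, beta

rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    a1, b1, a2, b2 = ([rng.gauss(0, 1) for _ in range(3)] for _ in range(4))
    X, Y, Z, alpha, beta = decompose(a1, b1, a2, b2)
    # Z is the stated linear combination of X and Y ...
    assert abs(Z - (alpha * X + beta * Y)) < 1e-9
    # ... the coefficients sum to at most 1 (up to rounding) ...
    assert alpha + beta <= 1 + 1e-12
    # ... and Z stays within [min(0, X, Y), max(0, X, Y)].
    assert min(0, X, Y) - 1e-9 <= Z <= max(0, X, Y) + 1e-9
```

The small tolerances only absorb floating-point rounding; the inequalities themselves hold exactly.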
