Mathematical Statistics – Why the First Canonical Direction Equals the Left Singular Vector in CCA

canonical-correlationcorrelationeigenvaluesmathematical-statisticssvd

I want to understand why the canonical direction $a$ is equal to the left singular value of $M = \Sigma^{-1/2}_X \Sigma_{X, Y} \Sigma^{-1/2}_Y$ and not $a = \sigma_1 u_1$.

My calculation tell me that it should be $a = \sigma_1 u_1$.

Short Version

The short version is this:

If we need want to maximize:
$$ \max_{\tilde a, \tilde b} \frac{\tilde a^\top M \tilde b}{\sqrt{\tilde a^\top \tilde a^\top} \sqrt{\tilde b^\top \tilde b}} = \max_{\tilde a, \tilde b} corr'( \tilde a, \tilde b)$$
we simply set $\tilde a$ to be colinear to $M \tilde b$ (to have equality due to Cauchy-Schwartz) and $\tilde b = v_1$ (the first right singular vector) to have the Rayleigh quotient equal the upper bound. So using if we plug in $\tilde b = v_1$ to tje equation of $\tilde a = M \tilde b$ (since it's colinear) we get:
$$ \tilde a
= M \tilde b = M \tilde v_1
= (U \Sigma V^\top) v_1
= \left( \sum^{rank(M)}_{r=1} \sigma_i u_i v^\top_i \right) v_1
$$
where $v^\top_i v_1 = 0$ when $i \neq 1$ and 1 when $i=1$ (since the SVD gives a left and right orthonormal basis). Giving:
$$ \tilde a
= \sigma_1 u_1
$$

but this is wrong since professor Maxime Turgeon's derivation says $\tilde a_1 = u_1$ and wikipedia too.

What did I do wrong? Why is my analysis wrong?

Full analysis:

First recall CCA:
$$
a, b = \max_{a, b} corr(a, b) = \max_{a, b} \frac{a^\top \Sigma_{X, Y} b }{
\sqrt{a^\top \Sigma_{X} a }
\sqrt{b^\top \Sigma_{Y} b}
}
$$

Start by considering the change of variables:
$$ \tilde a = \Sigma^{1/2}_X a$$
$$ \tilde b = \Sigma^{1/2}_X b$$
and
$$M= \Sigma^{-1/2}_X \Sigma_{X, Y} \Sigma^{-1/2}_Y$$
then the original optimization problem becomes:
$$ \max_{\tilde a, \tilde b} \frac{\tilde a^\top M \tilde b}{\sqrt{\tilde a^\top \tilde a^\top} \sqrt{\tilde b^\top \tilde b}} = \max_{\tilde a, \tilde b} corr'( \tilde a, \tilde b)$$
We do this so we can apply the Cauchy-Schartz inequality (CWI) and obtain an upper bound to $corr'(\tilde a, \tilde b)^2$.
i.e. we note via the CWI:
$$ \tilde a^\top (M \tilde b) ) \leq \tilde (a^\top \tilde a) (\tilde b M^\top M \tilde b) $$
which allows us to upper bound the numerator of $corr'( \tilde a, \tilde b)$ to get:
$$ corr'( \tilde a, \tilde b)^2
= \frac{ (\tilde a^\top M \tilde b)^2 }{(\tilde a^\top \tilde a^\top)(\tilde b^\top \tilde b)}
\leq \frac{\tilde b M^\top M \tilde b}{ \tilde b^\top \tilde b} = R_{M^\top M}(\tilde b)
= R(\tilde b)
$$
where $R_{M^\top M}(\tilde b)$ denotes the Rayleigh quotient for matrix $M^\top M$ and we don't make the matrix $M^\top M$ going forward and use $R(\tilde b)$ instead.

We recall that the Rayleigh quotient is uppder bounded by the maximum eigenvalue $\lambda_1$ of $M^\top M$ and get the following:
$$ corr'( \tilde a, \tilde b)^2
\leq \frac{\tilde b M^\top M \tilde b}{ \tilde b^\top \tilde b}
= R(\tilde b) \leq \lambda_1
$$

We note that if we have $\tilde a = C M \tilde b$ i.e. $\tilde a$ to be colinear with $M \tilde b$ then the CWI becomes an equality (i.e. the angle between the two vectors is zero so the inner product is maximum).
For $C=1$ we have:
$$ corr'( M \tilde b, \tilde b)^2
= \frac{\tilde b M^\top M \tilde b}{ \tilde b^\top \tilde b}
= R(\tilde b) \leq \lambda_1
$$

Now what we want is to make the next inequality and equality since that would match the largest possible value for the correlation we are seeking to maximize.
We note this is achieved when we choose $\tilde b$ to be the eigenvector of $M^\top M$.
Equivalently, this is (first) right singular vector of $M$ denoted by $v_1$.
Thus the last inequality becomes an equality as follows when $\tilde b = v_1$:
$$ corr'( M \tilde b, \tilde b)^2
= \frac{\tilde b M^\top M \tilde b}{ \tilde b^\top \tilde b}
= R(v_1) = \lambda_1
$$
This choice of $\tilde b = v_1$ completes the justification for $b_1$ when we simply undo the change of variables $\tilde v_1 = \Sigma^{1/2}_Y b$ i.e. $b_1 = \Sigma^{-1/2}_Y v_1$.

To get the top canonical correlation we take the square root of the eigenvalue $\lambda_1$ since we uppder bounded the square i.e. $\rho_1 = \sqrt{ \lambda_1}$.
Equivalently, $\rho_1 = \sigma_1 = \sqrt{ \lambda_1}$ (i.e. use the first singular value of $M$) using the standard relation of the SVD.

To get $a_1$ we first get an expression for $\tilde a = M \tilde b$.
We plug in $\tilde b = v_1$ and use the SVD of $M$:
$$ \tilde a
= M \tilde b = M \tilde v_1
= (U \Sigma V^\top) v_1
= \left( \sum^{rank(M)}_{r=1} \sigma_i u_i v^\top_i \right) v_1
$$
where $v^\top_i v_1 = 0$ when $i \neq 1$ and 1 when $i=1$ (since the SVD gives a left and right orthonormal basis). Giving:
$$ \tilde a
= \sigma_1 u_1
$$
To get $a_1$ we also invert the change of basis and plug in the above value.
Inverting the change of basis $\tilde a_1 = \Sigma^{1/2}_X$ we get $ a_1 = \Sigma^{-1/2}_X \tilde a$.
Plugging in $\tilde a = \sigma_1 u_1$ gives us $a_1 = \sigma_1 \Sigma^{-1/2}_X u_1$ as required (except there is an extra singular value for some reason! noooo).

cross-posted/refs:

Best Answer

In short, both results are correct.

The goals of CAA is to maximise the correlation between $\vec{a}$ and $\vec{b}$. Now this on its own is actually not a well defined maximisation problem, or rather, it does not have a unique solution, because the correlation is invariant under re-scaling: $$ corr(\vec{a}, \vec{b}) = corr( \alpha \, \vec{a}, \beta \, \vec{b})$$ where $\alpha$, $\beta$ can be any positive numbers. You can check that by plugging in the definition ... $\alpha$, $\beta$ simply cancel out between the numerator and denominator.

Applied to CAA here, this means that the optimal solution really is correctly stated as $\tilde{b} \propto v_1 $ and $\tilde{a} \propto u_1$. In words, $\tilde{a}$ should be proportional to $u_1$ and $\tilde{b}$ should be proportional to $v_1$.

Any such pair maximises the correlation. You can check the value $$corr( \alpha \, u_1, \beta \, v_1) = \frac{(\alpha \, u_1^T) M (\beta \, v_1)}{\sqrt{(\alpha \, u_1^T)(\alpha \, u_1)}\sqrt{(\beta \, v_1^T)(\beta \, v_1)}} = \sigma_1 $$ where we used $|u_1| = |v_1| = 1$. Again, you see that the parameters $\alpha$, $\beta$ drop out.

All in all this means you simply get to pick any $\alpha$, $\beta$ you like! They're all equally valid solutions to the maximisation problem.

That said, typically one imposes an additional constraint on the solutions to pick a unique solution and declares that the canonical solution. That choice is typically something that seems natural, like say requiring unit length ($|\vec{a}| = |\vec{b}| = 1$) or the like.

It's important to keep in mind though that the canonical solution is not more "correct" than all the other solutions, we simply pick one as the preferred solution as a convention.

Now, coming back to your solution, you can check that at some point you jumped from

"$\tilde{a}$ needs to be collinear to $M \tilde{b}$"

which is unequivocally correct, to

"$\tilde{a} = M \tilde{b}$"

which is a particular choice (neither right nor wrong). You could've equally said $\tilde{a} = 42 \,M \tilde{b}$ ;)

To find the canonical result, you could instead say

$\tilde{a} = \lambda \, M \tilde{b}$ where $\lambda $ is TBD
fix $\lambda$ by imposing $|\tilde{a}| = 1$
that yields $\lambda = 1 / \sigma_1$, and thus $\tilde{a} = u_1$

Full analysis:

Best Answer

Related Solutions

Solved – score function of bivariate/multivariate normal distribution

Solved – How to interpret results from Canonical Correlation Analysis (CCA)

Related Question