Regression – Objective Function of Canonical Correlation Analysis (CCA) in Regression

canonical-correlation, least-squares, reduced-rank-regression, regression

Given two vectors of random variables $X$ and $Y$, Canonical Correlation Analysis (CCA) finds the transformation matrices $A$ and $B$ so that $\operatorname{corr}(A_{1*} X, B_{1*} Y)$ is first maximal, $\operatorname{corr}(A_{2*} X, B_{2*} Y)$ is then maximal subject to $\operatorname{corr}(A_{1*} X, A_{2*} X) = 0$ and $\operatorname{corr}(B_{1*} Y, B_{2*} Y) = 0$, etc.

Is there any global objective function that $A$ and $B$ also optimize? For instance, do they maximize $\sum_i \operatorname{corr}(A_{i*} X, B_{i*} Y)$ subject to $A^TA=I$ and $B^TB=I$, or something along these lines?

Related to that, if we define a transformation matrix $W = B^{-1}A$, is there any relation between $WX$ and $Y$ for which $W$ is optimal? In particular, is it possible to establish some connection between this transformation $W$ and the optimization objective of Ordinary Least Squares (OLS)?

Best Answer

If $X$ is $n\times p$ and $Y$ is $n\times q$, then one can formulate the CCA optimization problem for the first canonical pair as follows:

$$\text{Maximize }\operatorname{corr}(Xa, Yb).$$

The value of the correlation does not depend on the lengths of $a$ and $b$, so they can be arbitrarily fixed. It is convenient to fix them such that the projections have unit variances:

$$\text{Maximize }\operatorname{corr}(Xa, Yb) \text{ subject to } a^\top \Sigma_X a=1 \text{ and } b^\top \Sigma_Yb=1,$$

because then the correlation equals the covariance:

$$\text{Maximize } a^\top \Sigma_{XY}b \text{ subject to } a^\top \Sigma_X a=1 \text{ and } b^\top \Sigma_Yb=1,$$

where $\Sigma_{XY}$ is the cross-covariance matrix, given by $X^\top Y/n$ for centered data.


We can now generalize it to more than one dimension as follows:

$$\text{Maximize }\operatorname{tr}(A^\top \Sigma_{XY}B) \text{ subject to } A^\top \Sigma_X A=I \text{ and } B^\top \Sigma_Y B=I,$$

where the trace is precisely the sum of the successive canonical correlation coefficients, as you hypothesized in your question. You only had the constraints on $A$ and $B$ wrong.

The standard way to solve the CCA problem is to define substitutions $\tilde A = \Sigma_X^{1/2} A$ and $\tilde B = \Sigma_Y^{1/2} B$ (conceptually this is equivalent to whitening both $X$ and $Y$), obtaining

$$\text{Maximize }\operatorname{tr}(\tilde A^\top \Sigma_X^{-1/2} \Sigma_{XY}\Sigma_Y^{-1/2} \tilde B) \text{ subject to } \tilde A^\top \tilde A=I \text{ and } \tilde B^\top \tilde B=I.$$

This is now easy to solve because of the orthogonality constraints: the solution is given by the left and right singular vectors of $\Sigma_X^{-1/2} \Sigma_{XY}\Sigma_Y^{-1/2}$ (and can then easily be back-transformed to $A$ and $B$ without tildes).
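As a concrete illustration, here is a minimal NumPy sketch of this whitening-plus-SVD recipe on simulated data (the data, the dimensions, and the `inv_sqrt` helper are all illustrative choices, not part of the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 4, 3

# Simulated data with some cross-correlation, centered so that
# covariance matrices can be formed as X'X/n etc.
X = rng.standard_normal((n, p))
Y = X[:, :q] + 0.5 * rng.standard_normal((n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Covariance and cross-covariance matrices
Sx = X.T @ X / n
Sy = Y.T @ Y / n
Sxy = X.T @ Y / n

def inv_sqrt(S):
    """Symmetric inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Whiten and take the SVD; the singular values s are the canonical correlations
K = inv_sqrt(Sx) @ Sxy @ inv_sqrt(Sy)
U, s, Vt = np.linalg.svd(K)

# Back-transform to A and B (without tildes)
A = inv_sqrt(Sx) @ U
B = inv_sqrt(Sy) @ Vt.T

print("canonical correlations:", np.round(s, 3))
```

One can verify that the resulting $A$ and $B$ satisfy the constraints $A^\top \Sigma_X A=I$ and $B^\top \Sigma_Y B=I$, and that the diagonal of $A^\top \Sigma_{XY} B$ reproduces the singular values.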


Relationship to reduced-rank regression

CCA can be formulated as a reduced-rank regression problem. Namely, $A$ and $B$ corresponding to the first $k$ canonical pairs minimize the following cost function:

$$\Big\|(Y-XAB^\top)\Sigma_Y^{-1/2}\Big\|^2 = \Big\|Y\Sigma_Y^{-1/2}-XAB^\top\Sigma_Y^{-1/2}\Big\|^2.$$

See e.g. Torre, 2009, Least Squares Framework for Component Analysis, page 6 (but the text is quite dense and might be a bit hard to follow). This is called reduced-rank regression because the matrix of regression coefficients $AB^\top\Sigma_Y^{-1/2}$ is of low rank $k$.
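One way to check this connection numerically: regress the whitened $Y$ on $X$ by OLS, truncate the fitted values to their top-$k$ right singular directions (the standard reduced-rank regression recipe), and compare the resulting subspace with the span of the first $k$ CCA directions for $Y$. The sketch below (simulated data; dimensions and coefficients are arbitrary) does exactly that; the projector comparison avoids sign ambiguities in the singular vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, k = 500, 4, 3, 2

# Centered toy data with distinct canonical correlations
X = rng.standard_normal((n, p))
Y = X[:, :q] @ np.diag([1.0, 0.7, 0.4]) + 0.5 * rng.standard_normal((n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

def inv_sqrt(S):
    """Symmetric inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

Sx, Sy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

# CCA directions for the whitened Y: right singular vectors of K
K = inv_sqrt(Sx) @ Sxy @ inv_sqrt(Sy)
_, _, Vt_cca = np.linalg.svd(K)

# Reduced-rank regression of the whitened Y on X:
# OLS fit, then keep the top-k right singular directions of the fitted values
Yw = Y @ inv_sqrt(Sy)
C_ols = np.linalg.lstsq(X, Yw, rcond=None)[0]
_, _, Vt_fit = np.linalg.svd(X @ C_ols, full_matrices=False)

# The rank-k RRR solution projects onto the same subspace that CCA finds
P_cca = Vt_cca[:k].T @ Vt_cca[:k]
P_rrr = Vt_fit[:k].T @ Vt_fit[:k]
print(np.allclose(P_cca, P_rrr, atol=1e-6))
```

The two projectors agree because the fitted whitened responses have covariance $VS^2V^\top$ in the notation above, so their right singular vectors are exactly the CCA directions.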

In contrast, standard OLS regression minimizes

$$\|Y-XV\|^2$$

without any rank constraint on $V$. The solution $V_\mathrm{OLS}$ will generally be full rank, i.e. rank $\min(p,q)$.
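A quick numerical check of this point, assuming generic (random) data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 5, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# Unconstrained least squares: V = argmin ||Y - XV||^2
V_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # p x q coefficient matrix

# No rank constraint, so V_ols is generically of rank min(p, q)
print(V_ols.shape, np.linalg.matrix_rank(V_ols))
```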

Even in the $k=p=q$ situation there remains one crucial difference: for CCA one needs to whiten the dependent variables $Y$, replacing them with $Y\Sigma_Y^{-1/2}$. This is because regression tries to explain as much variance in $Y$ as possible, whereas CCA does not care about variance at all; it cares only about correlation. If $Y$ is whitened, then its variance is the same in all directions, and the regression loss function effectively maximizes the correlation.

(I think there is no way to obtain $A$ and $B$ from $V_\mathrm{OLS}$.)