Solved – PCA is to CCA as ICA is to

canonical-correlation, independent component analysis, pca

PCA looks for factors in the data that maximize explained variance. Canonical correlation analysis (CCA), as far as I understand, is like PCA but looks for factors that maximize the correlation between projections of two data sets. So it finds PCA-like factors that are common to the two data sets.

Independent component analysis (ICA) is similar to PCA, but it looks for factors that are statistically independent. This results, in some ways, in more interpretable factors, e.g. gene pathways, brain networks, or parts of faces. Put another way, it identifies independent sources that were mixed to produce the data.

Is there a method that is to ICA what CCA is to PCA, i.e. one that would find independent components common to two datasets? Would the results actually make sense?

Best Answer

The first step of ICA is to use PCA and project the dataset into a low-dimensional latent space. The second step is to perform a change of coordinates within the latent space, which is chosen to optimize a measure of non-gaussianity. This tends to lead to coefficients and loadings that are, if not sparse, then at least concentrated within small numbers of observations and features, and that way it facilitates interpretation.
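The two steps described above can be sketched in plain NumPy. This is a minimal illustration, not a reference implementation: the function names (`whiten`, `sym_decorrelate`, `rotate_for_nongaussianity`) are made up for this example, the rotation uses FastICA-style fixed-point updates with a tanh nonlinearity as one possible measure of non-gaussianity, and the toy data (two non-Gaussian sources mixed into three features) is an assumption chosen so that the result can be checked.

```python
import numpy as np

def whiten(X, k):
    """Step 1: PCA projection onto k components, scaled to unit variance."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # Columns of the result are uncorrelated and have unit variance.
    return U[:, :k] * np.sqrt(X.shape[0])

def sym_decorrelate(W):
    """Replace W by the closest orthogonal matrix (keeps the update a rotation)."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def rotate_for_nongaussianity(Z, n_iter=200, tol=1e-8, seed=0):
    """Step 2: find an orthogonal change of coordinates in the latent space."""
    n, k = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        S = Z @ W.T                      # current component estimates, n x k
        G = np.tanh(S)
        G_prime = 1.0 - G ** 2
        # Fixed-point update, row by row: E[z g(w.z)] - E[g'(w.z)] w
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is unchanged up to sign.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W

# Toy data (assumed for illustration): two independent non-Gaussian sources,
# linearly mixed into three observed features.
rng = np.random.default_rng(0)
n = 5000
S_true = np.column_stack([rng.uniform(-1, 1, n), rng.laplace(size=n)])
A = rng.standard_normal((2, 3))
X = S_true @ A

Z = whiten(X, k=2)                  # step 1: low-dimensional latent space
W = rotate_for_nongaussianity(Z)    # step 2: rotation within that space
S_hat = Z @ W.T                     # estimated independent components
```

Because ICA only identifies sources up to order, sign, and scale, `S_hat` should be compared with `S_true` through absolute correlations rather than entry by entry.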

Likewise, in this paper on CCA+ICA (Sui et al., "A CCA+ICA based model for multi-task brain imaging data fusion and its application to schizophrenia"), the first (see footnote) step is to perform CCA, which yields a projection of each dataset into a low-dimensional space. If the input datasets are $X_1$ and $X_2$, each with $N$ rows (observations), then CCA yields $Z_1 = X_1W_1$ and $Z_2 = X_2W_2$, where the $Z$'s also have $N$ rows (observations). Note that the $Z$'s have a small number of columns, paired between $Z_1$ and $Z_2$, as opposed to the $X$'s, which may not even have the same number of columns. The authors then apply the same coordinate-changing strategy as is used in ICA, but they apply it to the concatenated matrix $[Z_1 | Z_2]$.

Footnote: the authors also use preprocessing steps involving PCA, which I ignore here. They are part of the paper's domain-specific analysis choices, rather than being essential to the CCA+ICA method.
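To make the shapes in the CCA step concrete, here is a small NumPy sketch. It is only an illustration under stated assumptions: `inv_sqrt` and `cca` are hypothetical helper names, the data is synthetic, the inputs are centered before projection, and the code stops at forming $[Z_1 | Z_2]$. It does not attempt to reproduce the paper's pipeline or its ICA step.

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs / np.sqrt(np.maximum(vals, eps))) @ vecs.T

def cca(X1, X2, k):
    """Return canonical variates Z1, Z2 (N x k each) and their correlations."""
    n = X1.shape[0]
    X1c = X1 - X1.mean(axis=0)          # assumption: work with centered data
    X2c = X2 - X2.mean(axis=0)
    K1 = inv_sqrt(X1c.T @ X1c / n)      # whitening transform for X1
    K2 = inv_sqrt(X2c.T @ X2c / n)      # whitening transform for X2
    # SVD of the whitened cross-covariance gives the paired directions.
    U, rho, Vt = np.linalg.svd(K1 @ (X1c.T @ X2c / n) @ K2)
    W1 = K1 @ U[:, :k]
    W2 = K2 @ Vt[:k].T
    return X1c @ W1, X2c @ W2, rho[:k]

# Synthetic example: two datasets with different numbers of columns that
# share a 2-dimensional latent signal across the same N observations.
rng = np.random.default_rng(0)
N = 500
shared = rng.standard_normal((N, 2))
X1 = shared @ rng.standard_normal((2, 6)) + 0.3 * rng.standard_normal((N, 6))
X2 = shared @ rng.standard_normal((2, 4)) + 0.3 * rng.standard_normal((N, 4))

Z1, Z2, rho = cca(X1, X2, k=2)
Z = np.hstack([Z1, Z2])   # the concatenated matrix [Z1 | Z2], N rows
```

Here `X1` has 6 columns and `X2` has 4, yet `Z1` and `Z2` both have 2 columns, paired so that column `j` of `Z1` correlates with column `j` of `Z2` at level `rho[j]`.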
