Solved – How to interpret results from Canonical Correlation Analysis (CCA)

canonical-correlation

I am learning CCA following an example posted here: https://stats.idre.ucla.edu/r/dae/canonical-correlation-analysis/

I have questions regarding how to interpret the canonical coefficients, the canonical loadings, and the figures from the statistical test. The original article is vague about these and does not answer my questions, so I have adapted the example below. I understand that I have many questions here, but an answer to any one of them would be highly appreciated!

Data: Two sets of variables exactly as the example shown on the page:

X as psychology traits (control, concept, motivation)
Y as academic achievements (read, write, math, science, sex)

Canonical Coefficients calculated by R:

$xcoef
              [,1]    [,2]    [,3]
Control    -1.2538 -0.6215 -0.6617
Concept     0.1000 -1.1877  0.8267
Motivation -1.2624  2.0273  2.0002

$ycoef
             [,1]      [,2]      [,3]
Read    -0.044621 -0.004910  0.021381
Write   -0.035877  0.042071  0.091307
Math    -0.623417  0.004229  0.009398
Science -0.000125 -0.085162 -0.109835

Canonical Loadings calculated by R:

$corr.X.xscores
               [,1]    [,2]    [,3]
Control    -0.90405 -0.3897 -0.1756
Concept    -0.02084 -0.7087  0.7052
Motivation -0.56715  0.3509  0.7451

$corr.Y.yscores
            [,1]     [,2]    [,3]
Read    -0.8404 -0.35883  0.1354
Write   -0.8765  0.06484  0.2546
Math    -0.7639 -0.29795  0.1478
Science -0.0564 -0.67680 -0.2304

Interpretation based on the cited article: Let's focus on the canonical variate pair one. From the article, I understand that, using the 'Science' variable as an example:

canonical coefficient: every one-unit change in Science leads to a change of -0.000125 in the first canonical variate of set 2
canonical loading: Science appears to have almost negligible correlation with the first canonical variate in set 2, as its correlation with the variate is very low (-0.0564).

QUESTION 1: If these readings make sense, what do they actually mean? Does it make sense to say that, in this case, 'Science' is comparatively NOT a useful variable for studying the relation between the Set 1 and Set 2 variables? The reasons are: 1) compared to the other variables in Set 2 (Read, Write, Math, Sex), its contribution to the first canonical variate, as indicated by its canonical coefficient (-0.000125), is orders of magnitude lower; and 2) changes in its value lead to almost no noticeable change in the first canonical variate, as indicated by its canonical loading, i.e. its correlation with the first variate (-0.0564).

QUESTION 2: Now ignoring 'Science', how do we interpret the relation between the remaining Set 2 variables and the Set 1 variables? Does it make sense to say that 'Math' appears to be the most influential variable in Set 2 because it has the largest coefficient (in absolute value) among all variables, and that the Set 2 variables generally seem to relate to one's ability in Control and Motivation? And it seems the relation is positive, i.e., the higher we score on math, the better we are at Control and Motivation?

The article then goes on to test the statistical significance of each canonical variate pair and reports the following:

      WilksL      F df1  df2         p
 [1,] 0.7544 11.716  15 1635 7.498e-28
 [2,] 0.9614  2.944   8 1186 2.905e-03
 [3,] 0.9892  2.165   3  594 9.109e-02

with an explanation 'the first test of the canonical dimensions tests whether all three dimensions are significant (they are, F = 11.72), the next test tests whether dimensions 2 and 3 combined are significant (they are, F = 2.94). Finally, the last test tests whether dimension 3, by itself, is significant (it is not)'. This is where I am totally lost.

First, why is the first test for all three dimensions (i.e., canonical variate pairs), not the first dimension? And the second test for dimensions 2 and 3 combined, not just the second dimension? What does 'combined' mean and why do we measure them combined, not each individual dimension?

Second, what value of F means the result is significant? The conclusion based on F seems to make no sense: the first F = 11.7 and the second F = 2.9 are both 'significant' although the difference between them is almost 9, while the third F = 2.165 is only about 0.78 less than the second F, yet the author says it is not significant. How do we interpret F? How large is 'large'? In my opinion, the p-value makes more sense here, because only the last p-value is > 0.05.

Third, linked to the second question, how should we interpret the significance test overall, given all three of these metrics (Wilks' lambda, F, p-value)? Do they all have to indicate 'significance', or does one suffice?

Best Answer

I think the loadings and the coefficients are getting mixed up here. The coefficients are the weights that combine the (centered) variables into a canonical variate (dimension), whereas the loadings are the correlations of the original variables with that variate. The reason is in the source code (R):

    function (X, Y, res) 
    {
        X.aux = scale(X, center = TRUE, scale = FALSE)
        Y.aux = scale(Y, center = TRUE, scale = FALSE)
        X.aux[is.na(X.aux)] = 0
        Y.aux[is.na(Y.aux)] = 0
        xscores = X.aux %*% res$xcoef
        yscores = Y.aux %*% res$ycoef
        corr.X.xscores = cor(X, xscores, use = "pairwise")
        corr.Y.xscores = cor(Y, xscores, use = "pairwise")
        corr.X.yscores = cor(X, yscores, use = "pairwise")
        corr.Y.yscores = cor(Y, yscores, use = "pairwise")
        return(list(xscores = xscores, yscores = yscores,
            corr.X.xscores = corr.X.xscores,
            corr.Y.xscores = corr.Y.xscores,
            corr.X.yscores = corr.X.yscores,
            corr.Y.yscores = corr.Y.yscores))
    }

So you can see that corr.X.xscores is the correlation of the original variables (X) with the dimension xscores, while xscores itself is built from the coefficients res$xcoef (this is from the compute function, which is called by rcc, which in turn is called by cc in the CCA package).
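To make the distinction concrete, here is a minimal Python/NumPy sketch of the same two steps as in the R function above. The data and the coefficient matrix are made up for illustration (they are not the UCLA data and not real `cc` output), so only the mechanics matter, not the numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one variable set (100 cases, 3 variables).
X = rng.normal(size=(100, 3)) @ np.array([[1.0, 0.5, 0.2],
                                           [0.0, 1.0, 0.4],
                                           [0.0, 0.0, 1.0]])

# Hypothetical coefficient matrix playing the role of res$xcoef (3 variables, 2 variates).
xcoef = np.array([[0.9, -0.3],
                  [0.1, 1.1],
                  [-0.6, 0.4]])

# Coefficients are weights: scores = centered data times coefficients.
X_centered = X - X.mean(axis=0)
xscores = X_centered @ xcoef

# Loadings are correlations between each original variable and each score column.
n_vars = X.shape[1]
full_corr = np.corrcoef(np.hstack([X, xscores]), rowvar=False)
loadings = full_corr[:n_vars, n_vars:]

print(loadings)
```

The coefficients can take any value, because they depend on the units of the variables, whereas the loadings are correlations and always lie between -1 and 1.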

Related Solutions

Solved – Canonical Correlation analysis without raw data (algebra of CCA)

If you don't have the original casewise data but know the correlations (and hopefully the variances and the sample size), you may simply generate random data having those correlations and analyze that dataset as usual with a canonical correlation program that takes in raw data. This way, every output will be correct except the canonical variates' values (scores), which would require the true data you don't have.
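One possible way to generate such data is sketched below in Python/NumPy. It is only an illustration under stated assumptions: the function name `simulate_with_correlations` and the 4-by-4 target matrix are hypothetical, and the approach (whiten a random sample, then multiply by the Cholesky factor of the target matrix) is one of several that would work. It requires more cases than variables.

```python
import numpy as np

def simulate_with_correlations(R, n, seed=0):
    """Return an n-by-p array whose sample correlation matrix equals R."""
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    Z = rng.normal(size=(n, p))
    Z = Z - Z.mean(axis=0)
    # Whiten: make the sample covariance of Z exactly the identity.
    L_z = np.linalg.cholesky(np.cov(Z, rowvar=False))
    Z = Z @ np.linalg.inv(L_z).T
    # Impose the target correlations via the (lower) Cholesky factor of R.
    L_r = np.linalg.cholesky(R)
    return Z @ L_r.T

# Hypothetical target correlation matrix (positive definite).
R_target = np.array([[1.0, 0.3, 0.2, 0.1],
                     [0.3, 1.0, 0.3, 0.2],
                     [0.2, 0.3, 1.0, 0.3],
                     [0.1, 0.2, 0.3, 1.0]])

data = simulate_with_correlations(R_target, n=200)
print(np.round(np.corrcoef(data, rowvar=False), 3))
```

Because the sample is whitened before the target structure is imposed, the sample correlations reproduce the target up to floating-point error rather than only approximately.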

But anyway, if you want to program canonical correlation analysis (CCA) yourself, here is a step-by-step algorithm for you. You may use any language that has basic linear algebra matrix functions.

Let $\bf R_1$ be the correlations (or covariances) within Set1 of $p_1$ variables, $\bf R_2$ the correlations (or covariances) within Set2 of $p_2$ variables, and $\bf R_{12}$ the $p_1 \times p_2$ correlations (or covariances) between the sets.

Make $\bf S_1$ the diagonal matrix containing the standard deviations in Set1; likewise $\bf S_2$ the diagonal matrix with the standard deviations in Set2. If you don't know the variances (such as when you know only the correlations), assume that they are all equal to 1. Then the unstandardized canonical coefficients will be equal to the standardized ones.

Doing analysis on covariance matrices is equivalent to analyzing centered variables, while doing analysis on correlation matrices is equivalent to analyzing z-standardized variables.

Find $\bf H_1$, the Cholesky root of $\bf R_1$: an upper-triangular matrix whereby $\bf{H_1'H_1=R_1}$. (Please note that in the Wikipedia they show it transposed, as "L", lower-triangular.) Likewise, find $\bf H_2$, the Cholesky root of $\bf R_2$.

Compute $\bf W$:

$\bf W = {H_1'}^{-1} R_{12} {H_2}^{-1}$, if $p_1 \le p_2$; or

$\bf W = {H_2'}^{-1} R_{12}' {H_1}^{-1}$, if $p_1 \gt p_2$.

Do singular-value decomposition of $\bf W$, whereby $\bf W=UDV'$.

Canonical correlations $\gamma_1, \gamma_2,...,\gamma_m$ where $m=\min(p_1,p_2)$ stand on the diagonal of $\bf D$. How to test them for significance - see here.

Compute standardized canonical coefficients $\bf K_1$ (for Set1) and $\bf K_2$ (for Set2):

$\bf K_1 = H_1^{-1}U$ and $\bf K_2 = H_2^{-1}V$ (first $p_1$ columns of $\bf K_2$), if $p_1 \le p_2$; or

$\bf K_1 = H_1^{-1}V$ (first $p_2$ columns of $\bf K_1$) and $\bf K_2 = H_2^{-1}U$, if $p_1 \gt p_2$.

Standardized coefficients correspond to decomposing the $\bf R$-matrices as if they were correlation matrices, even if they were actually covariance matrices; hence the label "standardized".

Compute unstandardized canonical coefficients $\bf C_1$ (for Set1) and $\bf C_2$ (for Set2):

$\bf C_1 = S_1^{-1}K_1$ and $\bf C_2 = S_2^{-1}K_2$.

When the three input $\bf R$-matrices are correlations, not covariances, and the two $\bf S$ diagonals consist of ones (which corresponds to the analysis of z-standardized variables), the standardized and unstandardized coefficients are the same. Some CCA programs don't display unstandardized coefficients at all, mostly the programs that base the analysis only on correlations; these programs may omit the label "standardized" when they output the (standardized) coefficients.

Compute canonical loadings $\bf A_1$ (for Set1) and $\bf A_2$ (for Set2):

$\bf A_1 = S_1^{-1}(S_1R_1S_1)C_1$ and $\bf A_2 = S_2^{-1}(S_2R_2S_2)C_2$ .

The mean of the squared entries in each column of $\bf A_1$ is the proportion of variance in Set1 explained by the corresponding canonical variate of its own set. The same holds for $\bf A_2$ and Set2.

Compute canonical cross-loadings $\bf A_{12}$ (for Set1) and $\bf A_{21}$ (for Set2):

$\bf A_{12} = S_1^{-1}(S_1R_{12}S_2)C_2$ and $\bf A_{21} = S_2^{-1}(S_1R_{12}S_2)'C_1$ .

The mean of the squared entries in each column of $\bf A_{12}$ is the proportion of variance in Set1 explained by the corresponding canonical variate of the opposite set. The same holds for $\bf A_{21}$ and Set2.

Compute canonical variates scores (if you have casewise data at hand):

Variates extracted from Set1 $\bf Z_1=X_1K_1$ and variates extracted from Set2 $\bf Z_2=X_2K_2$, where $\bf X_1$ and $\bf X_2$ are the (centered) variables of Set1 and Set2.

The variates are produced standardized (mean = 0, st. dev. = 1). Pearson correlation between variates $Z_{1(j)}$ and $Z_{2(j)}$ is the canonical correlation $\gamma_j$. For visual explanation of the idea of canonical correlations please look in here.
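The steps above can be sketched in Python/NumPy as follows. This is a minimal illustration, not a reference implementation. It assumes correlation-matrix input, so all standard deviations are 1 and the standardized and unstandardized coefficients coincide. The function name, the dictionary keys, and the simulated data are hypothetical, and the scores are computed from z-standardized variables to match the correlation-matrix input.

```python
import numpy as np

def cca_from_correlations(R1, R2, R12):
    """CCA from correlation matrices, assuming all standard deviations are 1."""
    p1, p2 = R12.shape
    # NumPy returns the lower factor L with L L' = R; H = L' is the upper root.
    H1 = np.linalg.cholesky(R1).T
    H2 = np.linalg.cholesky(R2).T
    H1_inv, H2_inv = np.linalg.inv(H1), np.linalg.inv(H2)
    if p1 <= p2:
        W = H1_inv.T @ R12 @ H2_inv
        U, gamma, Vt = np.linalg.svd(W, full_matrices=False)
        K1, K2 = H1_inv @ U, H2_inv @ Vt.T
    else:
        W = H2_inv.T @ R12.T @ H1_inv
        U, gamma, Vt = np.linalg.svd(W, full_matrices=False)
        K1, K2 = H1_inv @ Vt.T, H2_inv @ U
    return {
        "gamma": gamma,                      # canonical correlations
        "K1": K1, "K2": K2,                  # standardized coefficients
        "A1": R1 @ K1, "A2": R2 @ K2,        # loadings (with S = identity)
        "A12": R12 @ K2, "A21": R12.T @ K1,  # cross-loadings
    }

# Simulated data, used only to check the algebra against casewise scores.
rng = np.random.default_rng(1)
n, p1, p2 = 500, 3, 4
data = rng.normal(size=(n, p1 + p2))
data[:, p1:] += 0.5 * data[:, :1]  # induce correlation between the two sets

R = np.corrcoef(data, rowvar=False)
res = cca_from_correlations(R[:p1, :p1], R[p1:, p1:], R[:p1, p1:])

# Scores from z-standardized variables.
Zs = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
Z1 = Zs[:, :p1] @ res["K1"]
Z2 = Zs[:, p1:] @ res["K2"]

for j, g in enumerate(res["gamma"]):
    r = np.corrcoef(Z1[:, j], Z2[:, j])[0, 1]
    print(f"pair {j + 1}: gamma = {g:.4f}, corr of scores = {r:.4f}")
```

For each pair, the correlation between the two score columns should reproduce the corresponding singular value, which is a quick sanity check on the implementation.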

Solved – Using canonical correlation analysis (CCA) to find matches

This looks like a possible approach.

CCA will find pairs of vectors $(\mathbf w, \mathbf v)$ such that projections $\mathbf X \mathbf w$ and $\mathbf Y \mathbf v$ have maximal possible correlations (the pairs will be ordered in the order of decreasing correlations). Projection vectors are normalized such that the variance of $\mathbf X \mathbf w$ and of $\mathbf Y \mathbf v$ is equal to $1$. This means that projections are not only correlated, but "on the same scale" and hence can be directly compared.

Two things to keep in mind: (1) you can only center your test data with the mean of the training data; (2) in high dimensions CCA is prone to overfitting, so it is a good idea either to use regularized CCA or to preprocess the data with PCA.

Here is a very simple Matlab script implementing this approach:

% Using Fisher Iris data.
% X will be sepal measurements (columns 1:2 of meas),
% Y will be petal measurements (columns 3:4 of meas).
load fisheriris
trainN = 75; % using half of the data for training and half for testing

% Center all data with the mean of the training data only
centerTrain = mean(meas(1:trainN,:));
X = bsxfun(@minus, meas(:,1:2), centerTrain(1:2));
Y = bsxfun(@minus, meas(:,3:4), centerTrain(3:4));

% This computes CCA on the training data
[A,B,r] = canoncorr(X(1:trainN,:), Y(1:trainN,:));

% Projecting the test data
Xtestpr = X(trainN+1:end, :) * A;
Ytestpr = Y(trainN+1:end, :) * B;

% Loop over all test samples
correct = 0;
for i=1:size(Xtestpr,1)
    % Using only the first CCA projection, find the sample in Y
    % closest to the one in X. Euclidean distance is used as a
    % similarity measure.
    [~, ind] = min(sum((Xtestpr(i,1) - Ytestpr(:,1)).^2, 2));

    % if matched correctly
    if ind==i
        correct = correct+1;
    end
end

% compute the probability that so many correct matchings
% could be obtained by chance
pval = 1 - binocdf(correct-1, size(Xtestpr,1), 1/size(Xtestpr,1));

% compute confidence interval on correct matching rate
[~, cinf] = binofit(correct, size(Xtestpr,1));

This gives me $6$ correct classifications out of $75$, which does not sound like a lot, but still is clearly significant with a p-value of $0.0005$. Confidence interval on matching probability is $(0.03, 0.17)$.
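As a cross-check of the chance-level calculation, the same binomial tail probability can be computed without any toolbox. The short Python sketch below uses only the standard library; the helper name `binomial_upper_tail` is made up for this illustration, and the counts (6 correct out of 75) are the ones reported above.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

n_test = 75   # number of test samples
correct = 6   # number of correct matches reported in the text

# Under pure chance each test sample is matched correctly with probability 1/n_test.
pval = binomial_upper_tail(correct, n_test, 1 / n_test)
print(f"{pval:.6f}")
```

This mirrors the expression `1 - binocdf(correct-1, n, 1/n)` from the Matlab script and gives a value of about $0.0005$, in line with the reported p-value.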
