If you don't have the original casewise data but know the correlations (and ideally the variances and the sample size), you may simply generate random data having those correlations and analyze that dataset as usual with any canonical correlation program that takes in raw data. Every output will then be correct except the canonical variates' scores, because computing those requires the true data you don't have.
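For example, here is a minimal Matlab sketch, assuming the joint $(p_1+p_2) \times (p_1+p_2)$ correlation matrix `R` of all variables and a chosen sample size `n` (the names are mine, not from any particular program):

```matlab
n = 200;              % assumed sample size
p = size(R, 1);       % total number of variables in both sets
Z = randn(n, p);      % independent standard normal data
X = Z * chol(R);      % chol(R) is upper-triangular T with T'*T = R,
                      % so the covariance of X approaches R as n grows
```

The sample correlations of such data match `R` only approximately, so results will fluctuate around the true values accordingly.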
But anyway, if you want to program canonical correlation analysis (CCA) yourself, here is a step-by-step algorithm. You may use any language that has basic linear-algebra matrix functions.
Let $\bf R_1$ be the matrix of correlations (or covariances) among the $p_1$ variables of Set1, $\bf R_2$ the matrix of correlations (or covariances) among the $p_2$ variables of Set2, and $\bf R_{12}$ the $p_1 \times p_2$ matrix of correlations (or covariances) between the sets.
Make $\bf S_1$ the diagonal matrix containing the standard deviations of Set1's variables; likewise $\bf S_2$ for Set2. If you don't know the variances (for instance, when you know only the correlations), assume they all equal 1; the unstandardized canonical coefficients will then be equal to the standardized ones.
Doing analysis on covariance matrices is equivalent to analyzing centered variables, while doing analysis on correlation matrices is equivalent to analyzing z-standardized variables.
Find $\bf H_1$, the Cholesky root of $\bf R_1$: an upper-triangular matrix such that $\bf{H_1'H_1=R_1}$. (Note that Wikipedia shows it transposed, as a lower-triangular "L".) Likewise, find $\bf H_2$, the Cholesky root of $\bf R_2$.
Compute $\bf W$:
$\bf W = {H_1'}^{-1} R_{12} {H_2}^{-1}$, if $p_1 \le p_2$; or
$\bf W = {H_2'}^{-1} R_{12}' {H_1}^{-1}$, if $p_1 \gt p_2$.
Do singular-value decomposition of $\bf W$, whereby $\bf W=UDV'$.
Canonical correlations $\gamma_1, \gamma_2, \dots, \gamma_m$, where $m=\min(p_1,p_2)$, stand on the diagonal of $\bf D$. (For how to test them for significance, see here.)
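Up to this point, a Matlab sketch of the computations could read as follows (with `R1`, `R2`, `R12`, `p1`, `p2` as defined above; an illustration, not a polished implementation):

```matlab
H1 = chol(R1);                 % upper-triangular Cholesky root, H1'*H1 = R1
H2 = chol(R2);                 % likewise for Set2
if p1 <= p2
    W = (H1') \ R12 / H2;      % inv(H1') * R12 * inv(H2)
else
    W = (H2') \ R12' / H1;     % inv(H2') * R12' * inv(H1)
end
[U, D, V] = svd(W, 'econ');    % W = U*D*V'
gamma = diag(D);               % the m = min(p1,p2) canonical correlations
```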
Compute standardized canonical coefficients $\bf K_1$ (for Set1) and $\bf K_2$ (for Set2):
$\bf K_1 = H_1^{-1}U$ and $\bf K_2 = H_2^{-1}V$ (first $p_1$ columns of $\bf K_2$), if $p_1 \le p_2$; or
$\bf K_1 = H_1^{-1}V$ (first $p_2$ columns of $\bf K_1$) and $\bf K_2 = H_2^{-1}U$, if $p_1 \gt p_2$.
Standardized coefficients correspond to decomposing the $\bf R$-matrices as if they were correlation matrices, even if the matrices were actually covariances; hence the label "standardized".
Compute unstandardized canonical coefficients $\bf C_1$ (for Set1) and $\bf C_2$ (for Set2):
$\bf C_1 = S_1^{-1}K_1$ and $\bf C_2 = S_2^{-1}K_2$.
When the three input $\bf R$-matrices are correlations rather than covariances, and the two $\bf S$ diagonals consist of ones - which corresponds to analyzing z-standardized variables - the standardized and unstandardized coefficients are the same. Some CCA programs don't display unstandardized coefficients at all - mostly programs that base the analysis only on correlations; such programs may also omit the label "standardized" when they output the (standardized) coefficients.
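Continuing the sketch (the $p_1 \le p_2$ branch; with `svd(..., 'econ')` above, `V` already contains only the first $p_1$ columns):

```matlab
K1 = H1 \ U;     % standardized canonical coefficients, Set1
K2 = H2 \ V;     % standardized canonical coefficients, Set2
C1 = S1 \ K1;    % unstandardized coefficients; S1 = diag(st. deviations of Set1)
C2 = S2 \ K2;    % these equal K1, K2 when all standard deviations are 1
```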
Compute canonical loadings $\bf A_1$ (for Set1) and $\bf A_2$ (for Set2):
$\bf A_1 = S_1^{-1}(S_1R_1S_1)C_1$ and $\bf A_2 = S_2^{-1}(S_2R_2S_2)C_2$ .
The mean of the squared entries in each column of $\bf A_1$ is the proportion of variance in Set1 explained by the corresponding canonical variate of its own set; likewise for $\bf A_2$.
Compute canonical cross-loadings $\bf A_{12}$ (for Set1) and $\bf A_{21}$ (for Set2):
$\bf A_{12} = S_1^{-1}(S_1R_{12}S_2)C_2$ and $\bf A_{21} = S_2^{-1}(S_1R_{12}S_2)'C_1$ .
The mean of the squared entries in each column of $\bf A_{12}$ is the proportion of variance in Set1 explained by the corresponding canonical variate of the opposite set; likewise for $\bf A_{21}$.
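In the same sketch, the loadings and cross-loadings follow the formulas directly:

```matlab
A1  = S1 \ (S1*R1*S1)   * C1;   % loadings of Set1 variables on Set1 variates
A2  = S2 \ (S2*R2*S2)   * C2;   % loadings of Set2 variables on Set2 variates
A12 = S1 \ (S1*R12*S2)  * C2;   % cross-loadings of Set1 variables
A21 = S2 \ (S1*R12*S2)' * C1;   % cross-loadings of Set2 variables
mean(A1.^2)                     % proportion of Set1 variance explained per variate
```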
Compute canonical variates scores (if you have casewise data at hand):
Variates extracted from Set1: $\bf Z_1=X_1K_1$; variates extracted from Set2: $\bf Z_2=X_2K_2$, where $\bf X_1$ and $\bf X_2$ are the z-standardized variables of Set1 and Set2. (Equivalently, $\bf Z_1=X_1C_1$ and $\bf Z_2=X_2C_2$ with merely centered variables.)
The variates come out standardized (mean = 0, st. dev. = 1). The Pearson correlation between variates $Z_{1(j)}$ and $Z_{2(j)}$ is the canonical correlation $\gamma_j$. For a visual explanation of the idea of canonical correlations, please look here.
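If casewise data `X1` (n × p1) and `X2` (n × p2) are at hand, the scores might be computed as follows (the mean subtraction uses implicit expansion, Matlab R2016b or later):

```matlab
Z1 = (X1 - mean(X1)) * C1;        % centered data with unstandardized coefficients,
Z2 = (X2 - mean(X2)) * C2;        % equivalently zscore(X1)*K1 and zscore(X2)*K2
c = corrcoef(Z1(:,1), Z2(:,1));   % c(1,2) reproduces gamma(1)
```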
If $X$ is $n\times p$ and $Y$ is $n\times q$, then one can formulate the CCA optimization problem for the first canonical pair as follows:
$$\text{Maximize }\operatorname{corr}(Xa, Yb).$$
The value of the correlation does not depend on the lengths of $a$ and $b$, so they can be arbitrarily fixed. It is convenient to fix them such that the projections have unit variances:
$$\text{Maximize }\operatorname{corr}(Xa, Yb) \text{ subject to } a^\top \Sigma_X a=1 \text{ and } b^\top \Sigma_Yb=1,$$
because then the correlation equals the covariance:
$$\text{Maximize } a^\top \Sigma_{XY}b \text{ subject to } a^\top \Sigma_X a=1 \text{ and } b^\top \Sigma_Yb=1,$$
where $\Sigma_{XY}$ is the cross-covariance matrix given by $X^\top Y/n$ (with $X$ and $Y$ assumed centered).
We can now generalize it to more than one dimension as follows:
$$\text{Maximize }\operatorname{tr}(A^\top \Sigma_{XY}B) \text{ subject to } A^\top \Sigma_X A=I \text{ and } B^\top \Sigma_Y B=I,$$
where the trace forms precisely the sum over successive canonical correlation coefficients, as you hypothesized in your question. You only had the constraints on $A$ and $B$ wrong.
The standard way to solve the CCA problem is to make the substitutions $\tilde A = \Sigma_X^{1/2} A$ and $\tilde B = \Sigma_Y^{1/2} B$ (conceptually this is equivalent to whitening both $X$ and $Y$), obtaining
$$\text{Maximize }\operatorname{tr}(\tilde A^\top \Sigma_X^{-1/2} \Sigma_{XY}\Sigma_Y^{-1/2} \tilde B) \text{ subject to } \tilde A^\top \tilde A=I \text{ and } \tilde B^\top \tilde B=I.$$
This is now easy to solve because of the orthogonality constraints; the solution is given by left and right singular vectors of $\Sigma_X^{-1/2} \Sigma_{XY}\Sigma_Y^{-1/2}$ (that can then easily be back-transformed to $A$ and $B$ without tildes).
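As a quick numerical illustration of this recipe, here is a self-contained Matlab sketch on made-up data (all names and sizes are arbitrary):

```matlab
n = 1000;
X = randn(n, 4);                          % Set1
Y = 0.5 * X(:, 1:3) + randn(n, 3);        % Set2, correlated with Set1
X = X - mean(X);  Y = Y - mean(Y);        % center both sets
Sx = (X'*X)/n;  Sy = (Y'*Y)/n;  Sxy = (X'*Y)/n;
iSx2 = inv(sqrtm(Sx));  iSy2 = inv(sqrtm(Sy));   % whitening transforms
[U, D, V] = svd(iSx2 * Sxy * iSy2);
A = iSx2 * U;  B = iSy2 * V;              % back-transformed coefficients
diag(D)'                                  % canonical correlations
c = corrcoef(X*A(:,1), Y*B(:,1));         % c(1,2) matches the first one
```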
Relationship to reduced-rank regression
CCA can be formulated as a reduced-rank regression problem. Namely, $A$ and $B$ corresponding to the first $k$ canonical pairs minimize the following cost function:
$$\Big\|(Y-XAB^\top)\Sigma_Y^{-1/2}\Big\|^2 =
\Big\|Y\Sigma_Y^{-1/2}-XAB^\top\Sigma_Y^{-1/2}\Big\|^2.$$
See e.g. Torre, 2009, Least Squares Framework for Component Analysis, page 6 (but the text is quite dense and might be a bit hard to follow). This is called reduced-rank regression because the matrix of regression coefficients $AB^\top\Sigma_Y^{-1/2}$ is of low rank $k$.
In contrast, standard OLS regression minimizes
$$\|Y-XV\|^2$$
without any rank constraint on $V$. The solution $V_\mathrm{OLS}$ will generally be full rank, i.e. rank $\min(p,q)$.
Even in the $k=p=q$ situation one crucial difference remains: for CCA one needs to whiten the dependent variables $Y$, replacing them with $Y\Sigma_Y^{-1/2}$. This is because regression tries to explain as much variance in $Y$ as possible, whereas CCA does not care about the variance at all; it only cares about correlation. Once $Y$ is whitened, its variance is the same in all directions, and minimizing the regression loss amounts to maximizing the correlation.
(I think there is no way to obtain $A$ and $B$ from $V_\mathrm{OLS}$.)
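Continuing the sketch above, one can at least verify numerically that the reduced-rank solution spans the same Set1 directions as the first $k$ canonical pairs:

```matlab
k = 2;
Yw = Y * iSy2;                         % whitened responses
Vw = (X'*X) \ (X'*Yw);                 % OLS coefficients for the whitened Y
[~, ~, Vf] = svd(X*Vw, 'econ');        % right singular vectors of the OLS fit
Crrr = Vw * (Vf(:,1:k) * Vf(:,1:k)');  % reduced-rank regression solution of rank k
subspace(Crrr, A(:,1:k))               % ~0: its columns span the CCA directions
```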
Best Answer
This looks like a possible approach.
CCA will find pairs of vectors $(\mathbf w, \mathbf v)$ such that projections $\mathbf X \mathbf w$ and $\mathbf Y \mathbf v$ have maximal possible correlations (the pairs will be ordered in the order of decreasing correlations). Projection vectors are normalized such that the variance of $\mathbf X \mathbf w$ and of $\mathbf Y \mathbf v$ is equal to $1$. This means that projections are not only correlated, but "on the same scale" and hence can be directly compared.
Some things to keep in mind: (1) you can only center your test data with the mean of the training data; (2) in high dimensions CCA is prone to overfitting, so it is a good idea either to use regularized CCA or to preprocess the data with PCA.
Here is a very simple Matlab sketch of this approach (not the original script; it assumes paired observations in `Xtrain`/`Ytrain` and `Xtest`/`Ytest`, and uses the Statistics Toolbox functions `canoncorr`, `pdist2`, and `binofit`):
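```matlab
mx = mean(Xtrain);  my = mean(Ytrain);
[A, B] = canoncorr(Xtrain - mx, Ytrain - my);  % CCA on the training data
U = (Xtest - mx) * A;                          % test projections, Set1
V = (Ytest - my) * B;                          % test projections, Set2
D = pdist2(U, V);                              % distances between all score pairs
[~, match] = min(D, [], 2);                    % nearest Set2 projection per case
nCorrect = sum(match == (1:size(U,1))');       % correctly matched test cases
[~, ci] = binofit(nCorrect, size(U,1))         % CI on the matching probability
```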
This gives me $6$ correct classifications out of $75$, which does not sound like a lot but is still clearly significant, with a p-value of $0.0005$. The confidence interval on the matching probability is $(0.03, 0.17)$.