[Math] Uniform correlation matrix sampling and not so uniform laws

pr.probability, probability-distributions, random-matrices

Hi everyone,

I am looking for ways of simulating correlation matrices of fixed dimension, in (at least) two settings.

First, I would like to pin down the "uniform" distribution over the space of correlation matrices and be able to sample from it (even defining what "uniform" means here is not such an easy task!).

Second, I would like some parametric family of laws under which the sampled correlation matrices are such that (one possible construction is sketched after this list):

- the expectations of the eigenvalues equal some fixed set of target values;

- we have a certain amount of "control" over the second moments of those eigenvalues through the parameter set.
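For concreteness, here is a rough sketch of the kind of construction I have in mind (not a solution, just an illustration, and the scaled-Dirichlet choice is my own assumption): draw the spectrum from a scaled Dirichlet law, whose parameters fix the eigenvalue means and give some handle on their second moments, then embed that spectrum into a correlation matrix with the Davies–Higham construction, which scipy exposes as `scipy.stats.random_correlation`.

```python
import numpy as np
from scipy.stats import random_correlation

def sample_corr_with_eigen_control(alpha, rng):
    """Sketch: eigenvalues ~ n * Dirichlet(alpha), then embed that spectrum
    into a correlation matrix (Davies-Higham, via scipy).
    E[lambda_i] = n * alpha_i / sum(alpha); increasing sum(alpha)
    shrinks the second moments of the eigenvalues."""
    alpha = np.asarray(alpha, dtype=float)
    n = len(alpha)
    eigs = n * rng.dirichlet(alpha)          # nonnegative, sums to n
    return random_correlation.rvs(eigs, random_state=rng)

rng = np.random.default_rng(0)
C = sample_corr_with_eigen_control([4.0, 2.0, 1.0, 1.0], rng)
print(np.diag(C))                            # all ones
print(np.sort(np.linalg.eigvalsh(C))[::-1])  # the sampled spectrum
```

Note that this only controls the law of the spectrum; which eigenvalue is "which" is defined up to labeling, so the expectation statement refers to the Dirichlet coordinates rather than the sorted eigenvalues.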

I hope I made myself clear enough, and thanks for any leads or references to achieve such a thing. From my point of view this is a surprisingly complex problem, and I thought it might raise some interest among you.

So far I have tried an approach based on diagonalization through orthogonal matrices and eigenvalues, without much success…

Edit:

Here are a few facts (some of them empirical) that came up when trying a diagonalization approach via an orthogonal matrix and a set of eigenvalues, i.e. $C=\Omega\,\mathrm{Diag}(\lambda_1,\dots,\lambda_n)\,\Omega^T$ with (I hope) obvious notation.

Naively, one might think it suffices to sample, uniformly and independently, a set of eigenvalues from the simplex (the eigenvalues of an $n \times n$ correlation matrix are nonnegative and sum to $n$) and a matrix uniformly from the orthogonal group, after which you "do" the multiplication and obtain a correlation matrix sampled uniformly from the space of correlation matrices. But…

If you are given a set of eigenvalues (for your correlation matrix), you cannot simply sample a (uniformly distributed) orthogonal matrix and expect to get a correlation matrix after the multiplication.

There are only $n(n-3)/2$ degrees of freedom left among the parameters of the orthogonal matrix if the product is to be a correlation matrix, whereas $O(n)$ is an $n(n-1)/2$-dimensional space: the diagonal entries of the correlation matrix are constrained to equal 1, which gives $n$ equations, and $n(n-1)/2 - n = n(n-3)/2$. The "submanifold" of $O(n)$ so defined (the orthogonal matrices compatible with a given set of eigenvalues) is really hard to express in a way that could be useful.

Also, a fact that has appeared empirically: if you sample uniformly from the orthogonal group, then I think that almost surely the only set of eigenvalues compatible with obtaining a correlation matrix is $(1,\dots,1)$. So the correlation matrix is simply the identity matrix $I_n$, and this is the really surprising fact to me (though not too surprising if you count the degrees of freedom in the problem).
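For reference, here is a minimal numerical check of the naive recipe described above (Haar-distributed orthogonal matrix via the QR decomposition of a Gaussian matrix, eigenvalues uniform on the simplex $\{\lambda_i \ge 0,\ \sum_i \lambda_i = n\}$): the diagonal of the resulting matrix is essentially never all ones, so it is a covariance-type matrix rather than a correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix
    (with the usual sign correction on the diagonal of R)."""
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))

# Eigenvalues uniform on the simplex {lambda_i >= 0, sum_i lambda_i = n}
lam = n * rng.dirichlet(np.ones(n))
Omega = haar_orthogonal(n, rng)
C = Omega @ np.diag(lam) @ Omega.T

print(np.diag(C))   # generically not all ones, so not a correlation matrix
```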

Regards

Best Answer

This article [Harry Joe, Generating random correlation matrices based on partial correlations, Journal of Multivariate Analysis, 97 (10), pp. 2177–2189 (2006)] addresses the question. The author gives a simple construction for generating correlation matrices uniformly distributed over the space of positive definite correlation matrices, and there is an associated R command.
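For anyone who prefers to work outside R, here is a sketch in Python of the C-vine construction with Beta-distributed partial correlations, following the partial-correlation idea of Joe (2006) as developed further in Lewandowski, Kurowicka and Joe (2009); with $\eta = 1$ the density is meant to be uniform over positive definite correlation matrices, but the Beta parameters should be double-checked against the papers before relying on this.

```python
import numpy as np

def rand_corr_vine(d, eta=1.0, rng=None):
    """C-vine sampler sketch: draw partial correlations from shifted Beta
    laws and chain them back to unconditional correlations.
    eta = 1 is the 'uniform' case."""
    rng = np.random.default_rng() if rng is None else rng
    beta = eta + (d - 1) / 2.0
    P = np.zeros((d, d))          # partial correlations
    C = np.eye(d)                 # correlation matrix being built
    for k in range(d - 1):
        beta -= 0.5
        for i in range(k + 1, d):
            P[k, i] = 2.0 * rng.beta(beta, beta) - 1.0   # on (-1, 1)
            p = P[k, i]
            # convert the partial correlation to an unconditional one
            for l in range(k - 1, -1, -1):
                p = p * np.sqrt((1 - P[l, i]**2) * (1 - P[l, k]**2)) + P[l, i] * P[l, k]
            C[k, i] = C[i, k] = p
    return C

C = rand_corr_vine(5, eta=1.0, rng=np.random.default_rng(0))
print(np.diag(C), np.linalg.eigvalsh(C).min() > 0)   # unit diagonal, positive definite
```

The unit diagonal holds by construction, and positive definiteness follows from keeping every partial correlation strictly inside $(-1, 1)$; only the distribution placed on those partial correlations determines whether the resulting law is the uniform one.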
