I have a correlation matrix $A$, which I obtained using the Pearson's linear correlation coefficient through Matlab's corrcoef(). The correlation matrix of dimension 100×100, i.e. I computed the correlation matrix on 100 random variables.
Among these 100 random variables, I would like to find the 10 random variables whose correlation matrix contains as "little correlation" as possible (see Quantifying how much "more correlation" a correlation matrix A contains compared to a correlation matrix B regarding metrics to measure the overall correlation in a correlation matrix). I only care about pairwise correlation.
Are there good methods to find those 10 random variables in a reasonable amount of time (e.g. I don't want to try $\binom{100}{10}$ combinations)? Approximation algorithms are OK.
Best Answer
Let us consider the sum of absolute pairwise correlations as our measure of choice. We thus seek a vector $v\in\{0,1\}^N$ with $l_1(v)=n$ which will minimize $v'Qv$ where $Q_{ij}=|A_{ij}|$.
Assume Q is also positive definite as A, the problem is reduced to solving the constrained quadratic optimization problem:
$$v^*=\min\ v'Qv\ s.t.\ l_1(v)=n,\ v_i\in\{0,1\}$$
This suggest the folloing relaxation:
$$v^*=\min\ v'Qv\ s.t.\ l_1(v)=n,\ v_i\in[0,1]$$
which can be easily solved using off-the-shelf solvers; then the result is given by the largest $n$ components in $v^*$.
Sample matlab code: