MATLAB: Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

linear algebraMATLABmatrixmatrix arraymatrix manipulationperformancesparse

Given a matrix X and its vector form I am after the most efficient way to build the matrices L and U which extracts the lower and upper triangle from X.
So in MATLAB code it would be something like that:
clear();
numRows = 3;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
% Lower Triangle are indices 2, 3, 6
mL = [ 0, 1, 0, 0, 0, 0, 0, 0, 0 ; ...
0, 0, 1, 0, 0, 0, 0, 0, 0 ; ...
0, 0, 0, 0, 0, 1, 0, 0, 0 ];
% Upper Triangle are indices 4, 7, 8
mU = [ 0, 0, 0, 1, 0, 0, 0, 0, 0 ; ...
0, 0, 0, 0, 0, 0, 1, 0, 0 ; ...
0, 0, 0, 0, 0, 0, 0, 1, 0 ];
assert(isequal(mL * vX, mX(logical(tril(mX, -1)))));
assert(isequal(mU * vX, mX(logical(triu(mX, 1)))));
I am after sparse represenation of mU and mL in the most efficient way.
My current implementation is given by:
function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
case(INCLUDE_DIAGONAL)
numElements = 0.5 * numRows * (numRows + 1);
diagIdx = 0;
case(EXCLUDE_DIAGONAL)
numElements = 0.5 * (numRows - 1) * numRows;
diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
elmntIdx = 0;
for jj = 1:numRows
for ii = (jj + diagIdx):numRows
elmntIdx = elmntIdx + 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
elmntIdx = numElements + 1;
for jj = numRows:-1:1
for ii = (jj - diagIdx):-1:1
elmntIdx = elmntIdx - 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end
Is there a more efficient way to generate vJ without extensive allocation of memory (In order to allow generating really large matrices)?
Thank You.

Best Answer

Here is a mex routine that generates the sparse double matrices mL and mU directly, so no wasted memory in creating them. Seems to run about 3x-5x faster than m-code for somewhat large sizes.
/* S = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag)
*
* S = double sparse matrix
* numRows = integer > 0
* triangleFlag = 1 , extract lower triangle
* 2 , extract upper triangle
* diagFlag = 1 , include diagonal
* 2 , exclude diagonal
* where
*
* M = an numRows X numRows matrix of non-zero terms
* assert(isequal(S * M(:), mX(logical(tril(M, -1))))); % for lower
* assert(isequal(S * M(:), mX(logical(triu(M, 1))))); % for upper
*
* Programmer: James Tursa
* Date: 2020-April-22
*/
#include "mex.h"
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize numRows, triangleFlag, diagFlag, numElements;
mwIndex *Ir, *Jc;
mwIndex i, j, k, m;
double *pr;
if( nrhs != 3 || !mxIsNumeric(prhs[0]) || !mxIsNumeric(prhs[1]) || !mxIsNumeric(prhs[2]) ||
mxGetNumberOfElements(prhs[0]) != 1 || mxGetNumberOfElements(prhs[1]) != 1 ||
mxGetNumberOfElements(prhs[2]) != 1 ) {
mexErrMsgTxt("Need three numeric scalar inputs");
}
if( nlhs > 1 ) {
mexErrMsgTxt("Too many outputs");
}
numRows = mxGetScalar(prhs[0]);
triangleFlag = mxGetScalar(prhs[1]);
diagFlag = mxGetScalar(prhs[2]);
if( numRows < 1 ) {
mexErrMsgTxt("Invalid numRows, should be > 0");
}
if( triangleFlag != 1 && triangleFlag != 2 ) {
mexErrMsgTxt("Invalid triangleFlag, should be 1 or 2");
}
if( diagFlag != 1 && diagFlag != 2 ) {
mexErrMsgTxt("Invalid diagFlag, should be 1 or 2");
}
if( diagFlag == 1 ) {
numElements = numRows * (numRows + 1) / 2; /* include diagonal */
} else {
numElements = (numRows - 1) * numRows / 2; /* exclude diagonal */
}
plhs[0] = mxCreateSparse(numElements, numRows*numRows, numElements, mxREAL);
pr = (double *) mxGetData(plhs[0]);
Ir = mxGetIr(plhs[0]);
Jc = mxGetJc(plhs[0]);
Jc[0] = 0;
diagFlag--;
k = 0;
m = 1;
if( triangleFlag == 1 ) { /* Lower */
for( j=0; j<numRows; j++ ) {
for( i=0; i<numRows; i++ ) {
if( i >= j+diagFlag ) {
*pr++ = 1.0;
*Ir++ = k++;
Jc[m] = Jc[m-1] + 1;
} else {
Jc[m] = Jc[m-1];
}
m++;
}
}
} else { /* Upper */
for( j=0; j<numRows; j++ ) {
for( i=0; i<numRows; i++ ) {
if( i+diagFlag <= j ) {
*pr++ = 1.0;
*Ir++ = k++;
Jc[m] = Jc[m-1] + 1;
} else {
Jc[m] = Jc[m-1];
}
m++;
}
}
}
}
You mex the routine as follows (you need a supported C compiler installed):
mex GenerateTriangleExtractorMatrixMex.c
And some test code:
% GenerateTriangleExtractorMatrix_test.m
n = 300;
disp('m-code timing')
tic
GenerateTriangleExtractorMatrix(10000,1,1);
toc
disp('mex code timing')
tic
GenerateTriangleExtractorMatrixMex(10000,1,1);
toc
for k=1:n
numRows = ceil(rand*5000+100);
numCols = numRows;
triangleFlag = (rand<0.5) + 1;
diagFlag = (rand<0.5) + 1;
Mm = GenerateTriangleExtractorMatrix(numRows,triangleFlag,diagFlag);
Mx = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag);
if( ~isequal(Mm,Mx) )
error('Not equal');
end
end
disp('Random tests passed')
With a sample run:
>> GenerateTriangleExtractorMatrix_test
m-code timing
Elapsed time is 9.964882 seconds.
mex code timing
Elapsed time is 1.901741 seconds.
Random tests passed