MATLAB: Fast sum of a 3D matrix with a MEX file


Hello,
I would like to sum the pixel values of a 3D matrix along one dimension (the third) and store the result in a 2D matrix. For example, the input is a 1280 x 1280 x 700 matrix A, and the output is the 2D array D(:,:) = A(:,:,1) + A(:,:,2) + A(:,:,3) + ...
As you can see in the MEX file (test.c), the sum of the pixel values is accumulated in the variable pix. But as soon as I store this variable in the output array, the calculation time becomes very long: more than 40 sec. I don't understand why writing into an array takes so long. Is there a way to improve the MEX function so that the algorithm runs faster?
Thank you for your answer.
Kind regards, Fouad

Best Answer

Matlab's built-in sum should already do this job very efficiently:
pix = sum(A, 3);
The C code looks okay, but maybe the problem is simply memory. A [1280 x 1280 x 700] array of type double needs 9.18 GB. Creating a second one might exhaust your RAM, so that the operating system falls back to slow swapping to disk. In that case you should see increased disk activity.
Some hints:
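#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[])
{
    const double *Matrix_3D;
    const mwSize *Dim3Dmatrix;
    mwSize nDims, nPages, i, j, n;
    double *mat2Dout;

    // Expect one real, full double array with at most 3 dimensions:
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0])
        || mxIsSparse(prhs[0])) {
        mexErrMsgTxt("Input must be one real, full double array.");
    }
    nDims = mxGetNumberOfDimensions(prhs[0]);
    if (nDims > 3) {
        mexErrMsgTxt("Input must have at most 3 dimensions.");
    }
    // Do not use mxGetData for a double array, because this is a job
    // for mxGetPr:
    Matrix_3D = mxGetPr(prhs[0]);
    // Use mwSize and do not speculate that it equals int:
    Dim3Dmatrix = mxGetDimensions(prhs[0]);
    // A 2D input has no 3rd entry in the dimension vector, so reading
    // Dim3Dmatrix[2] would be out of bounds; treat it as one page:
    nPages = (nDims == 3) ? Dim3Dmatrix[2] : 1;
    // mxCreateDoubleMatrix initializes all elements to 0:
    plhs[0] = mxCreateDoubleMatrix(Dim3Dmatrix[0], Dim3Dmatrix[1], mxREAL);
    // No need to cast the output of mxGetPr to double *, because it
    // is one already:
    mat2Dout = mxGetPr(plhs[0]);
    // Use one linear index for the 1st and 2nd dimension.
    // Access neighboring elements of input and output to use the
    // processor cache efficiently:
    n = Dim3Dmatrix[0] * Dim3Dmatrix[1];
    for (j = 0; j < nPages; j++) {
        for (i = 0; i < n; i++) {
            mat2Dout[i] += *Matrix_3D++;
        }
    }
}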
Accessing the elements of the input with large strides is not efficient, because the CPU reads a whole cache line (usually 64 bytes) at once. Therefore the modified method is faster: for a [500 x 500 x 700] array it needs 0.7 sec instead of 5.4 sec for the original version. By the way, sum is multi-threaded in addition and needs 0.55 sec. (Measured under Matlab R2016b on a Core2Duo.)
Your array is about 6.5 times larger, and the original code needs about 8 times more run time. This does not sound like disk swapping, so maybe it is only a CPU cache problem, or you use an even slower processor than I do.