I would like to sum all pixel values of a 3D matrix along one dimension and put the final result in a 2D matrix. For example, the input is a 3D matrix: A(1280,1280,700). And the output is a 2D array: D(:,:) = A(:,:,1)+A(:,:,2)+A(:,:,3)+…
As you can see in the mex file (test.c), the sum of all pixels is stored in a variable: pix. But when this variable is stored into the array, the calculation time becomes very long: >40 sec… I don't understand why writing into an array takes so long. Is there a way to improve the Mex function in order to compute the algorithm faster?
The job should be done very efficiently by Matlab already:
pix = sum(A, 3);
The C code looks okay, but maybe it is simply a memory problem. A [1280 x 1280 x 700] array of type double needs 9.18 GB. Creating a second one might exhaust your RAM, so that the slow swapping to disk is used. You should then see increased disk access.
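The 9.18 GB figure is just rows x cols x pages x 8 bytes. As a rough sketch of that arithmetic (the helper name array_bytes is made up for illustration and is not part of the mex API):

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for a dense rows x cols x pages
   array whose elements are elem_size bytes each (8 for double).
   64-bit arithmetic avoids overflow for arrays larger than 4 GB. */
uint64_t array_bytes(uint64_t rows, uint64_t cols, uint64_t pages,
                     uint64_t elem_size)
{
    return rows * cols * pages * elem_size;
}
```

For the input, array_bytes(1280, 1280, 700, 8) gives 9,175,040,000 bytes, i.e. about 9.18 GB. The [1280 x 1280] output alone needs only 13,107,200 bytes, about 13 MB.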
Some hints:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[])
{
  // Do not use mxGetData for a double array, because this is a job
  // for mxGetPr:
  double* Matrix_3D = mxGetPr(prhs[0]);
  // Use mwSize and do not speculate that it equals int:
  const mwSize* Dim3Dmatrix = mxGetDimensions(prhs[0]);
  mwSize i, j, n;
  double* mat2Dout;
  // mxCreateDoubleMatrix fills the output with zeros, so it can be
  // used as accumulator directly:
  plhs[0] = mxCreateDoubleMatrix(Dim3Dmatrix[0], Dim3Dmatrix[1], mxREAL);
  // No need to cast the output of mxGetPr to double *, because it
  // is one already:
  mat2Dout = mxGetPr(plhs[0]);
  // Use one linear index for the 1st and 2nd dimension.
  // Access neighboring elements of input and output to use the
  // processor cache efficiently:
  n = Dim3Dmatrix[0] * Dim3Dmatrix[1];
  for (j = 0; j < Dim3Dmatrix[2]; j++) {
    for (i = 0; i < n; i++) {
      mat2Dout[i] += *Matrix_3D++;
    }
  }
}
Accessing the elements of the input in large steps is not efficient, because the CPU reads a whole cache line (usually 64 bytes) at once. Therefore the modified method is faster: for a (500, 500, 700) array it needs 0.7 sec instead of 5.4 sec for the original version. By the way, sum is multi-threaded in addition and needs 0.55 sec. (Measured under Matlab R2016b, Core2Duo.)
Your array is 6.5 times larger and the original code needs 8 times more run time. This does not sound like disk caching. So maybe it is a CPU cache problem only, or you use an even slower processor than I do.
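To make the difference in access order concrete, here is a minimal sketch in plain C without the mex interface. The function names are made up, and the strided version is only my guess at what test.c does (accumulating into pix over the 3rd dimension for each pixel), since the original file is not shown here. n is the number of elements per page (rows x cols), and the data is column-major as in Matlab.

```c
#include <stddef.h>

/* Assumed original order: for each output pixel, walk through all
   pages. Consecutive reads are n elements apart, so almost every
   read touches a different cache line. */
void sum_pages_strided(const double *a, size_t n, size_t pages,
                       double *out)
{
    for (size_t i = 0; i < n; i++) {
        double pix = 0.0;
        for (size_t p = 0; p < pages; p++) {
            pix += a[i + n * p];
        }
        out[i] = pix;
    }
}

/* Modified order: process page by page. Input and output are both
   read and written at neighboring addresses. */
void sum_pages_contiguous(const double *a, size_t n, size_t pages,
                          double *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = 0.0;
    }
    for (size_t p = 0; p < pages; p++) {
        const double *page = a + n * p;
        for (size_t i = 0; i < n; i++) {
            out[i] += page[i];
        }
    }
}
```

Both functions visit the same elements, only in a different order, so the timing difference comes from the memory access pattern alone. For each output pixel the pages are added in the same sequence, so the results should be identical.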