MATLAB: When to use MEX in a MATLAB progam (Example of an Ulam Map)

mexspeed

I am attempting to learn how to write MEX programs to use in my MATLAB programs when there is need. As far as I know, one does this when one encounters a bottleneck that cannot be suitably vectorized – i.e. when one needs to repat a loop a large number of times, and this loop in unavoidable. However, in the program (See below) that I wrote to test this, I am not finding a significant speed up between MEX and Matlab, and am wondering if a) I have done something wrong b) the example I chose wasn't a good one, or c) MEX does not offer that substantial of a speed up. Any advice in this matter would be greatly appreciated.
This is the MEX program I wrote to iterate the Ulam Map.
#include <math.h>
#include "mex.h"
/*A program to iterate the Ulam Map in MEX
*Call from MATLAB using UlamMap(seed, iter_num */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
#define B_OUT plhs[0]
#define S_IN prhs[0]
#define I_IN prhs[1]
double *B, S;
int I, N, n;
if (nrhs < 1 || nrhs >2)
mexErrMsgTxt("Wrong number of input arguments.");
else if(nlhs > 1)
mexErrMsgTxt("Too many output arguments.");
if(nrhs == 1)
I = 4096;
else
I = mxGetScalar(I_IN);
S = mxGetScalar(S_IN);
B_OUT = mxCreateDoubleMatrix(1, I, mxREAL);
B = mxGetPr(B_OUT);
B[0] = S;
for (n = 1; n<I; n++)
{
B[n] = 1.0-2.0*pow(B[n-1], 2);
}
return;
}
This is the wrapper script which executes this program, as well as a MATLAB version.
format long
x0 = 0.1;
iter =4000096;
tic
vals = zeros(1, iter);
vals(1) = 0.1;
for i = 2:iter
vals(i) = 1-2*vals(i-1)^2;
end
toc
tic
b = UlamMap(x0, iter);
toc
sum((vals-b).^2)
With this large number of iterations used, I see almost no speed up; with a smaller number, it is around a 1.5 speed up. Is this expected? What can I do to maximize the speed up from MEX? Is this an appropriate example?
Thanks,
Danny

Best Answer

POW(X,2) is much slower than X*X, because it is a general algorithm, which has the power to calculate POW(X,2.1) also. Therefore James suggestion is much faster than the original method.
A further improvement:
B[0] = S;
for (n = 1; n<I; n++) {
S = 1.0 - 2.0 * S * S;
B[n] = S;
}
0.044 sec.