MATLAB: Advantages of mwSize (Cross-Platform Flexibility)


Hello,

In the documentation there is a hint to use mwSize rather than int for cross-platform flexibility. What is the actual advantage? The background of my question is the following: I want to write (and have partly already written) C code that I want to use in MATLAB, and I have been looking for a good workflow to extend the existing C code and use it from MATLAB. The big problem I came across is debugging. I found this very helpful blog entry: https://blogs.mathworks.com/developer/2018/06/19/mex-debugging-vscode/ Nevertheless, it is pretty cumbersome to debug like this all the time, especially because MATLAB may crash quite often, since bugs are bound to happen while developing new C code. Therefore, I thought it would be better to write the code in a C IDE and just put the mexFunction wrapper around my code at the end. My question is whether I lose a lot of performance if I do it this way.

Furthermore, a question arose regarding the 'MATLAB Support for Interleaved Complex API in MEX Functions'. Which API is preferred for high speed applications?

Best Answer

"Furthermore, a question arose regarding the 'MATLAB Support for Interleaved Complex API in MEX Functions'. Which API is preferred for high speed applications?"

This is somewhat of a moot point given that the choice is determined by the MATLAB version you are using. If you are using R2017b or earlier, then you will be using two separate Real/Imaginary data areas. If you are using R2018a or later, then you will be using a single interleaved Complex data area. There are no MATLAB versions that simultaneously support both methods. Compiling a mex routine in R2018a or later with the -R2017b memory model option simply forces the mex routine to do a copy-in/copy-out on all complex variables in the background. It does not change the underlying storage scheme of the variable data, which is always interleaved complex in R2018a and later.

As to which is faster, that depends on what you are doing. Note that the BLAS and LAPACK complex linear algebra library routines that MATLAB uses only support the interleaved complex data model (in every version of MATLAB, not just R2018a and later), so that drives the comments below. E.g.,

Matrix Multiply real * complex:

The R2017b separate storage scheme will be faster because the BLAS matrix multiply routines can be called directly without any intermediate data copying. I.e., the individual real*real and real*imaginary pieces can be done by making two calls to the real BLAS matrix multiply routine, with the results stuffed directly into the MATLAB output variable. For the R2018a interleaved storage scheme to use the complex BLAS matrix multiply routine in this case, it must first deep copy the real variable into a complex variable with imaginary part 0, and then make the call. So R2018a wastes extra memory and time on the intermediate deep data copy and performs lots of unnecessary multiplies by 0.
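As a rough illustration of the two-call idea, here is a minimal sketch in plain C. The function names are made up for this example, and the naive `real_matmul` only stands in for a real BLAS routine such as dgemm; this is not what MATLAB does internally, just the same arithmetic on column-major data.

```c
#include <stddef.h>

/* Naive column-major real multiply C = A*B, standing in for a real BLAS
   routine (hypothetical helper, not a library call).
   A is m-by-k, B is k-by-n, C is m-by-n. */
void real_matmul(const double *A, const double *B, double *C,
                 size_t m, size_t k, size_t n)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p) {
                sum += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] = sum;
        }
    }
}

/* Real A times complex B, where B is held as separate real (Br) and
   imaginary (Bi) arrays. Two real multiplies write straight into the
   separate output parts (Cr, Ci); nothing is copied or widened first. */
void real_times_split_complex(const double *A,
                              const double *Br, const double *Bi,
                              double *Cr, double *Ci,
                              size_t m, size_t k, size_t n)
{
    real_matmul(A, Br, Cr, m, k, n); /* real * real part      */
    real_matmul(A, Bi, Ci, m, k, n); /* real * imaginary part */
}
```

With A = [1 2; 3 4] and B = [1+2i; 3+4i], the two calls fill Cr and Ci with the real and imaginary parts of A*B without ever touching an interleaved buffer.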

Linear Algebra calls to complex LAPACK routines:

The R2018a interleaved storage scheme will be faster because the input can be passed directly to the LAPACK routine and the output stuffed directly into a MATLAB variable. No intermediate deep data copying is needed. For the R2017b separate storage scheme to use the complex LAPACK routine, it must first deep copy the separate real/imaginary data areas into a single contiguous interleaved area and then pass that to the LAPACK routine. Then it must take the interleaved result and deep copy it into two separate real/imaginary data areas for output back to MATLAB. So R2017b wastes extra memory and time on the intermediate deep data copies.
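The two deep copies described above can be sketched in plain C as follows. The function names are hypothetical and the code only illustrates the data movement around a complex library call; it is not taken from MATLAB or LAPACK.

```c
#include <stddef.h>

/* Copy-in: pack separate real/imaginary arrays into one interleaved
   buffer laid out as re0, im0, re1, im1, ...
   z must have room for 2*n doubles. */
void split_to_interleaved(const double *re, const double *im,
                          double *z, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[2 * i]     = re[i];
        z[2 * i + 1] = im[i];
    }
}

/* Copy-out: unpack an interleaved result back into separate
   real/imaginary arrays of n doubles each. */
void interleaved_to_split(const double *z, double *re, double *im,
                          size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        re[i] = z[2 * i];
        im[i] = z[2 * i + 1];
    }
}
```

Each call touches every element once and needs a temporary buffer of the full size, which is the extra memory and time that the interleaved scheme avoids for complex LAPACK calls.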

Related Solutions

MATLAB: Problem using BLAS routine ztbsv.c

All of the BLAS and LAPACK complex storage is based on an original Fortran standard, which has real and imaginary parts interleaved. The C structure doublecomplex is intended to mimic this storage scheme, and the blas.h header file should as well. Early versions of MATLAB did not have correct headers for the complex arguments of BLAS/LAPACK functions. I think that has been corrected but have not actually checked lately.

Your example doesn't make sense, and I assume it is a typo. This:

1+2i
2+3i
4+6i

Should be stored in memory as:

 {1,2,2,3,4,6}

Which is not what you have written.
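One way to check the expected layout is a small plain C sketch like the one below. It assumes a C99 compiler and uses the standard `double complex` type as a stand-in for the doublecomplex struct; the helper name `complex_as_doubles` is made up for this example.

```c
#include <complex.h>
#include <stddef.h>

/* Copy the raw doubles underlying a C99 complex array into out, which
   must hold 2*n values. C99 specifies that a double complex has the same
   representation as double[2]: real part first, then imaginary part. */
void complex_as_doubles(const double complex *z, size_t n, double *out)
{
    const double *raw = (const double *)z;
    for (size_t i = 0; i < 2 * n; ++i) {
        out[i] = raw[i];
    }
}
```

Passing an array holding 1+2i, 2+3i, 4+6i gives the six doubles 1, 2, 2, 3, 4, 6 in that order, matching the interleaved layout shown above.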

I think you will need to post your complex code so we can look at it. Do you have both Ar and br as doublecomplex storage memory? I would also look at the blas.h and lapack.h header files for your version of MATLAB to see if they are as expected (possibly could be incorrect as noted above). There is always the chance that the library function itself has bugs in it, but I wouldn't assume that without checking everything else first. You could also download a C source code for ztbsv and call that routine (i.e., bypass the library function) to see if things work properly.

MATLAB: R2018b real times complex multiplication

Probably related to the fact that complex variables changed to an interleaved storage format in R2018a. So my guess is this is what is happening:

R2017b: Only two calls to the BLAS real matrix*vector multiply routine are needed, the results being put into the real & imaginary parts of the output. I.e., one 5000x5000 * 5000x1 multiply for the real*real part and another 5000x5000 * 5000x1 multiply for the real*imag part. Note that no data copying is needed prior to these calls, and the results can be used directly in the output variable without any additional copying needed.

R2018b: The real 5000x5000 matrix is probably first deep copied into an interleaved complex version, then one call to the BLAS complex matrix*vector multiply routine is used to produce the output. So, the extra work that this does over the R2017b version is the allocation for the temporary 5000x5000x2 matrix elements, the deep data copy of the 5000x5000 real matrix into the real part of that interleaved complex matrix, and then twice as much multiplying (5000x5000x2 elements * 5000x1x2 elements multiply).
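To make the guessed R2018b path concrete, here is a minimal plain C sketch of the widening copy followed by an interleaved complex matrix*vector product. Both function names are hypothetical, and the naive loop only stands in for a complex BLAS routine; whether MATLAB does exactly this internally is the same guess as above.

```c
#include <stddef.h>

/* Deep copy a real array of n elements into interleaved complex storage
   with zero imaginary parts. z must have room for 2*n doubles. */
void widen_real_to_interleaved(const double *x, double *z, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[2 * i]     = x[i];
        z[2 * i + 1] = 0.0;
    }
}

/* Naive interleaved complex product y = A*x (column-major), standing in
   for a complex BLAS matrix*vector routine. A is m-by-n, x has n complex
   elements, y has m complex elements. */
void interleaved_matvec(const double *A, const double *x, double *y,
                        size_t m, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        double sr = 0.0;
        double si = 0.0;
        for (size_t j = 0; j < n; ++j) {
            double ar = A[2 * (i + j * m)];
            double ai = A[2 * (i + j * m) + 1]; /* always 0 after widening */
            double xr = x[2 * j];
            double xi = x[2 * j + 1];
            sr += ar * xr - ai * xi; /* two real multiplies */
            si += ar * xi + ai * xr; /* two more            */
        }
        y[2 * i]     = sr;
        y[2 * i + 1] = si;
    }
}
```

The widening step allocates and fills a buffer twice the size of the real matrix, and each complex multiply-add then costs four real multiplies instead of the two needed when the real matrix is used as is. Half of those multiplies involve the zero imaginary parts.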

My guess is that the memory allocation and deep data copy are probably the biggest time killers, but I would have to do some tests to verify this. The actual multiply should only be about twice as slow.

So, if you know that downstream your 5000x5000 real matrix will be used to multiply by complex variables many times, it would be best to make it complex first up front so you only incur the temp allocation and deep data copy once.

This points out one advantage of the old separate storage format: You could mix and match real and complex variables in matrix multiply operations and accomplish all of it with multiple real BLAS matrix multiply calls without any data copying needed either before or after the operations. This is not true anymore as of R2018a.

EDIT 12/21/2018

Here are the results of some breakout timing tests:

>> version
ans =
    '9.3.0.713579 (R2017b)'
>> computer
ans =
    'PCWIN64'
>> X = rand(7000,7000); y = rand(7000,1)*1i;
>> tic;X*y;toc
Elapsed time is 0.019165 seconds.

and

>> version
ans =
    '9.4.0.813654 (R2018a)'
>> computer
ans =
    'PCWIN64'
>> X = rand(7000,7000); y = rand(7000,1)*1i;
>> tic;X*y;toc
Elapsed time is 0.626945 seconds.
>> tic;C=complex(X);toc
Elapsed time is 0.579250 seconds.
>> tic;C*y;toc
Elapsed time is 0.023560 seconds.

So, as suspected, the bulk of the extra timing appears to be in the conversion of the large real matrix to complex interleaved format. The actual matrix multiply isn't that much more than the R2017b version.
