MATLAB: Computational Complexity of matrix multiplication

complex matrix multiplicationMATLAB

Please suggest which of follwing will have higher computaional complexity.

A*xCxC*xA
A*xRxR*xA

Where, A = MxN complex vakued matrix, C =NxN complex vakued matrix and R=NxN real valued matrix.

x stands for matrix multiplication.

* stands for Hermitian transpose.

regards

jayant

Best Answer

This question is perhaps more involved than it looks on the surface, because by default MATLAB doesn't store the imaginary part of a real variable and MATLAB can call BLAS symmetric matrix multiply routines in the background to do the job in some cases. And the BLAS library is highly optimized for multi-threading and cache usage. So depending on the complexity of the inputs and your version of MATLAB, there may be data copies involved that you didn't realize. And depending on the order of operations, MATLAB may or may not be able to call those BLAS symmetric matrix multiply routines which run in about 1/2 the time of the generic matrix multiply routines. E.g., for R2018a or later (interleaved complex memory model):

D = A' * C * C' * A

will be done as ((A' * C) * C') * A. Because of this order, MATLAB will not recognize the symmetry and will not make use of the BLAS symmetric matrix multiply routines. There are three generic matrix multiplies involved. Because of floating point effects, the result may not be strictly Hermitian either.

D = (A' * C) * (C' * A)

will be done as three generic matrix multiplies. Because of this order, MATLAB will likely not recognize that A' * C is the Hermitian transpose of C' * A, so you just get the generic matrix multiplies in the background. Because of floating point effects, the result may not be strictly Hermitian either.

ATC = A' * C

D = ATC * ATC'

will be done as a generic matrix multiply followed by a symmetric matrix multiply. MATLAB is able to recognize the symmetry of the second multiply in this case. This would be the best performance, and the result is guaranteed to be strictly Hermitian.

D = A' * R * R' * A

will be done as ((A' * R) * R') * A. Because of this order, MATLAB will not recognize the symmetry and will not make use of the BLAS symmetric matrix multiply routines. There are three generic matrix multiplies involved. On top of that, the R matrix will first have to be copied into interleaved complex format (imaginary part 0) before the complex matrix multiply routines can be called. So you get that additional burden, twice in fact, but this may be somewhat offset by the speed of doing a lot of 0 multiplies. Because of floating point effects, the result may not be strictly Hermitian either.

D = (A' * R) * (R' * A)

will be done as three generic matrix multiplies. Because of this order, MATLAB will likely not recognize that A' * R is the Hermitian transpose of R' * A, so you just get the generic matrix multiplies in the background. On top of that, the R matrix will first have to be copied into interleaved complex format (imaginary part 0) before the complex matrix multiply routines can be called. So you get that additional burden, twice in fact, but this may be somewhat offset by the speed of doing a lot of 0 multiplies. Because of floating point effects, the result may not be strictly Hermitian either.

ATR = A' * R

D = ATR * ATR'

will be done as a generic matrix multiply followed by a symmetric matrix multiply. MATLAB is able to recognize the symmetry of the second multiply in this case. On top of that, the R matrix will first have to be copied into interleaved complex format (imaginary part 0) before the complex matrix multiply routines can be called. So you get that additional burden, but only once, and this may be somewhat offset by the speed of doing a lot of 0 multiplies. This would be the best performance, and the result is guaranteed to be strictly Hermitian.

Comparing using C vs using R, you are basically trading doing generic complex multipies vs doing a data copy and then 1/2 of the operations are 0 multiplies. The R may or may not be faster because of this, but could easily be slower and will likely involve a temporary memory bump.

R2017b and earlier (separate complex memory model):

In these versions MATLAB stores the real and imaginary parts separately, so you do not get that extra data copy of R prior to the complex matrix multiplies, and you avoid all of the resulting 0 multiplies. The complex*real matrix multiply is actually done in pieces, and overall operation takes about 1/2 the operations of a complex*complex matrix multiply. This is a big difference between what happens in these cases between R2017b and earlier vs R2018a and later.

E.g., for R2018a or later interleaved memory model:

>> version
ans =
    '9.8.0.1323502 (R2020a)'
>> A = rand(5000,4000)+rand(5000,4000)*i;
>> C = rand(5000,5000)+rand(5000,5000)*i;
>> R = rand(5000,5000);
>> tic;ATC=(A' * C); DS = ATC * ATC';toc
Elapsed time is 6.282376 seconds.
>> tic;ATR=(A' * R); DS = ATR * ATR';toc
Elapsed time is 6.475804 seconds.

And for R2017b or earlier separate memory model:

>> version
ans =
9.0.0.370719 (R2016a)
>> A = rand(5000,4000)+rand(5000,4000)*i;
>> C = rand(5000,5000)+rand(5000,5000)*i;
>> R = rand(5000,5000);
>> tic;ATC=(A' * C); DS = ATC * ATC';toc
Elapsed time is 7.422857 seconds.
>> tic;ATR=(A' * R); DS = ATR * ATR';toc
Elapsed time is 4.870730 seconds.

There is no advantage using R in the interleaved complex memory model (R2020a in this case) ... in fact I got a slight decrease in performance due to the temporary data copy. But using R in the separate complex memory model (R2016a in this case) has a clear advantage by avoiding the temporary data copy and a lot of those 0 multiplies.

Related Solutions

MATLAB: R2018b real times complex multiplication

Probably related to the fact that complex variables changed to an interleaved storage format in R2018a. So my guess is this is what is happening:

R2017b: Only two calls to the BLAS real matrix*vector multiply routine are needed, the results being put into the real & imaginary parts of the output. I.e., one 5000x5000 * 5000x1 multiply for the real*real part and another 5000x5000 * 5000x1 multiply for the real*imag part. Note that no data copying is needed prior to these calls, and the results can be used directly in the output variable without any additional copying needed.

R2018b: The real 5000x5000 matrix is probably first deep copied into an interleaved complex version, then one call to the BLAS complex matrix*vector multiply routine is used to produce the output. So, the extra work that this does over the R2017b version is the allocation for the temporary 5000x5000x2 matrix elements, the deep data copy of the 5000x5000 real matrix into the real part of that interleaved complex matrix, and then twice as much multiplying (5000x5000x2 elements * 5000x1x2 elements multiply).

My guess is the memory allocation and deep data copy is probably the biggest time killer, but I would have to do some tests to verify this. The actual multiply should only be about twice as slow.

So, if you know that downstream your 5000x5000 real matrix will be used to multiply by complex variables many times, it would be best to make it complex first up front so you only incur the temp allocation and deep data copy once.

This points out one advantage of the old separate storage format: You could mix and match real and complex variables in matrix multiply operations and accomplish all of it with multiple real BLAS matrix multiply calls without any data copying needed either before or after the operations. This is not true anymore as of R2018a.

EDIT 12/21/2018

Here are the results of some breakout timing tests

>> version
ans =
    '9.3.0.713579 (R2017b)'
>> computer
ans =
    'PCWIN64'
>> X = rand(7000,7000); y = rand(7000,1)*1i;
>> tic;X*y;toc
Elapsed time is 0.019165 seconds.

and

>> version
ans =
    '9.4.0.813654 (R2018a)'
>> computer
ans =
    'PCWIN64'
>> X = rand(7000,7000); y = rand(7000,1)*1i;
>> tic;X*y;toc
Elapsed time is 0.626945 seconds.
>> tic;C=complex(X);toc
Elapsed time is 0.579250 seconds.
>> tic;C*y;toc
Elapsed time is 0.023560 seconds.

So, as suspected, the bulk of the extra timing appears to be in the conversion of the large real matrix to complex interleaved format. The actual matrix multiply isn't that much more than the R2017b version.

MATLAB: Multiply then sum elements of two matrices

It is not clear from your question whether corresponding elements are to be multiplied or if you are wanting to do a matrix multiplication.

If corresponding elements are to be multiplied, then you could calculate the sum-of-products using

dot(A(:),B(:))

Best Answer

Related Solutions

MATLAB: R2018b real times complex multiplication

MATLAB: Multiply then sum elements of two matrices

Related Question