MATLAB: Vectorizing nonlinear matrix operation on many small matrices

I am trying to optimize the following generic matrix operation:

m = 3; % small number in general
n = 2^20; % large power of 2 in general
A = rand(m,n);
B = zeros(m^2,m^2);
for ii = 1:size(A,2)
    a = A(:,ii);
    r = a*a';
    B = B + kron(r,r);
end
% return B

On my computer the above takes ~7 s. Compiling it to a MEX file with MATLAB Coder gives a ~15x speedup. I have also tried compiling to CUDA with GPU Coder, but the result seems to be quite inefficient.

I think the difficulty comes from two different sources:

1) I am not sure of an efficient way to vectorize the creation of the "r" matrices from the columns of A, so I have to resort to the outer for loop.

2) I think the Kronecker product is inefficient to implement on the GPU because the matrices are so small.

The speedup from compiling to MEX is nice, but I have the feeling that I am still doing something quite inefficiently. I would appreciate any ideas on how to optimize the above calculation, either along the lines of the two difficulties outlined above or via a different approach.

m = 3; % small number in general
n = 2^20; % large power of 2 in general
A = rand(m,n);

tic;
B = zeros(m^2,m^2);
for ii = 1:size(A,2)
    a = A(:,ii);
    r = a*a';
    B = B + kron(r,r);
end
toc;
Elapsed time is 6.800329 seconds.

tic;
C = reshape(A,m,1,n).*reshape(A,1,m,n);
C = reshape(C,m^2,n);
B = C*C.';
toc;
Elapsed time is 0.081757 seconds.
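Why the vectorized version gives the same B: for r = a*a', the mixed-product property of the Kronecker product gives kron(r,r) = kron(a,a)*kron(a,a)', so the loop is summing outer products of the vectors kron(a,a). Stacking those vectors as the columns of C turns the whole sum into the single product C*C.'. The following is a minimal sketch that checks this numerically on a small problem. The names Bloop, Bvec and relErr and the reduced n are chosen here for illustration, and the elementwise product relies on implicit expansion, which assumes R2016b or later (on older releases bsxfun(@times, ...) should serve the same purpose).

```matlab
% Sanity check: vectorized result vs. the original loop on a small problem.
% Assumes MATLAB R2016b or later (implicit expansion in .*).
m = 3;
n = 1000;            % kept small so the reference loop runs quickly
A = rand(m,n);

% Reference: the original loop
Bloop = zeros(m^2,m^2);
for ii = 1:n
    a = A(:,ii);
    r = a*a';
    Bloop = Bloop + kron(r,r);
end

% Vectorized: page ii of the 3-D array is a*a' for a = A(:,ii)
C = reshape(A,m,1,n).*reshape(A,1,m,n);   % m-by-m-by-n
C = reshape(C,m^2,n);                     % column ii equals kron(a,a)
Bvec = C*C.';                             % sum over ii of kron(a,a)*kron(a,a)'

% The two results should agree up to floating-point round-off
relErr = norm(Bloop - Bvec,'fro')/norm(Bloop,'fro');
fprintf('relative error: %g\n', relErr);
```

Note that C holds m^2*n doubles (about 75 MB for m = 3 and n = 2^20), so for much larger m or n it may be worth processing the columns of A in chunks and accumulating C*C.' per chunk.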
