MATLAB: How to speed up Matrix Vector Multipliacation

accelerationmultiplicationParallel Computing Toolboxparallelism

I found that for large matrix with matrix multiplication, Matlab will automatically parallel calculate it. ( CPU usage 100%) However, when doing Matrix multiply vector, it seems only one kernel is in use.
Can this operation be done parallelly? For example, I think we could simply divide the large matrix into many pieces according to row. I tried using parfor in Parallel Toolbox, which result in very low performance.
I also tried writing C function using CBLAS, however, as stated in, https://www.mathworks.com/matlabcentral/answers/390546-wrong-result-when-calling-cblas-dgemv-function-in-a-mex-file There are some problems in this implementation.
I wonder if there are other ways of accelerating this? Thanks!
I tried the following parfor code:
N = % num of rows
M = % num of cols
A = rand(N,M);
b = rand(M,1);
c = zeros(N,1);
p = 6;
batch = floor(N / 6);
parfor i = 1:p
istart = (i-1)*batch+1;
iend = i*batch;
if (i==p)
iend=N;
end
A_sliced = A(istart:iend,:);
c(istart:iend) = A_sliced * b;
end

Best Answer

MATLAB automatically multi-threads computations where there will be a clear gain. For a matrix-vector multiply, it apparently does not see a gain, unless the problem is large enough. Remember that parallel computations are not always a speedup, because there is extra overhead. If you can farm it to the GPU directly, you might get a gain though. Since I lack that TB, I cannot help you there.
As a test though, I checked to see if MATLAB will multi-thread a matrix*vector operation.
A = rand(15000);B = rand(15000,1);
I stopped here, waiting for a few seconds until MATLAB went quiescent. To ensure that I was not seeing multithreading happen on the call to rand.
for i = 1:100
C = A*B;
end
The multiply was so fast to do, that I had to add a loop to allow me to see my CPU usage go up to the full 400% possible. So indeed MATLAB does multi-thread a matrix*vector multiply, if it sees a gain.
Make sure you have maxNumCompThreads set to the correct value for your CPU. For me:
maxNumCompThreads
ans =
4
Hyper-threading does not count however.