I found that for large matrix with matrix multiplication, Matlab will automatically parallel calculate it. ( CPU usage 100%) However, when doing Matrix multiply vector, it seems only one kernel is in use.
Can this operation be done parallelly? For example, I think we could simply divide the large matrix into many pieces according to row. I tried using parfor in Parallel Toolbox, which result in very low performance.
I also tried writing C function using CBLAS, however, as stated in, https://www.mathworks.com/matlabcentral/answers/390546-wrong-result-when-calling-cblas-dgemv-function-in-a-mex-file There are some problems in this implementation.
I wonder if there are other ways of accelerating this? Thanks!
I tried the following parfor code:
N = % num of rows
M = % num of cols
A = rand(N,M);b = rand(M,1);c = zeros(N,1);p = 6;batch = floor(N / 6);parfor i = 1:p istart = (i-1)*batch+1; iend = i*batch; if (i==p) iend=N; end A_sliced = A(istart:iend,:); c(istart:iend) = A_sliced * b;end
Best Answer