I'm trying to compute X'SX N times, once per column vector, and store the results in an array of length N.
Remember that each product is a scalar (it has the form of an inner product), so the full output is 1xN. Each X is (dx1) and S is (dxd), where d is on the order of 10 and N is large.
I currently have the for loop below:
X = (X-mu);
N = length(X);
z = zeros(length(X),1);
for i = 1:N
    z(i) = X(:,i)'*S*X(:,i);
end
I want to run the loop faster (more in parallel), as N can be upwards of 1,000,000. Ideally I would use gpuArrays, but I ran into a problem: I don't know how to work on the rows element-wise and the columns matrix-wise. Any help would be great.
Best Answer
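One possible way to remove the loop, sketched under the assumption that X is d-by-N with one sample per column, mu is d-by-1, and S is d-by-d: multiply S by the whole matrix at once, then take the column-wise sum of the element-wise product. The sizes and test data below are illustrative only, and Xc is a name introduced here for the centered data.

```matlab
% Sketch only: assumes X is d-by-N (one sample per column), mu is d-by-1,
% and S is d-by-d, as described in the question. Data here is made up.
d = 10; N = 1e6;
X  = randn(d, N);
mu = mean(X, 2);
S  = eye(d);

Xc = X - mu;                  % implicit expansion (R2016b+); on older releases use bsxfun(@minus, X, mu)
z  = sum(Xc .* (S * Xc), 1);  % 1-by-N, with z(i) = Xc(:,i)' * S * Xc(:,i)

% Optional GPU variant (needs Parallel Computing Toolbox and a supported GPU):
% Xg = gpuArray(Xc); Sg = gpuArray(S);
% zg = gather(sum(Xg .* (Sg * Xg), 1));
```

This does a single d-by-d times d-by-N product followed by an element-wise multiply and a sum over rows, so no explicit loop over N is needed. Whether the GPU variant is actually faster for d around 10 would have to be measured, since transfer time can dominate.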