MATLAB: Looking for something like a matrix version of randsample… [vectorization!]

If I have a vector w (length n) and I want to pick a single random number between 1 and n using w as the weights, I can do something like

idx = randsample(n,1,true,w)

and I'll get a number between 1 and n with probability w(idx)/sum(w). Great.

Similarly, I have a matrix W (size N x M, where each of N,M is in the thousands or so), and I want to draw M random numbers between 1 and N, with the columns of W acting as independent weight vectors. I could obviously do

idx = zeros(N,1);
for i = 1:M
idx(i) = randsample(N,1,true,W(:,i));
end

…but I'm going to be calling this literally billions of times, so I'm looking for some efficiency.

I know that an equivalent way to think of this is to take my W matrix, normalize the columns so that they sum to one, do a cumsum on the columns, select a vector of uniform random numbers using rand(1,M), and find the first row indices where they are greater than the cumsum values, but I don't know how to do that without using a loop and find():

W_normalized = bsxfun(@rdivide,W,sum(W,1));
W_cdf = cumsum(W,1);
x = rand(1,M);
C = bsxfun(@lt,x,W_cdf);

and then the first row of each column of C with a 1 in it is my random number, but I haven't had any luck doing that in an efficient, vectorized way (I've seen this thread, but I think their conclusion to use a for-loop doesn't really seem to hold for larger matrices).

Any suggestions?

Thanks, Dan

Best Answer

Here's a pure matrix version of your bsxfun calls. It should be internally parallelized if your matrices are large.

W_normalized = W./repmat(sum(W,1),size(W,1),1);
W_cdf = cumsum(W_normalized,1);
x = rand(1,size(W,2));
C = repmat(x,size(W,1),1)<W_cdf;

You still need to find the first "true" value in C for each column to form an index.

Below I make a logical array where only the first "true" is present in each column:

idx = [C(1,:); xor(C(2:end,:),C(1:end-1,:))];

Hope this helps.

Best Answer

Related Solutions

MATLAB: Row-normalizing large sparse matrix

MATLAB: I want to take different values in each loop

Related Question