If I have a vector w (length n) and I want to pick a single random number between 1 and n using w as the weights, I can do something like
idx = randsample(n,1,true,w)
and I'll get a number between 1 and n with probability w(idx)/sum(w). Great.
Similarly, I have a matrix W (size N x M, where each of N,M is in the thousands or so), and I want to draw M random numbers between 1 and N, with the columns of W acting as independent weight vectors. I could obviously do
idx = zeros(N,1);for i = 1:Midx(i) = randsample(N,1,true,W(:,i));end
…but I'm going to be calling this literally billions of times, so I'm looking for some efficiency.
I know that an equivalent way to think of this is to take my W matrix, normalize the columns so that they sum to one, do a cumsum on the columns, select a vector of uniform random numbers using rand(1,M), and find the first row indices where they are greater than the cumsum values, but I don't know how to do that without using a loop and find():
W_normalized = bsxfun(@rdivide,W,sum(W,1));W_cdf = cumsum(W,1);x = rand(1,M);C = bsxfun(@lt,x,W_cdf);
and then the first row of each column of C with a 1 in it is my random number, but I haven't had any luck doing that in an efficient, vectorized way (I've seen this thread, but I think their conclusion to use a for-loop doesn't really seem to hold for larger matrices).
Any suggestions?
Thanks, Dan
Best Answer