Using MATLAB R2012b, I have a (large) sorted vector of elements. Starting from the initial value, I need to discard all elements that are within some particular range of any previous undiscarded element. Simply looking for any set with the given minimum separation will not give the unique solution I require.
(Slow, but obvious) MWE:
function vec2 = testfun(vec,threshold) idx = 0.1; vec2 = vec; while not(isempty(idx)); testdiff = diff(vec2); idx = find(testdiff<threshold,1); vec2(idx+1) = []; endend
When I call this on
vec = [1,13,27,30,33,39,53,54,100];threshold = 25;
I should only retain:
vec2 == [1,27,53,100]
but calling
testfun(vec,10)
should give me
vec2 = [1, 13, 27, 39, 53, 100]
When I call it with a script to randomly generate larger vectors:
numval = 1e5;threshold = 25;vec = sort(rand(numval,1).*numval*1e2);vec2 = testfun(vec,threshold);
the profiler indicates the following breakdown within the function:
vec2(idx+1) = []; 66%
dx = find(vecdiff<thresho... 20%vecdiff = diff(vec2); 12%
Clearly, the re-indexing and allocation of vec2 after deleting elements is slow. Yet, if I don't take those elements out (e.g., I could set them = 0 instead of []), I have to find a way to ignore negative and zero elements in the diff output, but continue to search for results below the threshold value from the previous retained element. The total runtime in the function appears to behave quadratically, although the idx = find() line begins to take up more of the total time as the length of the vector increases (517s, 46/40/13% for numval = 4e5).
Best Answer