MATLAB: How to generate combinatorial vectors avoiding some determinated subvectors

combinatorics

Hello, I have to generate in a script many combinatorial vectors, and the matter is the time: the script would take many months, probably some years to finish. The last step in the script is to generate all the (60¦9)=60!/(9!*54!) combinatorial vectors of lenght 9 (one at a time, using a vector called a and overwriting the previous vector), where the components are ordered, different and assume integer values between 1 and 60, that are
(1,2,3,4,5,6,7,8,9)
(1,2,3,4,5,6,7,8,10)
(1,2,3,4,5,6,7,8,11)
... ...
(1,2,3,4,5,6,7,8,60)
(1,2,3,4,5,6,7,9,10)
(1,2,3,4,5,6,7,9,11)
... ...
(52,53,54,55,56,57,58,59,60).
For doing this I used the sequence of commands
a = int8(1:9);
finish = 1;
while finish == 1
i = 9;
while i>0 & a(i) == 51+i
i = i-1;
end
if i == 0
finish = 0;
else
a(i) = a(i)+1;
while i<9
a(i+1) = a(i)+1;
i = i+1;
end
end
end
The problem is that the script must test each one of these vectors and delete those which contain some combinatorial vectors of lenght between 2 and 8: for example, if the script must test if the vector
a = [a(1) a(2) a(3) a(4) a(5) a(6) a(7) a(8) a(9)]
doesn't contain nor the vector of lenght 3 [4 19 47] neither the vector of lenght 5 [3 10 16 40 50], I used the command
sum(ismember([4 19 47],nchoosek(a,3))'rows') +
sum(ismember([3 10 16 40 50],nchoosek(a,5))'rows') == 0 .
Generating all these vectors, and, ABOVE ALL, using many times the command "sum(ismember(…))" causes a huge loss of time.
My question is if it is possible, rather than generate all the vectors and then exclude (using "sum(ismember(..))") those which contain some subvectors found before (the script begins with vectors of lenght 2, increasing by 1 the lenght of the vectors when it has finished with all the vectors of lenght 2, and so on, arriving to vectors of lenght 9), TO GENERATE DIRECTLY ONLY THE VECTORS WHICH DO NOT CONTAIN THE GIVEN SUBVECTORS.
P. S. Someone has suggested me to write a script with Fortran90, or C++, and to parallelize it. This could be already a solution, but I'd like to know if there can be a solution ALSO BY IMPROVING THE ALGORITHM.
Thanks to those who will read this text until the end, every kind of answer will useful to me.
Lorenzo Bussoli

Best Answer

As far as improving the efficiency of your code, I have some questions about what you are doing.
First, I do not see how your loop is generating what you say it is. When I make it display 'a' just before the last END, the first vector is:
1 2 3 4 5 6 7 8 10
and not:
1 2 3 4 5 6 7 8 9
Second, I assume that it is right before the last END in your loop that you are using a for whatever other purpose you have, and this is where you do your check. Is that correct?
Third, I don't understand your check. In one place you say that you want to check if 'a' has any "combinatorial vectors" but later it looks like you are looking for an exact match. For example, say
a = [2 19 4 47 3]
and we want to check if 'a' contains any combinatorial of the vector
s = [4 19 47];
meaning that if a has any of these subvectors:
[4 19 47; 4 47 19; 19 4 47; 19 47 4; 47 4 19; 47 19 4]
then we will get a positive return. Your statement does not do this
TF = ismember(s,nchoosek(a,3),'rows'); % TF==0
But then later you seem to be saying that you only want to see if 'a' contains the exact vector 's', not a permutation. If this is the case, why make it so hard (and slow) using ISMEMBER and NCHOOSEK? It is much faster to do:
a = [2 4 19 47 3]; % New vector which contains vect 's'
TF = ismember(s,nchoosek(a,3),'rows'); % TF==1
TF2 = any(strfind(a,s)); %TF2==1
Once things are a little clearer, we might be able to help you speed this up even more.