MATLAB: Numerical packing

groupingoptimization

Hi all, here's a quick one. I've got an array and a limit:
A = [235 235 235 234 235 235 234 234 234 220 280 215 234]
lim = 1000
I want to group the elements of A so that:
1. The sum of each group's elements is below my lim
2. The group numbers of consecutive elements of A never decrease
3. A minimum number of groups is used. Ie, if a valid solution exists using 3 groups, all solutions using 4 or more groups are inferior.
4. The maximum of the sums within each group is minimized (clarified below)
As an example:
grps = zeros(size(A));
while any(grps==0)
openIdxs = find(grps==0);
mask = cumsum(A(openIdxs))<lim;
grps(openIdxs(mask)) = max(grps)+1;
end
The above returns:
grps =
1 1 1 1 2 2 2 2 3 3 3 3 4
This satisfies (1), (2) and (3), but note that the 4th group has only one element, and the sums of each group are:
grpSums = arrayfun(@(i)sum(A(grps==i)),1:max(grps))
grpSums =
939 938 949 234
Can anyone think of a nice way to "balance" the group sums so that the maximum group sum is minimised? Note that I want to keep the group count minimised (at 4 in this case)… I don't want to just make lots of groups of 1 element.
Bonus votes for simplicity of code over speed and/or exactness of solution… I don't mind if something is close to optimal but not quite there, and I'm only working with tiny arrays similar to the example above.
Clarification:
Point (4) refers to the grpSums variable calculated above. The command max(grpSums) in this example returns 949. If move some elements from group 3 to group 4:
grps = [1 1 1 1 2 2 2 2 3 3 4 4 4]
my grpSums becomes [939 938 734 449], making max(grpSums) give 939. An even better (and perhaps optimal for this case?) solution is:
grps = [1 1 1 2 2 2 3 3 3 3 4 4 4]
which gives grpSums = [705 704 922 729]. Note that it's not so much the range of numbers in grpSums that I care about – it's specifically its maximum.

Best Answer

Thank you Sean, Elige, for your responses, and thanks Walter for naming the problem I'd posed. Even after seeing it as a "knapsack" problem, I had difficulty adopting one of the FEX knapsack solutions to this particular problem out of the box, so I also made a solution of my own (below).
Here are some codes, times, and comments:
My solution: (sorry about the line wrapping... I use a wider MATLAB page size than this website)
function grps = orderedMinOfMaxKnapsackSums(A, lim)
% Discover how many knapsacks (or groups) to use
grps = zeros(size(A));
while any(grps==0)
openIdxs = find(grps==0);
mask = cumsum(A(openIdxs))<lim;
grps(openIdxs(mask)) = max(grps)+1;
end
numGrps = max(grps);
% Build all possible indice pairs for each group
maxInds = arrayfun(@(i)find(grps==i,1,'last'),1:numGrps);
grpIndPairs = arrayfun(@(mn,mx)[mn mn; nchoosek(mn:mx,2); mx mx], 1:numGrps, maxInds, 'Un',0);
% Ensure that the first group "starts" at 1, and the last group "ends" at the length of A
grpIndPairs{1}(grpIndPairs{1}(:,1)>1,:) = [];
grpIndPairs{end}(grpIndPairs{end}(:,2)<length(A),:) = [];
% Remove the "possible" indice pairs that over-fill that knapsack
grpIndPairs = cellfun(@(pair)pair(arrayfun(@(fr,to)sum(A(fr:to)), pair(:,1),pair(:,2))<lim, :), grpIndPairs, 'Un',0);
% Reduce the "possible" indice pairs to those with valid adjacent knapsack possibilities. For
% example, if knapsack 1 only holds indices 1:2 or 1:3, knapsack 2 must start at element 3 or 4.
while 1
oldNumels = cellfun(@numel, grpIndPairs);
grpIndPairs(1:end-1) = cellfun(@(this,next)this(ismember(this(:,2), next(:,1)-1),:),...
grpIndPairs(1:end-1), grpIndPairs(2:end), 'Un',0);
grpIndPairs(2:end) = cellfun(@(this,last)this(ismember(this(:,1), last(:,2)+1),:) ,...
grpIndPairs(2:end), grpIndPairs(1:end-1), 'Un',0);
if isequal(oldNumels, cellfun(@numel, grpIndPairs)), break; end
end
% Build the set of all possible combinations from each of the groups
elNumsCell = cellfun(@(x)1:size(x,1), grpIndPairs, 'UniformOutput', false);
grpIndsCell = cell(size(elNumsCell));
[grpIndsCell{:}] = ndgrid(elNumsCell{:});
allCombs = cell2mat(arrayfun(@(i)grpIndPairs{i}(grpIndsCell{i}(:),:),1:numGrps,'Un',0));
% Remove combinations where knapsack elements don't immediately follow the previous knapsack
combsDiff = diff(allCombs,[],2);
allCombs = allCombs(all(combsDiff(:,2:2:end)==1, 2), :);
% Calculate the size of each knapsack for all the combinations that remain
combSums = zeros(size(allCombs,1),numGrps);
for i = 1:numGrps
combSums(:,i) = arrayfun(@(fr,to)sum(A(fr:to)), allCombs(:,2*(i-1)+1), allCombs(:,2*(i-1)+2));
end
% Choose the best combination
[~, bestCombIdx] = min(max(combSums,[],2));
grps = cell2mat(arrayfun(@(i)i*ones(1,1+diff(allCombs(bestCombIdx,2*(i-1)+(1:2)))), 1:numGrps,'Un',0));
end
And here's Elige's solution:
function best_combo = bruteForce(A, lim)
intmax = prod(A);
% WEIGHT
% >0.5 weight STD of sum more heavily
% <0.5 weight difference of lim and MEAN of sum more heavily
weight = 0.5;
best_combo = zeros(1,length(A));
best_fit = intmax;
for i = 1 : length(A)-1
B = nchoosek(1:length(A)-1,i);
for j = 1 : nchoosek(length(A)-1,i)
C = (i+1)*ones(1,length(A));
for k = sort(1 : i,'descend')
C(1:length(A)<=B(j,k))=k;
end
fit = (std(arrayfun(@(k)sum(A(C==k)),1:i+1))*(weight))^2 + ...
((lim-mean(arrayfun(@(k)sum(A(C==k)),1:i+1)))*(1-weight))^2;
if sum(arrayfun(@(k)sum(A(C==k)),1:i+1)>lim) == 0 && fit < best_fit
best_fit = fit;
best_combo = C;
end
end
end
end
Here are some timings:
A = [235 235 235 234 235 235 234 234 234 220 280 215 234]
lim = 1000
tic, for i = 1:10
orderedMinOfMaxKnapsackSums(A, lim);
end, toc
tic, for i = 1:10
bruteForce(A, lim);
end, toc
Elapsed time is 0.053719 seconds.
Elapsed time is 338.265262 seconds.
Elige, I know I stated that I was dealing with small data sets so efficiency wasn't an issue - but I think the brute force method here ran for a bit longer than I was anticipating.
Sean, I don't have the GO toolbox so I can't test out your solution. Can you post a little speed summary here for us?
Walter, I had a bit of trouble fitting my particular problem into one of the FEX knapsack entries. It seems that they had in-built criterion to optimise (whereas my criterion was to minimise the maximum knapsack size), so I went ahead making my own solution as an exercise. Did I miss something obvious that could have saved time?
So anyway Sean, you get the accepted answer... I'm tempted to accept my own but it doesn't meet my own criteria (of short and sweet) and want to at least give you guys the credit for tackling someone else's problem :)