[Math] How to find the number of subsets with different sum in a set

binomial-coefficients, combinatorics

Given a set of $N$ distinct elements, it is well known that there are $2^N$ different subsets.

Is there any way to calculate exactly how many of those subsets have the same sum? For example:

Given the set {40,10,30,80,20,20}, there are more than four subsets having the same sum:

  • 40+10+30 = 80
  • 40+20+20 = 80
  • 40+40 = 80
  • 80 = 80

I'm not interested in knowing the final sums, just whether there is a straightforward way to calculate the number of distinct sum values over all subsets.

Best Answer

The example suggests you are concerned with forming sums, possibly with repeated summands, by drawing from a multiset of positive integers. In this respect you are asking about the size of the feasible set of an integer programming problem, which is (without other assumptions) NP-hard.

This is an interesting problem, one that has been discussed previously at MathOverflow and at StackOverflow (but not on Math.SE as far as I could tell).

Let me offer an argument for why we should not expect a straightforward useful formula for these counts, at least not one that can be evaluated in polynomial time. Let us restrict attention for the moment to sets of $N$ distinct values (no repetitions, so a strict set and not a multiset is involved).

If there were a "simple" (polynomial time in the size of the input) way to determine how many subset-sums occur, then we would certainly know, by comparing with the maximum possible value $2^N$, whether two distinct subsets give the same sum. This is a decision problem introduced by Woeginger and Yu (1992) in "On the equal-subset-sum problem", which they showed to be NP-complete. Therefore no formula evaluable in polynomial time is possible, unless P=NP. [A more recent paper by Cieliebak et al. (2008) summarizes the complexity of several variants of equal-subset-sum, and moreover it is not behind a "paywall".]
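To make the reduction concrete, here is a brute-force sketch in Python (the function names are my own, hypothetical ones). It enumerates all $2^N$ subsets, so it runs in exponential time; the only point is that an equal-subset-sum instance is answered by comparing the number of distinct sums with $2^N$.

```python
from itertools import combinations

def count_distinct_subset_sums(values):
    """Count distinct sums over all 2^N subsets (including the empty one) by brute force."""
    sums = set()
    for r in range(len(values) + 1):
        for subset in combinations(values, r):
            sums.add(sum(subset))
    return len(sums)

def has_equal_subset_sums(values):
    """Equal-subset-sum decision: True iff two distinct subsets share a sum.

    Assumes the N values are distinct positive integers, so there are exactly
    2^N subsets; any shortfall in the count of distinct sums means a collision.
    """
    return count_distinct_subset_sums(values) < 2 ** len(values)

print(count_distinct_subset_sums([1, 2, 4, 8]))  # 16: every subset sum is unique
print(has_equal_subset_sums([1, 2, 4, 8]))       # False
print(has_equal_subset_sums([1, 2, 3]))          # True, since 1 + 2 = 3
```

Any polynomial-time way of computing `count_distinct_subset_sums` would therefore decide equal-subset-sum in polynomial time as well.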

Below I sketch an algorithm more sophisticated than simple "brute force" consideration of all possible $2^N$ (multi)subsets. I will then add an example and compare my suggestion to the interesting dynamic programming approach sketched in an answer to the StackOverflow post linked above.

One way to count the possible sums is to construct them all systematically by the number of summands used (counting the allowed repetitions afforded by the multiset). That is, the empty subset gives a zero sum. The single-summand sums are precisely the distinct elements of the multiset. Since a sum does not depend on the order of its summands, we can systematically construct the distinct sums of $n$ summands by extracting the distinct $n$-multisubsets of the given multiset in (say) lexicographic order.

An algorithm for doing so was described in this recent Answer.
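The linked Answer's algorithm is not reproduced here. As an independent sketch (the names are hypothetical), the following Python generator yields each distinct $n$-element sub-multiset exactly once, as a sorted tuple in lexicographic order, by deciding how many copies of each distinct value to take, smallest value first. The $n$-summand sums are then just the sums of those tuples.

```python
from collections import Counter

def distinct_multisubsets(multiset, n):
    """Yield every distinct n-element sub-multiset once, as a sorted tuple,
    in lexicographic order."""
    # (value, multiplicity) pairs in ascending order of value
    items = sorted(Counter(multiset).items())

    def extend(i, remaining, prefix):
        if remaining == 0:
            yield tuple(prefix)
            return
        if i == len(items):
            return  # ran out of distinct values before reaching n elements
        value, mult = items[i]
        # Taking more copies of the smaller value gives a lexicographically
        # smaller tuple, so count down from the largest usable k.
        for k in range(min(mult, remaining), -1, -1):
            yield from extend(i + 1, remaining - k, prefix + [value] * k)

    yield from extend(0, n, [])

data = [40, 10, 30, 80, 20, 20]
for n in range(len(data) + 1):
    print(n, sorted({sum(t) for t in distinct_multisubsets(data, n)}))
```

Working with (value, multiplicity) pairs, rather than positions, is what prevents the two copies of 20 from producing the same sub-multiset twice.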

It is then necessary to discard those $n$-summand sums that duplicate results obtained with the same or a smaller number of summands, and to merge in those that are new. We will eliminate duplicates most efficiently if we maintain the list of sums achieved in order. While (as you note) lexicographic generation does not always produce new sums in ascending order, it does tend to produce long runs of ascending order.
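A minimal Python sketch of the merge step (the function name is hypothetical). For brevity it uses `itertools.combinations` over the sorted multiset, which emits repeated tuples when values repeat, and lets a set remove them, instead of a dedicated generator of distinct multisubsets; the part being illustrated is keeping the achieved sums in a sorted list and merging each new batch into it.

```python
from heapq import merge
from itertools import combinations

def count_sums_by_summand_count(multiset):
    """Count distinct sub-multiset sums, adding sums grouped by number of summands
    and merging the novel ones into a sorted list of sums already achieved."""
    values = sorted(multiset)
    achieved = [0]  # the empty sub-multiset gives the sum 0
    for n in range(1, len(values) + 1):
        # Sums of n summands; the set drops repeats caused by duplicate values.
        n_sums = sorted({sum(c) for c in combinations(values, n)})
        merged = []
        # Both inputs are sorted, so duplicates are adjacent in the merged
        # stream and can be dropped in one linear pass.
        for s in merge(achieved, n_sums):
            if not merged or merged[-1] != s:
                merged.append(s)
        achieved = merged
    return len(achieved)

print(count_sums_by_summand_count([40, 10, 30, 80, 20, 20]))  # 21
```

For the multiset in the question, every multiple of 10 from 0 to 200 is reachable, which gives the 21 distinct sums.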

The "dynamic programming approach" sketched in the SO Answer alluded to above proceeds instead by recursively including the available elements from smallest to largest. We need only one bit for each integer from 0 up to the sum of all elements, recording whether that sum is achieved. This is a pseudo-polynomial time (and space) algorithm.
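A minimal Python sketch of that idea, assuming nonnegative integer elements and using an arbitrary-precision Python `int` as the bit array (my choice of representation, not necessarily the SO Answer's). Time and space grow with the total sum of the elements rather than with $2^N$.

```python
def count_distinct_sums_bitset(multiset):
    """Pseudo-polynomial count of distinct sub-multiset sums.

    Bit s of `reachable` is 1 iff some sub-multiset of the elements processed
    so far sums to s; only bits 0..sum(multiset) can ever be set.
    """
    reachable = 1  # only the empty sum 0 is reachable before any element is used
    for v in sorted(multiset):  # include elements from smallest to largest
        # Every sum s already reachable stays reachable, and s + v becomes reachable.
        reachable |= reachable << v
    return bin(reachable).count("1")

print(count_distinct_sums_bitset([40, 10, 30, 80, 20, 20]))  # 21
```

Each copy of a repeated value is processed separately, so the multiplicities in the multiset are respected without any extra bookkeeping.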

Example (to do)