[Math] Knapsack problem NP-complete

algorithmscomputer sciencenp-complete

Show that the knapsack problem (Given a sequence of integers $S=i_1, i_2, \dots , i_n$ and an integer $k$, is there a subsequence of $S$ that sums to exactly $k$?) is NP-complete.

Hint:Use the exact cover problem.

The exact cover problem is the follwing:
Given a family of sets $S_1, S_2, \dots , S_n$ does there exist a set cover consisting of a subfamily of pairwise disjoint sets?

(Set cover: Given a family of sets $S_1, S_2, \dots , S_n$ does there exist a subfamily of $k$ sets $S_{i_1}, S_{i_2}, \dots , S_{i_k}$ such that $\cup_{j=1}^k S_{i_j}=\cup_{j=1}^n S_j$ ? )

First of all, to show that this problem is in $\mathcal{NP}$ do we have to do the following??

A nondeterministic Turing machine can first guess which the subsequence of that we are looking for is and then verify that it sums to exactly k in linear time.

Is this correct??

We know that the exact cover problem has solution iff each element of the set $X$ is in exactly one subset $S_i$, right?

To reduce this problem to the Knapsack problem do we have to map the subsets $S_j$ to the integers $i_j$ ??

Or is there an other way to reduce the exact cover problem to the Knapsack problem??

EDIT:

Can I formulate the reduction as followed??

The exact cover problem is the following:
Given a family of sets $S_1, S_2, \dots , S_n$ does there exist a set cover consisting of a subfamily of pairwise disjoint sets?

The Set cover is the following:
Given a family of sets $S_1, S_2, \dots , S_n$ does there exist a subfamily of $\lambda$ sets $S_{i_1}, S_{i_2}, \dots , S_{i_{\lambda}}$ such that $\cup_{j=1}^{\lambda} S_{i_j}=\cup_{j=1}^n S_j$ ?

We will reduce the Exact Cover problem to the Knapsack problem.

Let $A=\{a_1, a_2, \dots , a_m\}$ and $A_1, A_2, \dots , A_{\lambda}$, be the set and the subsets of an instance of the Exact Cover problem.

We will construct an instance of the Knapsack problem in polynomial time.
That means that we will construct the integers $i_1, i_2 \dots , i_n$ and the integer $k$ such that there is a subset $P \subseteq \{1, \dots , n\}$ with $\sum_{j \in P} i_j=k$ iff there is a family of $\lambda$ pairwise disjoint sets that covers the whole $A$.

Each of the $\lambda$ subsets $A_j$ corresponds to an integer with $\lambda$ digits, $e_{j1} \dots e_{j \lambda}$ in the base $\lambda+1$, where $e_{ji}=1$ if $a_i \in A_j$ and $e_{ji}=0$ otherwise.

So, we have $\lambda$ integers $i_1, \dots , i_{\lambda}$ where $i_j=\sum_{r=1}^{\lambda}e_{jr}(\lambda+1)^{r-1}$.

We set $k=\sum_{r=0}^{\lambda-1}(\lambda+1)^r$.

So, the question is whether there is a subset the terms of which sum to exactly $k$, that means to $\underbrace{111 \dots 1}_{\lambda}$.

Therefore, this instance of the Knapsack problem has a solution iff the initial instance of the Exact Cover problem has a solution.

The reduction is polytime because p(n) steps are needed to "translate" any input of the Exact Cover problem into an input of the Knapsack problem, where p(n) is some polynomial

Is it correct??

Could I improve something??

Are the indices that I used correct??

Best Answer

The Knapsack problem is NP, and any problem in NP can be reduced to an NP complete problem (Cook's Theorem). So to show that the knapsack problem is NP complete it is sufficient to show that an NP-complete problem is reducible to the Knapsack problem. We want to use the exact cover problem to show this. A lot of such reductions can be found in the paper of Karp[1], including the reduction that is described here. My notation conforms mostly to the notation in this paper.

Assume that $\{S_j,j=1,\ldots,p\}$ is a family of sets that covers the set $U=\{u_i,i=1,\ldots,t\}$, so $$\cup_{j=1}^{m} S_j=U$$ We want to check if there is a subfamily $\{S_{j_h},h=1,\ldots,q\}$ of $\{S_j,j=1,\ldots,p\}$ that covers $U$ and where the members of these family are pairwise disjoint. Let $d$ be an arbitrary positive integer. We can assign the number

$$\; \epsilon_{i,j}= \begin{cases} 0,& u_i \not \in S_j \\ 1, & u_i \in S_j \end{cases} \\ a_j=\sum_{j=1}^m \epsilon_{i,j} d^{i-1} \tag{1}$$ to a subset $S_j$.

So $a_j$ is a number that can be written only with digits $0$ and $1$ in a number system with base $d$. It has the digit $1$ on position $i-1$ if and only if $u_i \in S_j$. If $a_{j_1}$ and $a_{j_2}$ are to values assigned to two sets of an exact cover, then there is no position in the representation as an $d$ base number, where both numbers have the digit $1$. The sum $b$ of all the numbers assigned to an exact cover is $$b=\sum_{i=1}^{t}d^{i-1} \tag{2}$$

if $\{S_j,j=1,\ldots,p\}$ has an exact cover $\{S_{j_h},h=1,\ldots,q\}$ then $x=(x_1,\cdots,x_p)$ with

$$ x_j= \begin{cases} 1 & \text{if}\; j=j_h, \text{for an } h \in \{1,...,q\} \\ 0 & \text{if}\; j\ne j_h, \text{for all } h \in \{1,...,q\} \\ \end{cases}$$

is one solution of the of the 0-1-Knaspack problem

$$\sum_{i=1}^{p} a_i x_i=b$$ where $a_j$ and $b$ are defined in $(1)$ and $(2)$. If we select the base $d=p+1$ (or larger) fo all the $2^p$ 0-1-sums no carry will occurr when adding them. So when ome of them sum up to be, there summands con't have 1s on the same place.

[1] Richard M. Karp: Reducibility Among Combinatorial Problems In: R. E. Miller and J. W. Thatcher (Hrsg.): Complexity of Computer Computations. Plenum Press, New York, 1972, p. 85–103

Related Solutions

[Math] How to reduce 0-1 knapsack to knapsack-like problem with overflow

Since both are optimization problems, you are talking about the NP-hard knapsack problem, not the NP-complete one, because only decision problems (or decision version of optimization problems) can be NP-complete. See the wiki page for Knapsack problem for definitions.

The standard reduction when working with NP-hard problems is the Turing reduction, not the Many-one reduction which is used inside NPC. The key difference are:

whether you are allowed to call the oracle which solves the knapsack-like problem more than once (Turing reduction: yes, Many-one reduction: no)
when you are allowed to call the oracle (Turing reduction: not limited, Many-one reduction: only at the end of computation)
whether you can modify the results returned by the oracle before returning (Turing reduction: yes, Many-one reduction: no).

As @dtldarek said, it doesn't seem easy to find a many-one reduction between the two problems, where only transformation on the inputs are allowed, but it's possible to solve either one with multiple calls to an oracle solving the other (thus a Turing reduction).

Solving Knapsack-like using Knapsack: Note that for increasing total cost C, the maximal total value is non-decreasing. This property allows binary search:

Binary search for the total cost C (in range $[0,\sum c_i]$), and check (using the oracle to Knapsack) whether it's possible to achieve a total value $\ge b$. Choose either half according to the result. The smallest C can be found with small modification to binary search.

Solving Knapsack using Knapsack-like: Similar property holds: for increasing total value, the minimal cost (weight) is non-decreasing. So a similar algorithm is easy to find.

[Math] Largest Difference of Two Numbers in an Array

EDIT: Realized my prior algorithm was flawed. This new one should be right.

EDIT 2: And a second flaw was pointed out years later (despite its simplicity). Thanks @gfppoy

There is an $O(n)$ algorithm broken down into two cases. (Note that figuring out which of the two cases we're in boils down to figuring out whether an array is strictly decreasing, which is itself $O(n)$)

Case 1: The array is strictly decreasing. In this scenario, we simply compare pairs of adjacent numbers and return the value corresponding to the "closest" two.

Case 2: Everything else

Loop through the array, keeping track of the maximum element. The catch is, now we "reset" the maximum every time we find a new minimum element. Something like this in pseudocode:

max_difference = 0
min = s[0]
max = s[0]

for i = 1 to s.length - 1
    if s[i] < min
        min = s[i]
        max = s[i]
    else
        max = max(s[i], max)
        max_difference = max(max - min, max_difference)

That way, for each new "maximum" we find, we associate it with the smallest element to its left, and for that minimum element, we are only considering maxima to its right (and if we don't compare it with a maximum to its right, it will be because we found a smaller element that is still to the left of that maximum).

NB: It's possible to modify the pseudocode to make it cover both cases and still be $O(n)$. I've not done so since reasoning out the correctness of that code becomes unnecessarily complicated; the algorithm presented here is just easier to read and reason about.

Best Answer

Related Solutions

[Math] How to reduce 0-1 knapsack to knapsack-like problem with overflow

[Math] Largest Difference of Two Numbers in an Array

Related Question