[Math] Why is TSP in NP

computational complexitygraph theory

This question is about computational complexity and the classification of traveling salesmen problem ($TSP$) as being in $NP$. Maybe I did not fully understand the definition of the complexity class $NP$, or I did not fully understand the $TSP$. I kindly ask you to help me to understand.

In a nutshell

Part 1 of my question:
Is this definition of the complexity class $NP$ correct?

If a problem can be solved in exponential time, but can not be solved in polynomial time, and if you can prove in polynomial time, that a potential solution, presented by an oracle, is really a solution of the problem, then it is said that this problem is in $NP$.

Part 2:
There is a complete graph with $n$ nodes and $n*(n-1)$ weighted vertices connecting each node with each other in one direction each. And there is an oracle, that tells you that a certain closed path through this graph is the "shortest" path (i.e. it says that no other path was a lower summed weight of vertices). How can you prove in polynomial time that the oracle is right?


My question in detail:

This is what i know:

P

The class $P$ contains problems, that can be solved by a deterministic turing machine in polynominal time. This per definition includes, that a given solution also can be verified in polynominal time (in worst case by solving the problem, but in some cases even faster).

Sorting a list is such a problem that belongs to $P$: The size of the problem is measured by the number of items to be sorted, and the fastest algorithm I know to solve this problem needs a time that grows with the size of the problem like this formula:

time to solve: $T(n) = O(n *log (n)) < O(n^2)$

The time grows loglinear which is faster than quadratic which is polynomial.
If you get a sorted list, you can check if it is sorted in this time:

time to prove: $T(n) = O(n)$

(Just run trough the list once and compare each member with its next neighbor.) This is linear time, which is faster than the time to solve the problem. So "sorting a list" is in the class $P$.

Multiplication of two integers is also in $P$. You measure the size of the problem by the numbers of digits of both integers. The number of digits is (for big numbers) proportional to the logarithm of the number itself. So If you multiply the numbers $A$ and $B$, then the numbers of their digits are: $a=log(A)$ and $b=log(B)$. With this definitions you have:

time to solve: $T(a,b) \le O(a*b)$

If you call $n$ the average size of $a$ and $b$, you get:

time to solve: $T(n) \le O(n^2)$

If someone tells you the solution of the problem (i.e. the product of A and B), then you can check if this is true by calculating A times B and compare your product with the given solution. This means: you can prove a given solution by solving the problem, so the time to prove is the time to solve:

time to prove: $T(n) \le O(n^2)$

Since the time to solve is polynomial, the problem "multiplication of two integers" is in $P$.

NP

I did understand, that a problem is in $NP$ if it can be solved in exponential time, but not in polynomial time, and if, when an oracle gives you a solution for the problem, you can check if this given solution is really a solution in polynomial time. The official definition of $NP$ talks about a nondeterministic turing machine, but I think that my definition is equivalent to this. (Please tell me if I'm wrong)

factorization of integers is thought to be in $NP$. The fastest known algorithm to find the factors of a big integer is the "general number field sieve". If n is the number of digits (i.e. the logarithm of the number itself), then the time to solve the problem is about

time to solve: $T(n) = O(e^{\sqrt{n}*log(n)})$

This is exponential time, i.e. slower than polynomial time, and therefore "integer factorization" is outside of $P$ but inside of $EXP$ and therefore a candidate for $NP$.

If an oracle tells you the factors, you can check in polynomial time if the oracle tells the truth. You just need to multiply the factors and compare the result of the multiplication with the integer that needs to be factorized. And as shown above, "multiplication of integers" can be done in polynomial time:

time to prove: $T(n) \le O(n^2)$

So we have:

time to solve = exponential, not polynomial
time to prove = polynomial

Both together makes "factorization of integers" a member of $NP$. (As far as I did understand, please correct me if I'm wrong.)

TSP

A brute force algorithm to solve the traveling salesman problem for $n$ nodes needs a

time to solve: $T(n) = O(n!)$

but dynamic programming does it in

time to solve: $T(n) = O(c^n)$ (with $c>1$)

And there is no faster algorithm known. So $TSP$ is in $EXP$ but not in $P$, and therefore $TSP$ is a candidate for $NP$. There just needs to be an algorithm that can prove that a given solution is true in polynomial time.

My problem: I don't know such an algorithm. If you have $n$ nodes, and a $n*n$-Matrix with the weights of the vertices, and if an oracle tells you "the shortes tour is <a certain sequence of nodes>", how can you find out, that there is no other tour that is shorter?

I was searching for such an algorithm, but all publications I found are only talking about ways to solve TSP. I din't find a word about proving a solution given by an oracle. And I for myself also did not find an other way than comparing the given path with any possible path (in a worst case scenario). But like in multiplication of two integers this means: proving is solving. But solving can not be done in polynomial time, and so we have (as far as I did understand):

time to solve = exponential, not polynomial
time to prove = exponential, not polynomial

But this does not match the definition of $NP$ as far as I did understand it. But it is common sense, that $TSP$ is in $NP$ (it's even said to be the most prominent member of $NP-hard$ which is a subset of $NP$)

So obviously I misunderstood something, but I don't know what. Please tell we what I did get wrong, and please explain how it is correct.

Best Answer

  1. Your definition of the complexity class NP is not quite correct. In fact, P is a subset of NP -- any problem that lies in P, also lies in NP. The formal definition is the set of problems solvable by a non-deterministic Turing machine in time (number of steps) bounded by a polynomial in the size of the input. In fact, this already implies that it can be solved only an ordinary Turing machine in exponential time, so that part is redundant. The informal formulation of this, is that if you have an oracle which presents you with a 'potential solution', then you can check it in polynomial time: namely, the oracle says which of the possibilities of the non-deterministic Turing machine is the "right" one.

  2. Your definition of the TSP problem is also not quite correct (although people are often imprecise about this). In the common formal formulation of classes like P and NP, they contain only decision problems: that is, programs which should output true or false. Then the travelling sales person problem is presented as follows:

    Given a complete weighted graph of $n$ vertices, and a number $k$, does there exist a path through all vertices which has total weight at most $k$?

In this formulation, you can see that it is easy to verify a potential solution: just add up the weights. Also note that it is easy to find the length of a shortest path if you can efficiently solve the above problem by doing a binary search to determine the optimal number $k$.

Since, like with most NP problems that we do not expect to be in P, we do not know of any better way of getting the answer than "trying all possibilities", this problem often feels equivalent to actually producing a shortest path, and it gets talked about as if it is. But technically, it is not.

Related Question