Combinatorics – Why is the Traveling Salesperson Problem Difficult?

combinatoricsdiscrete mathematicsgraph theoryoptimization

The Traveling Salesperson Problem is originally a mathematics/computer science optimization problem in which the goal is to determine a path to take between a group of cities such that you return to the starting city after visiting each city exactly once and the total distance (longitude/latitude) traveled is minimized. For $n$ cities, there are $(n-1)!/2$ unique paths – and we can see that as $n$ increases, the number of paths to consider becomes enormous in size. For even a small number of cities (e.g. 15 cities), modern computers are unable to solve this problem using "brute force" (i.e. calculate all possible routes and return the shortest route) – as a result, sophisticated optimization algorithms and approximate methods are used to tackle this problem in real life.

I was trying to explain this problem to my friend, and I couldn't think of an example which shows why the Travelling Salesperson Problem is difficult! Off the top of my head, I tried to give an example where someone is required to find the shortest route between Boston, Chicago and Los Angeles – but then I realized that the shortest path in this case is pretty obvious! (i.e. Move in the general East to West direction).

Real world applications of the Travelling Salesperson Problem tend to have an additional layer of complexity as they generally have a "cost" associated between pairs of cities – and this cost doesn't have to be symmetric. For example, buses might be scheduled more frequently to go from a small city to a big city, but scheduled less frequently to return from the big city to the small city – thus, we might be able to associate a "cost" with each direction. Or even a simpler example, you might have to drive "uphill" to go from City A to City B, but drive "downhill" to go from City B to City A – thus there is likely a greater cost to go from City A to City B. Many times, these "costs" are not fully known and have to be approximated with some statistical model. However, all this can become a bit complicated to explain to someone who isn't familiar with all these terms.

But I am still looking for an example to explain to my friend – can someone please help me think of an obvious and simple example of the Travelling Salesperson Problem where it becomes evidently clear that the choice of the shortest path is not obvious? Every simple example I try to think of tends to be very obvious (e.g. Manhattan, Newark, Nashville) – I don't want to overwhelm my friend with an example of 1000 cities across the USA : just something simple with 4-5 cities in which it is not immediately clear (and perhaps even counterintuitive) which path should be taken?

I tried to show an example using the R programming language in which there are 10 (random) points on a grid – starting from the lowest point, the path taken involves choosing the nearest point from each current point:

library(ggplot2)

set.seed(123)

x_cor = rnorm(5,100,100)
y_cor = rnorm(5,100,100)


my_data = data.frame(x_cor,y_cor)

      x_cor     y_cor
1  43.95244 271.50650
2  76.98225 146.09162
3 255.87083 -26.50612
4 107.05084  31.31471
5 112.92877  55.43380


ggplot(my_data, aes(x=x_cor, y=y_cor)) + geom_point() + ggtitle("Travelling Salesperson Example")

enter image description here

But even in this example, the shortest path looks "obvious" (imagine you are required to start this problem from the bottom most right point):

enter image description here

I tried with more points:

set.seed(123)

x_cor = rnorm(20,100,100)
y_cor = rnorm(20,100,100)


my_data = data.frame(x_cor,y_cor)

ggplot(my_data, aes(x = x_cor, y = y_cor)) +
    geom_path() +
    geom_point(size = 2)

enter image description here

But my friend still argues that the "find the nearest point from the current point and repeat" (imagine you are required to start this problem from the bottom most right point):

enter image description here

How do I convince my friend that what he is doing corresponds to a "Greedy Search" that is only returning a "local minimum" and it's very likely that a shorter path exists? (not even the "shortest path" – just a "shorter path" than the "Greedy Search")

I tried to illustrate this example by linking him to the Wikipedia Page on Greedy Search that shows why Greedy Search can often miss the true minimum : https://en.wikipedia.org/wiki/Greedy_algorithm#/media/File:Greedy-search-path-example.gif

  • Could someone help me think of an example to show my friend in which choosing the immediate nearest point from where you are, does not result in the total shortest path? (e.g. some example that appears counterintuitive, i.e. if you choose a path always based on the nearest point from your current position, you can clearly see that this is not the optimal path)

  • Is there a mathematical proof that shows that the "Greedy Search" algorithm in Travelling Salesperson has the possibility of sometimes missing the true optimal path?

Thanks!

Best Answer

Here's a simple explicit example in which the greedy algorithm always fails, this arrangement of cities (and euclidean distances):

(0,1), (10, 0), (0,-1), (-10,0)

If you apply the greedy algorithm on this graph, it'll look like the following (or a flipped version):

This is true regardless of the starting point. This means the greedy algorithm gives us a path with a total distance traveled of $20 + 2 + 2\sqrt{101} \approx 42.1$

Clearly, this isn't the optimal solution though. Just by eyeballing it, you can see that this is the best path:

It has a total length of $4\sqrt{101} \approx 40.2$, which is better than the greedy algorithm.

You can explain to your friend that the reason why the greedy algorithm fails is because it doesn't look ahead. It sees the shortest path (in this case, the vertical one), and takes it. However, doing so may later force it to take a much long path, leaving it worse off in the long run. While it's simple to see in this example, detecting every case where this happens is a lot harder.