This phenomenon extends beyond the traveling salesman problem, and indeed beyond NP: there are even undecidable problems for which most instances can be solved very quickly.
There is an emerging subfield of complexity theory called generic-case complexity, which is concerned with decision problems in the generic case, that is, with solving most or nearly all instances of a given problem. This contrasts with the situation in classical complexity theory, where it is in effect the worst-case complexity that drives many of the complexity classifications. (Even for approximate solutions to NP-hard problems, the worst-case phenomenon is still present.)
Particularly interesting is the black-hole phenomenon, by which the difficulty of an infeasible or even undecidable problem is concentrated in a very tiny region, outside of which it is easy. (Here, tiny means tiny with respect to some natural measure, such as asymptotic density.) For example, many of the classical decision problems from combinatorial group theory, such as the word problem and the conjugacy problem, are solvable in linear time in the generic case. This phenomenon provides a negative answer to the analogue of your question 1 for these problems. The fact that the problems are easily solved outside the black hole provides a negative answer to the analogue of question 2. And I think that the fact that these problems are actually undecidable as total problems suggests that this manner of solving almost all cases of a problem will not help us with P vs. NP, in your question 3.
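One standard way to make "tiny" precise here (this is the usual definition from the generic-case complexity literature) is asymptotic density: for a set $A \subseteq \Sigma^*$ of instances,

$$\rho(A) \;=\; \lim_{n\to\infty} \frac{|\{\, w \in A : |w| \le n \,\}|}{|\{\, w \in \Sigma^* : |w| \le n \,\}|},$$

and $A$ is called generic when $\rho(A) = 1$, and negligible, a black hole, when $\rho(A) = 0$.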
For question 4, let me mention that an extreme version of the black-hole phenomenon is provided even by the classical halting problem. This is, of course, the most famous of undecidable problems. Nevertheless, Alexei Miasnikov and I proved that for one of the standard Turing machine models with a one-way infinite tape, there is an algorithm that solves the halting problem on a set of asymptotic measure one. That is, there is a set A of Turing machine programs such that (1) almost every program is in A, in the sense of asymptotic density, (2) A is linear-time decidable, and (3) the halting problem is linear-time decidable for programs in A. This result appears in (J. D. Hamkins and A. Miasnikov, The halting problem is decidable on a set of asymptotic probability one, Notre Dame J. Formal Logic 47, 2006, http://arxiv.org/abs/math/0504351). Inside the black hole, the complement of A, the problem is of course intractable. The proof, unfortunately, does not fully generalize to the other implementations of Turing machines, since for other models one finds a black hole of some measure strictly between 0 and 1, rather than of measure 0.
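To give a feel for where the measure-one set comes from, here is a toy Monte Carlo experiment (my sketch, not the algorithm from the paper; the specific model, the step budget, and the convention that moving left from cell 0 counts as halting are all simplifying assumptions): sample random one-way-tape machines on blank tape and count how often the head simply falls off the left edge, which settles halting outright.

```python
import random

def falls_off_quickly(num_states, num_symbols=2, max_steps=None):
    """Run a uniformly random one-way-tape machine on blank tape and report
    whether the head walks off the left edge within the step budget
    (in this simplified model, that is one way for the machine to halt)."""
    if max_steps is None:
        max_steps = 10 * num_states
    # Random transition table: (state, symbol) -> (state', write, move).
    delta = {(q, s): (random.randrange(num_states),
                      random.randrange(num_symbols),
                      random.choice((-1, 1)))
             for q in range(num_states) for s in range(num_symbols)}
    tape, pos, state = {}, 0, 0
    for _ in range(max_steps):
        state, write, move = delta[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if pos < 0:       # fell off the left edge: halting is settled
            return True
    return False          # not settled by this one cheap test

# Fraction of machines whose halting is settled by the cheap test alone;
# the Hamkins-Miasnikov analysis predicts it tends to 1 as states grow.
for n in (5, 20, 80):
    trials = 1000
    hits = sum(falls_off_quickly(n) for _ in range(trials))
    print(f"{n:3d} states: {hits / trials:.2f}")
```

The heart of the proof is a statement in this spirit: with asymptotic probability one, a random machine on a one-way tape falls off the edge before it can exhibit any complicated behavior, and that event is detectable in linear time.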
Every r.e. language is polynomial-time reducible to the halting problem. Since there are computable languages (indeed, in $\Delta^E_3$) having the maximum possible circuit complexity for every length $n$ (which is asymptotically $2^n/n$), the halting problem also has exponential circuit complexity. The exact complexity will depend on the particular representation of algorithms in the definition of the halting problem, and specifically, on the complexity of the reduction function which hardwires an input string into a fixed algorithm. In the most obvious representations, this function blows up the input length only linearly and can be made computable by linear-size circuits, hence we get $2^{\Theta(n)}$ as the circuit complexity of the halting problem. If the halting problem is formulated directly for algorithms accepting an input, the reduction function increases the input length by an additive constant and has essentially constant complexity, so the circuit complexity of the halting problem is $\Theta(2^n/n)$ in such a formulation.
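To spell out how the lower bound transfers (a sketch, under the stated assumption that the reduction $f$ maps inputs of length $n$ to queries of length at most $cn$ and is itself computable by circuits of size $O(n)$): a circuit deciding $L$ at length $n$ can be built by composing the circuit for $f$ with a circuit for the halting problem at length $cn$, so

$$C_L(n) \;\le\; C_{\mathrm{HALT}}(cn) + O(n).$$

Taking $L$ computable with $C_L(n) = \Theta(2^n/n)$ gives $C_{\mathrm{HALT}}(m) \ge 2^{\Omega(m)}$ at lengths $m = cn$, while the generic upper bound $C_{\mathrm{HALT}}(m) = O(2^m/m)$ holds for every language; together these give the stated $2^{\Theta(n)}$.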
Best Answer
Not 100% my field, but I'll try an answer.
As the Wikipedia page says, the problem is intrinsically difficult and slow to solve. Ill-conditioning of the roots and of the qualitative results (dimension, number and multiplicity of zeros...) is also often an issue. As a result, people do not often use high-degree polynomial systems in modelling; linear methods can frequently be made to work better in practice. This greatly reduces the demand.
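As a concrete illustration of that ill-conditioning (my example, not from the answer; it needs only NumPy), consider Wilkinson's classic polynomial, whose roots are simply $1, 2, \ldots, 20$ yet scatter into the complex plane under a perturbation of a single coefficient on the order of $10^{-7}$:

```python
import numpy as np

# Wilkinson's polynomial p(x) = (x - 1)(x - 2) ... (x - 20):
# the roots are exactly 1..20, but they are notoriously sensitive
# to tiny perturbations of the expanded coefficients.
true_roots = np.arange(1, 21)
coeffs = np.poly(true_roots)     # expand into coefficient form

perturbed = coeffs.copy()
perturbed[1] -= 2.0 ** -23       # Wilkinson's perturbation of the x^19 coefficient

print(np.sort_complex(np.roots(perturbed)))
# Many of the larger roots are now complex, far from the true values.
```

Even just storing the expanded coefficients in double precision perturbs the computed roots visibly, which is exactly the conditioning problem referred to above.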
They are hard, but that is not the main issue. Lots of problems, such as solving PDEs or simulating protein folding, are hard too, but they seem to have much more of a "market".
There is some market, for instance in applications to algebraic geometry, but as far as I know these are mostly within mathematics.
I am not extremely familiar with all these packages, but for the tasks where I have used them they seem to work reasonably well. The main issue is the intrinsic difficulty of the problem: an exponential problem is exponential. It is difficult to do a good job when there are $2^{20}$ solutions and they are ill-conditioned.
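To see where a count like $2^{20}$ comes from (my illustration, using the standard Bézout bound): the number of isolated complex solutions of a square polynomial system is bounded by the product of the degrees of the equations, so twenty quadrics already force a solver to account for about a million candidate solutions.

```python
from math import prod

# Bezout bound: a generic system of n polynomial equations in n unknowns,
# of degrees d_1, ..., d_n, has prod(d_i) isolated complex solutions.
degrees = [2] * 20       # twenty quadratic equations in twenty unknowns
print(prod(degrees))     # 1048576 = 2**20 solution paths to track
```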
Do you mean "would they publish well" or "would I be able to sell them to industry"? I think they would publish well if they work; for industry, I think not immediately.