OK there is a lot here to unpack.
First off, what do we mean by a particle in a QFT?
Well the formal definition is something like, "particle states fall into representations of the Poincaré group."
In perhaps more physical terms, a particle, in QFT, is defined by a momentum eigenstate (which might also have associated spin). You actually run into serious trouble if you try to take the position eigenstates of a particle too seriously [briefly: if you try to localize a particle to a region smaller than its own Compton wavelength, the uncertainty principle allows for particle creation, so you get a large number of particle and antiparticle pairs being created and the calculation blows up in your face], and so we never really think about the position of a relativistic quantum particle.
To define a particle we usually look at it when its momentum is zero, in which case a particle is defined by its mass (energy at 0 momentum) and its spin (angular momentum at zero momentum). String theory recreates particles in this sense--there are states in string theory that have mass and spin, that therefore will look like particles to a low energy observer. Furthermore, we can compute scattering of strings, which will look like scattering of particles to low energy observers who can't see that the string is stringy.
A theory of a free string (ie a worldsheet with a trivial topology) does not reproduce the full multiparticle Fock space of a quantum field theory. Thus, we can't construct a full quantum field operator $\phi(x)$ from a single free string. Instead, a theory of a single free string reproduces the single particle states of an infinite number of quantum field theories.
In other words, when you quantize the string (on a worldsheet with trivial topology), you end up with a theory of a single string. A single string looks like a single particle to a low energy observer (one who can't probe the 'stringiness' of the string). The key is that there are many different single particle states that a string might be in, characterized by mass and spin. The string has some internal structure--the low energy observer can't directly probe this structure, and so only sees a point particle--but the internal structure shows up as different masses or spins.
Let's say it one more time for good measure. A quantum field theory is a theory of an indefinite number of identical particles. When you quantize a string, you get a theory of a single string that can look like any one of an infinite tower of particles, with different masses and spins.
OK in math here is what I mean:
You can think of the string as a quantum theory, with a position operator $X^\mu$ and a momentum operator $p^\mu$. Actually it's more convenient to work with the Fourier transform of the position operator; we call the coefficients $\alpha$. The Hamiltonian is given by [this is for the open string in light cone gauge, being lazy about some factors of $\alpha'$ and $2\pi$]
\begin{equation}
H = \frac{1}{2p^+} \left( \sum_{i=1}^{D-2}\frac{1}{2} p^i p^i + \sum_{n\neq 0} \alpha_n^i \alpha_{-n}^i\right)
\end{equation}
The first term, summing over $p^i p^i$, is essentially just the hamiltonian of a free particle, and is therefore not super interesting.
Perhaps you will notice that the second term, with the $\alpha_n$, is very similar to a sum of harmonic oscillator hamiltonians written in terms of the creation/annihilation operators. In fact that relationship can be made very precise; see the notes I linked to for details.
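To spell that relation out a little, here is a sketch in the common convention where the modes obey $[\alpha_n^i,\alpha_m^j]=n\,\delta^{ij}\delta_{n+m,0}$ (this normalization is an assumption here, since factors of $\alpha'$ are being dropped). For $n>0$ define
\begin{equation}
a_n^i = \frac{\alpha_n^i}{\sqrt{n}}, \qquad a_n^{i\dagger} = \frac{\alpha_{-n}^i}{\sqrt{n}}, \qquad [a_n^i, a_m^{j\dagger}] = \delta^{ij}\delta_{nm}
\end{equation}
Then, writing the sum over the transverse index explicitly,
\begin{equation}
\sum_{i=1}^{D-2}\sum_{n\neq 0} \alpha_n^i \alpha_{-n}^i = \sum_{n>0} \left( 2n \sum_{i=1}^{D-2} a_n^{i\dagger} a_n^i + (D-2)\, n \right)
\end{equation}
So each pair $(n,i)$ is one harmonic oscillator with frequency proportional to $n$. The constant piece is the (divergent) zero-point energy, whose careful treatment is one of the details being skipped here.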
The spectrum of the hamiltonian is thus made of a vacuum state with a bunch of equally spaced harmonic oscillator excited states living on top of it. Actually this isn't quite correct--there is actually an infinite set of vacuum states labeled by $p^\mu$
\begin{equation}
|0;p^\mu \rangle
\end{equation}
These are the eigenstates of the first term in the hamiltonian. The label $p^\mu$ is just the "center of mass motion" of the string. We can safely ignore it for the purposes of this question, and similarly we can ignore the first term in the hamiltonian above.
The interesting things are the excited states. Just like for a harmonic oscillator, you get the excited states by acting with creation operators. We could construct properly normalized creation and annihilation operators, but in this context it is more convenient to use the level operators $\alpha_n^i$, which differ from them by a normalization factor; it's important to keep track of the $n$ label. The level operators are morally very similar to creation/annihilation operators ($n<0$ corresponds to creation operators). Thus the excited states are things like
\begin{equation}
\alpha_{-1}^i |0\rangle, \ \ \alpha_{-2}^i|0\rangle, \ \ \alpha_{-1}^i \alpha_{-1}^j |0\rangle, \ \ \alpha_{-1}^i \alpha_{-10}^j \alpha_{-32}^k |0\rangle
\end{equation}
Now this looks an awful lot like a Fock space from quantum field theory, where the creation/annihilation operators are creating/annihilating particles. So it looks like, by analogy, when I act with two level operators I will create two particles.
BUT THAT IS NOT WHAT IS GOING ON!!!!!!!
We only have one string. We are not creating particles, we are not creating strings. We are creating excited states of a single string.
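To make "excited states of a single string" concrete, one can count them. Assign each state a level $N$ equal to the sum of the mode numbers of the oscillators acting on the vacuum (so a state built with $\alpha_{-1}$ and $\alpha_{-2}$ has $N=3$). With 24 transverse directions, the number of independent states at level $N$ is the coefficient of $q^N$ in $\prod_{n\geq 1}(1-q^n)^{-24}$. The Python sketch below computes these coefficients; the function name `level_degeneracies` and the default of 24 transverse directions (i.e. $D=26$) are choices made here for illustration.

```python
def level_degeneracies(max_level, transverse_dims=24):
    """Count oscillator states of a single open string at each level.

    Returns a list c with c[N] = number of independent states at level N,
    i.e. the coefficient of q^N in prod_{n>=1} (1 - q^n)^(-transverse_dims).
    The default transverse_dims=24 assumes D = 26 (an assumption of this sketch).
    """
    coeffs = [0] * (max_level + 1)
    coeffs[0] = 1  # the oscillator vacuum |0; p>
    for n in range(1, max_level + 1):
        # Multiply the series by 1/(1 - q^n) once per transverse direction.
        for _ in range(transverse_dims):
            for k in range(n, max_level + 1):
                coeffs[k] += coeffs[k - n]
    return coeffs


if __name__ == "__main__":
    for level, count in enumerate(level_degeneracies(3)):
        print(f"level {level}: {count} states")
```

Every one of these states is a state of one string; the rapidly growing count is the internal structure showing up as an ever larger multiplet of single-particle states, not as more particles.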
How do we determine what this state looks like to a low energy observer? Well we compute the mass (ie the energy, assuming the center of mass motion is zero), and we compute the spin (ie the angular momentum, assuming the orbital angular momentum aka center of mass motion is zero).
How do we compute the energy and the angular momentum?
Well actually we have the position $X^\mu$ and momentum $p^\mu$ operators in the theory already, so we just use the normal definitions: $p^0$ is the energy of the string and you can use $X$ and $p$ to construct an angular momentum operator. Acting on the excited states above, you find that the states differ by mass and spin.
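As a sketch of what comes out for the open bosonic string in the critical dimension $D=26$ (with the usual normalization restored, which is an assumption relative to the lazy factors above), define the level
\begin{equation}
N = \sum_{n>0} \alpha_{-n}^i \alpha_n^i
\end{equation}
Then the mass of a state at level $N$ is
\begin{equation}
M^2 = \frac{N-1}{\alpha'}
\end{equation}
At $N=0$ the state $|0;p\rangle$ has $M^2=-1/\alpha'$ (the tachyon of the bosonic string). At $N=1$ the states $\alpha_{-1}^i|0;p\rangle$ are massless and carry a transverse vector index, so they look like a massless spin 1 particle. At $N=2$ the states are massive and include spin 2. Higher levels continue the tower.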
That's the outline, obviously I am skimming over many details, but honestly going through all of it slowly would take at least a full lecture in a string theory course, so I would recommend reading Tong's notes (link below) if you want more detail.
Anyway, to very very briefly address your other points:
How do we get QFT out? Well in this language it looks like the honest way to do it would be to have a theory of an indefinite number of identical strings. This is known as string field theory, and is very hard. However, the modern perspective is that the strings are something of a red herring and are not the real fundamental degrees of freedom in the theory, so maybe string field theory is not the way to go in general.
Wait, but we can get QFT out, right? OK, yes it is true we can reproduce results of QFT from string theory, otherwise what would be the point? We can compute scattering of strings (it turns out what looks like two strings scattering can really be thought of as one string worldsheet with a tear in it). So we get an S matrix for strings, we can take a low energy limit of that and get an S matrix for particles. We can then write down a QFT lagrangian that reproduces the same low energy S matrix. By using methods along these lines we find that the low energy results of string theory are reproduced by writing down supergravity actions.
What does $X^\mu (\sigma,\tau)|0 \rangle$ mean? First I disagree with you that $\sigma$ and $\tau$ are "completely fake things", they are coordinates on the worldsheet. The worldsheet of the string is real (or at least, it is real if the string is real :)), so $\sigma,\tau$ are labelling real things. Also I disagree with you calling $\phi(x) |0\rangle$ a position wave function in QFT. I would call it an insertion of the operator $\phi$ at the point $x$. You are probing the quantum field at the point $x$. Similarly, by acting with the operator $X^\mu(\sigma,\tau)$, you are probing the string by poking it at a point on the string's worldsheet labeled by $\sigma,\tau$.
Reference:
A very good set of lecture notes by David Tong can be found here.
See chapter 2 especially (he is doing the closed string but it's morally similar for what you are interested in).
Best Answer
Positivity of energy
The probability that a single superstring has a negative value of energy is strictly zero, as implied by supersymmetry. There's no contradiction with the commutation relation. It's easy to see why. Just define the wave function of the superstring in the $P^0$ representation. Then you have $\tilde\psi(P^0)$ and you may assume that $$\tilde\psi(P^0) = 0\mbox{ for } P^0\leq 0.$$ The operator $X^0$ may be defined as $X^0=-i \partial/\partial P^0$ and it's manifest that $$[X^0,P^0]=-i.$$ So your favorite anonymous co-father of quantum mechanics, whether it was Heisenberg, Dirac, or anyone else, had to use another assumption that is not satisfied in string theory to argue that it is not possible. (The assumption has to be violated not only in string theory but in any particle-like description of a theory similar to quantum field theory.)
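For completeness, the commutator can be checked by acting on an arbitrary wave function $\tilde\psi(P^0)$ with the definition $X^0=-i \partial/\partial P^0$ given above: $$[X^0,P^0]\,\tilde\psi = -i\frac{\partial}{\partial P^0}\left(P^0\tilde\psi\right) + i P^0 \frac{\partial\tilde\psi}{\partial P^0} = -i\,\tilde\psi.$$ The computation is local in $P^0$, so it goes through unchanged when $\tilde\psi$ is supported only at $P^0>0$.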
It's pretty obvious that you don't quite know what the hypothetical no-go statement is and why it is true, so it may be a sensible idea to forget about this hypothetical statement and discard it as a vague irrational prejudice if you're reading Polchinski's book that is supposed to make everything crystal clear, and I think that it does so. Your knowledge of physics can't be built on some vague rumors that you may have heard somewhere. You must actually understand why statements are true.
My guess is that she or more likely he (the founder of QM) assumed that the Fourier transform of $\tilde \psi(P^0)$, namely the function $\psi(X^0)$, also has to vanish for negative $X^0$. Those two conditions - vanishing of both the function and its Fourier transform for negative values of the argument - are not compatible conditions. However, there is no condition that $\psi(X^0)$ vanishes for a negative value of $X^0$ for a single superstring (or any single particle, for that matter), so the contradiction disappears and the very construction above proves that there can't be any inconsistency.
String theory is usually treated in the "first-quantized" way, so we obtain one-particle and multi-particle states "directly" and "directly" calculate their scattering amplitudes. That was the treatment I was assuming above; the negative-energy one-particle states are strictly absent.
However, string theory may also be treated in the "string field theory" formalism that makes it look like a quantum field theory in the spacetime, with infinitely many fields corresponding to the excitations of the string. Then the string fields may be Fourier expanded into components with various values of $P^0$, and the operators in front of the waves with a positive/negative value of $P^0$ are labeled as annihilation/creation operators, just like in any quantum field theory. Only the creation operators may produce nonzero particle states out of the vacuum - which is why particles always carry non-negative or positive values of $P^0$.
Normalization of eigenstates of momentum
The second question is purely about the normalization of some vectors and their existence. There is no real subtlety here. The off-shell states $|0;k\rangle$, where $k$ doesn't have to satisfy $k_\mu k^\mu=m^2$, may be assumed to exist for any $k$. However, the physical - Virasoro - conditions force $k$ to be on-shell.
So, in mathematical language, if you assume a particular type of particle and write its wave function as $$|\psi \rangle = \int d^{26} k\,\tilde\psi(k) |0;k\rangle,$$ then the Virasoro condition guarantees that $\tilde\psi(k)=0$ for all $k_\mu$ such that $k_\mu k^\mu\neq m^2$. To obtain states of a finite norm that satisfy the on-shell condition, you must pick $$\tilde\psi(k) = \sqrt{\delta(k_\mu k^\mu - m^2) }\, \tilde\psi_{\rm reduced} (\vec k)$$ where the reduced wave function only depends on the $25$ spatial components of the momentum. I had to add the square root of the delta function because the total norm of the states above will contain the $\delta$-function that, when integrated over $k^0$, will set $k^0$ to the right on-shell value given by the usual square root.
In my notation, the vector whose inner product contains only the 25-dimensional $\delta$-function may be obtained from the previous one by multiplying by the square root of the $\delta$-function as follows: $$||0;\vec k\rangle = \int_0^\infty dk^{\prime 0} \sqrt{ \delta\left( k^{\prime 0} -\sqrt{m^2+\vec k^2} \right) } |0; (k^{\prime 0},\vec k)\rangle $$ The $\delta$-function in the formula above is the same as the previous one, up to powers of $k^{\prime 0}$ and numerical constants that you should be able to calculate (and the second $\delta$-function also contains $\theta(k^0)$ relative to the first one - the first one also incorrectly allows negative values of $k^0$). I chose this explicit form here because using the last displayed formula above, you may easily derive the second inner product - with the 25-dimensional $\delta$-function - from the first one. The square root of the $\delta$-function becomes an ordinary one and disappears after the integration over $k^0$, leaving the usual second inner product for the on-shell states.
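The relation between the two $\delta$-functions can be sketched explicitly. Assuming the signature in which $k_\mu k^\mu = (k^0)^2-\vec k^2$ (consistent with the on-shell condition $k_\mu k^\mu=m^2$ as written here) and defining $E_{\vec k}=\sqrt{m^2+\vec k^2}$, the standard identity for a $\delta$-function of a function with simple zeros gives $$\delta(k_\mu k^\mu - m^2) = \frac{1}{2E_{\vec k}}\left[ \delta\left(k^0 - E_{\vec k}\right) + \delta\left(k^0 + E_{\vec k}\right) \right].$$ The first term is the $\delta$-function in the last displayed formula, up to the factor $1/2E_{\vec k}$. The second term is the negative-energy branch, which the integration over $k^{\prime 0}>0$ (equivalently the $\theta(k^0)$) removes.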
Those square roots of the $\delta$-function look awkward and they're not really necessary in any way. It's clear that the right physical space is a simple combination of the vectors $||0;\vec k\rangle$ that only depend on 25 components of the momentum. It's just a physical fact that there is a single on-shell condition, so the number of independent components of the energy-momentum vector is 25 rather than 26. If you pick 25 coordinates out of 26, you inevitably violate the "totally manifest" Lorentz symmetry. Kinematics always does so, even in ordinary quantum field theory where cross sections have the $1/2E$ factors etc. that have to multiply the squared Lorentz-invariant amplitudes.
However, aside from these kinematical factors that are universal (independent of the particle type and its interactions) and easy to learn, there's a lot of actual dynamics - the scattering amplitudes that depend on the momenta and on the particle types; and the infinitely many internal "Hagedorn tower" excitations of the string by the stringy oscillators. It's the "difficult" objects from the previous sentence that remain manifestly Lorentz-covariant in the covariant quantization which is why the quantization is called covariant.
In an alternative quantization, such as the light cone gauge, the Lorentz symmetry becomes really hard to prove. In fact, you need to prove that $[J^{i-},J^{j-}]=0$ in the light-cone coordinates and this condition itself, because of various "double commutator terms", will actually force you to prove that $D=26$ as well. So the calculation of this commutator is not equivalent to the classical Poisson bracket calculations; it knows something about the one-loop behavior of the world sheet theory, too.
So you must learn how the kinematical factors - such as the inner products of the physical states and the extra coefficients that you have to insert into the squared Lorentz-invariant amplitudes to obtain the actual cross sections etc. - work because these are basic and essentially trivial things. The nontrivial physics, i.e. the dynamics, hides in the excitations of the string and the interactions of the strings (or interactions of quantum fields, in similar problems involving quantum field theory), and for those more difficult dynamical questions, it matters whether you want to make unitarity manifest (like in light-cone gauge); or Lorentz symmetry manifest (like in the covariant quantization).