Recall that commuting observables in quantum mechanics are simultaneously observable. If I have observables A and B, and they commute, I can measure A and then B and the results will be the same as if I measured B and then A (if you insist on being precise, I mean the same in a statistical sense, averaging over many identical experiments). If they don't commute, the results will not be the same: measuring A and then B will produce different results than measuring B and then A. So if I only have access to A and my friend only has access to B, then by measuring A several times I can determine whether or not my friend has been measuring B.
Thus it is crucial that if A and B do not commute, they are not spacelike separated. Or, to remove the double negatives, A and B must commute if they are spacelike separated. Otherwise I could tell by doing measurements of A whether or not my friend is measuring B, even though light could not have reached me from B. Then, with the magic of a Lorentzian spacetime, I could end up traveling to my friend, arriving before he observed B, and stopping him from making the observation.
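The signaling argument above can be sketched in a toy model. This is only an illustration under stated assumptions, not a field-theory calculation: two qubits stand in for "me" and "my friend", A is taken to be $\sigma_z$ on my qubit, and B is $\sigma_x$ either on the friend's qubit (commuting with A) or on my own qubit (not commuting). All names in the code are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def kron(X, Y):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
# Eigenprojectors of sigma_z and sigma_x (real matrices suffice here).
Z_PROJ = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
X_PROJ = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]

def measure_and_forget(rho, projectors):
    """State after a measurement whose outcome I never learn: sum_k P_k rho P_k."""
    n = len(rho)
    out = [[0.0] * n for _ in range(n)]
    for P in projectors:
        term = matmul(matmul(P, rho), P)
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

def prob_A_plus(rho):
    """Probability that A = sigma_z on my qubit gives +1."""
    return trace(matmul(kron(Z_PROJ[0], I2), rho))

rho0 = kron(Z_PROJ[0], Z_PROJ[0])            # both qubits in the sigma_z = +1 state
B_commuting = [kron(I2, P) for P in X_PROJ]  # sigma_x on the friend's qubit: [A, B] = 0
B_clashing = [kron(P, I2) for P in X_PROJ]   # sigma_x on my own qubit: [A, B] != 0

p_untouched = prob_A_plus(rho0)
p_commuting = prob_A_plus(measure_and_forget(rho0, B_commuting))
p_clashing = prob_A_plus(measure_and_forget(rho0, B_clashing))
print(p_untouched, p_commuting, p_clashing)
```

This prints `1.0 1.0 0.5`: when B commutes with A, my statistics for A are unchanged whether or not B was measured, while a non-commuting B shifts them, which is exactly what would let me detect the friend's measurement.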
The correlation function you wrote down, the one without the commutator, is indeed nonzero. This represents the fact that values of the field at different points in space are correlated with one another. This is completely fine: after all, there are events common to the past light cones of both points, if you go back far enough. They have not had completely independent histories. But the point is that these correlations did not arise because you made measurements. You cannot access these correlations by doing local experiments at a fixed spacetime point; you can only see them by measuring field values at spatial location x and then comparing notes with your friend who measured field values at spatial location y. You can only compare notes once you have had time to travel close to each other. The vanishing commutator guarantees that your measurements at x did not affect your friend's measurements at y.
It is dangerous to think of fields as creating particles at spacetime locations, because you can't localize a relativistic particle in space more precisely than its Compton wavelength. If you are thinking of fields in position space, it is better to think of what you are measuring as a field and not think of particles at all.
(Actually, I should say that I don't think you could learn that your friend was measuring B at y by only doing measurements of A. But the state of the field would change, and the evolution of the field would be acausal. I think this is a somewhat technical point; the main idea is that you don't want to be able to affect what the field is doing OVER THERE outside the light cone by doing measurements RIGHT HERE, because you get into trouble with causality.)
I'm digging this thread out just to clarify some things for those who might have a similar question.
Summary
We cannot use $\mathcal T$. Space-like four-vectors are essentially like $(0,x,y,z)$, so we can ignore the time and do three-dimensional rotations to get $(0,-x,-y,-z)=-(0,x,y,z)$.
A la Valter Moretti
As Valter Moretti already pointed out, you cannot just apply $\mathcal P\mathcal T$ to get $(x-y)\to-(x-y)$, because $D(x-y)$ is not invariant under $\mathcal T$.
So the challenge is really to do $(x-y)\to-(x-y)$ using only proper orthochronous Lorentz transformations $SO(1,3)_+$ and $\mathcal P$. This is only possible for space-like four-vectors.
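For context, here is why this inversion matters. The thread does not restate the definitions, so the formulas below are an assumption: they take the field to be the free real scalar field in the conventions of Peskin and Schroeder's Chapter 2.

$$
D(x-y)=\langle 0|\phi(x)\phi(y)|0\rangle=\int\frac{d^3p}{(2\pi)^3}\frac{1}{2E_{\vec p}}\,e^{-ip\cdot(x-y)},
\qquad
[\phi(x),\phi(y)]=D(x-y)-D(y-x).
$$

In that setting $D$ is invariant under $SO(1,3)_+$, so any such transformation taking $(x-y)$ to $-(x-y)$ gives $D(y-x)=D(x-y)$, and the commutator vanishes.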
The point about space-like four-vectors is that there is a Lorentz frame where $t=0$ (boost along $\vec x$ with $\beta=\frac{t}{|\vec x|}$, which satisfies $|\beta|<1$ precisely because the vector is space-like), and in such a frame the parity transformation
$$\mathcal P:(0,x',y',z')\to(0,-x',-y',-z')=-(0,x',y',z')$$
looks just like an inversion. So what you can do for space-like four-vectors is
$$
(t,x,y,z)
\overset{\Lambda}{\to}(0,x',y',z')
\overset{\mathcal P}{\to}-(0,x',y',z')
\overset{\Lambda^{-1}}{\to}-(t,x,y,z)
$$
The difference between this transformation and $\mathcal P\mathcal T$ is that the latter takes all four-vectors to their negatives, whereas the former does so only on a three-dimensional subspace of the four-dimensional Minkowski space (the vectors with $t'=0$ in the boosted frame).
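A quick numerical check of this construction, as a sketch only: the sample vector `p`, the helper names, and the general boost matrix (written in units with $c=1$) are choices made here, not part of the original answer. It builds $\Lambda^{-1}\mathcal P\Lambda$ and confirms that it negates the chosen space-like vector while leaving the time axis of the boosted frame untouched, unlike $\mathcal P\mathcal T$.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(4)) for i in range(4)]

def boost(beta):
    """Boost by velocity 3-vector beta (c = 1), acting on columns (t, x, y, z)."""
    b2 = sum(b * b for b in beta)
    L = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    if b2 == 0.0:
        return L
    gamma = 1.0 / math.sqrt(1.0 - b2)
    L[0][0] = gamma
    for i in range(3):
        L[0][i + 1] = L[i + 1][0] = -gamma * beta[i]
        for j in range(3):
            L[i + 1][j + 1] += (gamma - 1.0) * beta[i] * beta[j] / b2
    return L

PARITY = [[1.0, 0, 0, 0], [0, -1.0, 0, 0], [0, 0, -1.0, 0], [0, 0, 0, -1.0]]

p = [1.0, 2.0, -1.0, 0.5]                # space-like: |x|^2 = 5.25 > t^2 = 1
r2 = sum(c * c for c in p[1:])
beta = [p[0] * c / r2 for c in p[1:]]    # magnitude t/|x|, pointing along x
neg_beta = [-b for b in beta]

to_rest = boost(beta)                    # Lambda: sets the time component of p to 0
composite = matmul(boost(neg_beta), matmul(PARITY, to_rest))  # Lambda^{-1} P Lambda

flipped = apply(composite, p)            # should be close to -p
u = apply(boost(neg_beta), [1.0, 0.0, 0.0, 0.0])  # time axis of the t' = 0 frame
u_image = apply(composite, u)            # should be close to u itself, not -u
print(flipped, u_image)
```

The time-like vector `u` is fixed by the composite map, which is the concrete sense in which only a three-dimensional subspace gets negated.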
A la Peskin and Schroeder
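You can actually achieve the same without using $\mathcal P$, that is, with $SO(1,3)_+$ transformations only. This means we can continuously bring a fixed space-like vector $p$ to its negative $-p$. Just do the following steps (here $R_1$ and $R_2$ are spatial rotations, $B$ is a boost along the $x$-axis, and $R_\pi$ is a rotation by $\pi$ about any axis perpendicular to the $x$-axis):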
\begin{align*}
(t,x,y,z)
&\overset{R_1}{\to}\left(t,\sqrt{x^2+y^2},0,z\right)\\
&\overset{R_2}{\to}\left(t,\sqrt{x^2+y^2+z^2},0,0\right)\\
&\overset{B\left(\beta=\frac{t}{|\vec x|}\right)}{\to}\left(0,\sqrt{x^2+y^2+z^2-t^2},0,0\right)\\
&\overset{R_\pi}{\to}-\left(0,\sqrt{x^2+y^2+z^2-t^2},0,0\right)\\
&\overset{\left(BR_2R_1\right)^{-1}}{\to}-\left(t,x,y,z\right)
\end{align*}
In view of this one should really say that space-like vectors are like $(0,x,0,0)$.
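The chain of steps above can be checked numerically. The following is a minimal sketch, assuming units with $c=1$, a particular choice of rotation axes, and a sample vector `p`; the function names are hypothetical. It composes $R_1$, $R_2$, $B$, $R_\pi$ and the inverse of $BR_2R_1$, then verifies that the result maps $p$ to $-p$ while having determinant $+1$ and $\Lambda^0{}_0\ge 1$, as a proper orthochronous transformation should.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(4)) for i in range(4)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def det(A):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def rotation(a, b, angle):
    """Rotation by `angle` in the plane of spatial axes a, b (1 = x, 2 = y, 3 = z)."""
    R = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    c, s = math.cos(angle), math.sin(angle)
    R[a][a], R[a][b], R[b][a], R[b][b] = c, -s, s, c
    return R

def boost_x(beta):
    """Boost along x with velocity beta (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    B = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    B[0][0], B[0][1], B[1][0], B[1][1] = g, -g * beta, -g * beta, g
    return B

def inverting_transformation(p):
    """Return the composite matrix (B R2 R1)^{-1} R_pi (B R2 R1) for space-like p."""
    t, x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    if not r > abs(t):
        raise ValueError("p must be space-like")
    R1 = rotation(1, 2, -math.atan2(y, x))                 # kills the y component
    R2 = rotation(1, 3, -math.atan2(z, math.hypot(x, y)))  # kills the z component
    B = boost_x(t / r)                                     # kills the t component
    R_pi = rotation(1, 2, math.pi)                         # flips the x axis
    M = matmul(B, matmul(R2, R1))
    # Rotations invert by transposition; the boost inverts by flipping beta.
    M_inv = matmul(transpose(R1), matmul(transpose(R2), boost_x(-t / r)))
    return matmul(M_inv, matmul(R_pi, M))

p = [1.0, 2.0, -1.0, 0.5]
Lam = inverting_transformation(p)
image = apply(Lam, p)
print(image, det(Lam), Lam[0][0])
```

For a time-like input the required boost would need $|\beta|=|t|/|\vec x|>1$, which is why the sketch refuses such vectors.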
Conclusion
Space-like four-vectors should be thought of as $(0,x,0,0)$, and since there are three spatial dimensions, there is enough room to rotate this vector in any direction. This allows us to invert space-like vectors using only proper orthochronous transformations $SO(1,3)_+$.
Time-like four-vectors are like $(t,0,0,0)$. There is only one time direction, and hence no rotations are possible. Hence the only way of getting $-t$ is to use time inversion $\mathcal T$.
In short: because there is only one time dimension but more than one space dimension, we can invert space-like four-vectors by continuous Lorentz rotations, but not time-like ones.
Best Answer
According to Weinberg in his text, the components of most quantum fields are not really measurable in any obvious way, so it's best not to think in those terms.
However, the fields do have to get commuted past each other when you evaluate the S-matrix, and then the Lorentz invariance of the S-matrix depends crucially on the fields commuting at spacelike separations.
Lots of aspects of the physical interpretation in QFT are at best subtle, and philosophically weak but plausible-sounding heuristic arguments are not uncommon. (You can already see people disagreeing in the comments about something as basic as whether $\phi$ creates a particle!) I found the early chapters in Peskin hard going for this exact reason; it's much better when you get to phenomenology and the physics is less opaque. If you want a book you can't argue with, try Weinberg, but this comes at the price of taking twice as long to cover the material, unfortunately in a rather idiosyncratic notation that makes it hard to dip in and out of.