EDIT: I'm leaving this up as background reading for @drake's answer. (The point of what follows is that the path integral does give the correct time ordering. It produces the correct $\theta$-function-weighted, time-ordered sums, and these must be accounted for when differentiating its output.)
The two formalisms are equivalent; if they don't give the same result, something is wrong in the calculation. To see this you have to understand a subtlety which is not usually well-explained in textbooks, namely that the path integral is not defined merely by taking the limit of a bunch of integrals of the form $\int_{\mbox{lattice fields}} e^{iS(\phi)} d\phi$.
The problem is that these finite-dimensional integrals are not absolutely convergent, because $|e^{iS(\phi)}| = 1$. To define even the lattice path integral in Minkowski signature, you have to specify some additional information, to say exactly what is meant by the integral.
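As a toy illustration (my own assumption, not part of the argument above), take a single lattice mode with action $S(x) = a x^2$. The integral $\int e^{iax^2}\,dx$ is not absolutely convergent. One way to supply the missing information is a damping factor $e^{-\epsilon x^2}$, which is the same idea as the $+i\epsilon$ prescription. The sketch below assumes NumPy and a hand-written trapezoid rule; the function names are hypothetical. It compares the damped integral with the closed form $\sqrt{\pi/(\epsilon - ia)}$ on the principal branch, which tends to $\sqrt{\pi}\,e^{i\pi/4}$ as $\epsilon \to 0$ for $a > 0$.

```python
import numpy as np

def damped_gaussian(a, eps, n=400_001):
    """Trapezoid-rule value of the integral of exp(i*a*x**2 - eps*x**2) over the real line.

    The damping exp(-eps*x**2) makes the integrand absolutely integrable, so the
    range can be cut off where the damping has fallen below exp(-40).
    """
    half_width = np.sqrt(40.0 / eps)
    x = np.linspace(-half_width, half_width, n)
    y = np.exp((1j * a - eps) * x**2)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def damped_gaussian_exact(a, eps):
    """Closed form sqrt(pi / (eps - i*a)); NumPy's complex sqrt is the principal branch."""
    return np.sqrt(np.pi / (eps - 1j * a))

if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.02):
        print(eps, damped_gaussian(1.0, eps), damped_gaussian_exact(1.0, eps))
    # the eps -> 0 limit for a = 1
    print(np.sqrt(np.pi) * np.exp(1j * np.pi / 4))
```

Without the damping (or some equivalent rule), the lattice integral has no unique meaning. With it, the $\epsilon \to 0$ limit is well defined and picks the branch of the square root.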
In QFT, the additional information you want is that the path integral should be calculating the kernel of the time evolution operator $e^{-iH\delta t}$, which is an analytic function of $\delta t$. This fact is usually expressed by saying that the Minkowski-signature path integral is the analytic continuation of a Euclidean-signature path integral. The Euclidean $n$-point functions $E(y_1,\ldots,y_n)$, defined by
$$E(y_1,\ldots,y_n) = \int \phi(y_1)\cdots\phi(y_n)\, e^{-S_E(\phi)}\, d\phi,$$
are analytic functions of the Euclidean points $y_i \in \mathbb{R}^d$ (away from coincident points). This function $E$ can be continued to a function $A(z_1,\ldots,z_n)$ of $n$ complex points $z_i \in \mathbb{C}^d$. The analytic function $A$ does not extend to all of $\mathbb{C}^{nd}$: it has singularities and several different branches. Each branch corresponds to a different choice of time-ordering. One branch gives the correct ordering, another gives the 'wrong-sign' time-ordering, and others have the wrong sign on only some subsets of the points. If you restrict $A$ to the set $B$ of boundary points of the correct branch, you get the Minkowski-signature $n$-point functions, $A|_B = M$, where $M(x_1,\ldots,x_n) = \langle \hat{\phi}(x_1)\cdots\hat{\phi}(x_n)\rangle_{op}$ and the $x_i$ are points in Minkowski space.
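A minimal sketch of this branch structure in the simplest possible case, a single harmonic oscillator of frequency $m$ (0+1 dimensions, $\hbar = 1$, unit mass), chosen here purely as a toy model. Its Euclidean two-point function $e^{-m|\tau|}/(2m)$ has two branches: $e^{-m\tau}/(2m)$, continued from $\tau > 0$, and $e^{m\tau}/(2m)$, continued from $\tau < 0$. Evaluated at $\tau = it$, they reproduce the two operator orderings $\langle x(t)x(0)\rangle$ and $\langle x(0)x(t)\rangle$. The code computes these orderings independently from truncated Fock-space matrices. All function names are hypothetical.

```python
import numpy as np

def oscillator(m, dim=6):
    """Position matrix and energies of a harmonic oscillator (frequency m, unit mass,
    hbar = 1) in the lowest `dim` Fock states. The zero-point energy is dropped
    because it cancels in every correlator below."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # annihilation: a|n> = sqrt(n)|n-1>
    x = (a + a.T) / np.sqrt(2 * m)
    return x, m * np.arange(dim)

def wightman(t, m=1.3, reverse=False):
    """<0|x(t) x(0)|0>, or <0|x(0) x(t)|0> if reverse=True, from the operator formalism."""
    x, E = oscillator(m)
    U = np.diag(np.exp(-1j * E * t))                # e^{-iHt}, diagonal in the Fock basis
    x_t = U.conj().T @ x @ U                        # Heisenberg picture: e^{iHt} x e^{-iHt}
    vac = np.zeros(len(E))
    vac[0] = 1.0
    return vac @ (x @ x_t if reverse else x_t @ x) @ vac

def euclidean_ops(tau, m=1.3):
    """<0|x e^{-H tau} x|0> for real tau > 0: the operator form of E(tau)."""
    x, E = oscillator(m)
    vac = np.zeros(len(E))
    vac[0] = 1.0
    return vac @ x @ np.diag(np.exp(-E * tau)) @ x @ vac

def euclidean_branch(tau, m=1.3, sign=+1):
    """Continuation of E(tau) = e^{-m|tau|}/(2m) to complex tau, starting from
    tau > 0 (sign=+1) or from tau < 0 (sign=-1)."""
    return np.exp(-sign * m * tau) / (2 * m)

if __name__ == "__main__":
    t = 0.7
    print(wightman(t), euclidean_branch(1j * t, sign=+1))                # same value
    print(wightman(t, reverse=True), euclidean_branch(1j * t, sign=-1))  # same value
```

The same imaginary-axis point $\tau = it$ gives different answers on the two branches. Picking the branch is exactly the choice of operator ordering.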
In perturbation theory, most of this detail is hidden, and the only thing you need to remember is that the $+i\epsilon$ prescription selects out the correct time-ordering.
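The $+i\epsilon$ statement can be checked for the same toy oscillator (again an assumption made for illustration). The sketch below integrates $\int \frac{dE}{2\pi}\,\frac{i\,e^{-iEt}}{E^2-m^2+i\epsilon}$ numerically and compares it with the residue result $e^{-i\omega|t|}/(2\omega)$, where $\omega = \sqrt{m^2 - i\epsilon}$. As $\epsilon \to 0$ this becomes $e^{-im|t|}/(2m) = \theta(t)\langle x(t)x(0)\rangle + \theta(-t)\langle x(0)x(t)\rangle$, the time-ordered product. The energy cutoff and grid spacing are ad hoc choices, and the function names are hypothetical.

```python
import numpy as np

def feynman_numeric(t, m=1.0, eps=0.2, cutoff=2000.0, dE=0.005):
    """Trapezoid-rule value of the integral over real E of
    i * exp(-i E t) / (E**2 - m**2 + i*eps) / (2*pi), truncated to |E| <= cutoff."""
    E = np.arange(-cutoff, cutoff + dE / 2, dE)
    y = 1j * np.exp(-1j * E * t) / (E**2 - m**2 + 1j * eps)
    return dE * (y.sum() - 0.5 * (y[0] + y[-1])) / (2 * np.pi)

def feynman_residue(t, m=1.0, eps=0.2):
    """Closing the contour below (t > 0) or above (t < 0) gives exp(-i w |t|) / (2 w),
    where w = sqrt(m**2 - i*eps) is the pole with Re w > 0 and Im w < 0."""
    w = np.sqrt(m**2 - 1j * eps)
    return np.exp(-1j * w * abs(t)) / (2 * w)

def time_ordered(t, m=1.0):
    """theta(t) <x(t) x(0)> + theta(-t) <x(0) x(t)> for the oscillator."""
    return np.exp(-1j * m * abs(t)) / (2 * m)

if __name__ == "__main__":
    for t in (1.5, -0.8):
        print(t, feynman_numeric(t), feynman_residue(t))
    print(feynman_residue(1.5, eps=1e-9), time_ordered(1.5))
```

Moving the poles off the real axis in the $+i\epsilon$ direction is what makes the positive-frequency part propagate forward in time and the negative-frequency part backward.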
Peskin & Schroeder, An Introduction to Quantum Field Theory, use the fact that$^1$
$$i\Delta(x-y)~:=~\langle 0 | [\phi(x), \phi(y)] |0\rangle \tag{K} $$ vanishes for space-like separations $x-y$; see the text below eq. (2.53) on p. 28. In particular, for equal times $x^0=y^0$ we have
$$i\Delta(0,{\bf x}-{\bf y})~=~0.\tag{L}$$
Therefore at the physics level of rigor
$$i\Delta(x-y)\delta(x^0-y^0)~=~0.\tag{M}$$
Differentiation of eq. (M) wrt. $x^0$ then yields OP's eq. (A).
Eq. (A) can alternatively be established using test functions.
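Here is a lattice sketch of (L), and of what survives a time derivative. It assumes a free scalar of mass $m > 0$ on a periodic one-dimensional lattice with $N$ sites and unit spacing. This toy setup is my own and does not come from Peskin & Schroeder. Each momentum mode contributes $e^{ikn}\,(-i\sin(\omega_k t)/\omega_k)$ to $i\Delta(t,n) = \langle 0|[\phi_n(t),\phi_0(0)]|0\rangle$. The equal-time commutator therefore vanishes identically. Its time derivative at $t = 0$ is $-i\delta_{n0}$, the lattice version of $[\pi,\phi] = -i\delta$. The function name is hypothetical.

```python
import numpy as np

def lattice_commutator(t, N=64, m=0.5):
    """i*Delta(t, n) = <0|[phi_n(t), phi_0(0)]|0> for n = 0..N-1, for a free scalar
    of mass m > 0 on a periodic 1D lattice with unit spacing (a toy model)."""
    k = 2 * np.pi * np.arange(N) / N
    omega = np.sqrt(m**2 + (2 * np.sin(k / 2))**2)   # lattice dispersion relation
    n = np.arange(N)
    # mode k contributes e^{ikn} (e^{-i w t} - e^{+i w t}) / (2 w) = e^{ikn} (-i sin(w t) / w)
    return np.exp(1j * np.outer(n, k)) @ (-1j * np.sin(omega * t) / omega) / N

if __name__ == "__main__":
    print(np.abs(lattice_commutator(0.0)).max())      # equal times: exactly 0
    h = 1e-5
    d0 = (lattice_commutator(h) - lattice_commutator(-h)) / (2 * h)
    print(d0[:3])                                     # approx -1j at n = 0, approx 0 elsewhere
```

On the lattice, the continuum subtlety at $\mathbf{x} = \mathbf{y}$ turns into a harmless Kronecker delta. The commutator itself is zero at equal times for every $n$, so a term $\delta(x^0-y^0)\,i\Delta$ drops out.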
--
$^1$ The notation (K) is taken from Itzykson & Zuber, QFT, eq. (3-55).
Probably not a complete answer. However, your item 1 is correct, but 1', $\partial^\mu\theta(x^0)=g^{\mu 0}\,\delta(x^0)$, is better suited for this purpose. Then $\partial_\mu\partial^\mu\theta(x^0) = \partial_0\delta(x^0)=\delta'(x^0)$.
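Spelled out with the mostly-minus metric of Peskin & Schroeder ($g^{00} = +1$): since $\theta(x^0)$ depends only on $x^0$, only $\nu = 0$ contributes in the first line and only $\mu = 0$ in the second.

$$
\begin{aligned}
\partial^\mu\theta(x^0) &= g^{\mu\nu}\,\partial_\nu\theta(x^0) = g^{\mu 0}\,\delta(x^0),\\
\partial_\mu\partial^\mu\theta(x^0) &= g^{\mu 0}\,\partial_\mu\delta(x^0) = g^{00}\,\partial_0\delta(x^0) = \delta'(x^0).
\end{aligned}
$$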
Your item 2 is incorrect: as far as I know, the distribution on the l.h.s. is not defined. Fortunately, it is not needed.
Regarding item 3, one can use that $f(x)\delta'(x)=-f'(x)\delta(x)$ if $f(0)=0$. This follows from differentiating $f(x)\delta(x)=f(0)\delta(x)$, which gives $f'\delta + f\delta' = f(0)\delta'$. Now, as a function of $x^0-y^0$, the commutator $[\phi(x^0,\vec{x}),\phi(y^0,\vec{y})]$ does vanish at $x^0-y^0=0$, because that makes $x-y$ spacelike. (Except at the origin $x-y=0$. Sorry, I don't have the time to deal with that now. But the result is singular there anyway...)
That gives you (2.56).
Regarding your item 4, you may redefine your Green's function by a multiplicative factor to obtain the coefficient you want in front of your $\delta$. However, the modulus squared of the coefficient in the tree-level propagator is tied to the coefficient of the kinetic term in the Lagrangian density, which you want correctly normalized. So the only freedom you have is to redefine your Green's function by a phase factor, and that is a matter of convention.