TL;DR: The main reason why the naive path integral derivation of SD eqs. works well beyond what could be expected is that both concepts employ the same underlying notion of time ordering, namely the covariant time ordering $T_{\rm cov}$, cf. e.g. my Phys.SE answer here.
I) In the main part of this answer we would like to investigate in more detail some formal aspects of the correspondence between
$$ \text{Path integral formalism}\qquad \longleftrightarrow \qquad
\text{Operator formalism},\tag{1} $$
in particular the cohabitation of, on one hand the Schwinger-Dyson (SD) equations, and on another hand, the Heisenberg's equations of motion (eom).
The correspondence (1) is notoriously subtle, see e.g. Ref. 1. and this Phys.SE post. For a general interacting field theory, both sides of the path integral/operator correspondence (1) are typically not rigorously defined, cf. e.g. Ref. 1. Both sides of the correspondence may in principle receive quantum corrections due to operator ordering problems. So it is difficult to come with reliable statements at this formal level.
To simplify the discussion and gain intuition, we are going to make some assumptions.
We are going to consider free (quadratic) theories only. OP's example is covered by this. Free theories have the advantage that we can present explicit formulas.
It is most economically to argue via the Hamiltonian (as opposed to the Lagrangian) formulation, since we then are only going to have one (as opposed to two) time derivatives. So that's what we are going to do. (Also for simplicity we ignore cases with singular Legendre transformations. Then it is always possible to (Gaussian) integrate out the momenta to get to the corresponding Lagrangian formulation, if that's what one wants.)
Also we treat field theory formally like point mechanics. All spatial coordinates are suppressed via DeWitt condensed notation. Only the time-variable $t$ is manifestly kept. Thus our variables are
$$ z^I, \qquad I~\in~ \{1, \ldots, 2N\},\tag{2} $$
and they depend on time $t$, where $N$ could be infinite.
Also to avoid annoying sign factors, we restrict attention to theories with only Grassmann-even variables.
Furthermore, we assume for simplicity (or via Darboux's Theorem) that the equal-time Poisson bracket is constant and $z$-independent
$$\begin{align}\omega^{IK}
~=~& \{ z^I(t),z^K(t) \}_{PB}, \cr
I,K~\in~&\{1, \ldots, 2N\}. \end{align}\tag{3} $$
Much of the above assumptions can be relaxed, but we will not discuss that here.
II) Classically, with the above assumptions, the Hamiltonian action reads
$$ S_H[z;J]~=~ \int dt\ L_H(z(t);\dot{z}(t);J(t)),\tag{4} $$
where the Hamiltonian Lagrangian is
$$ L_H(z;\dot{z};J)~=~\frac{1}{2} z^I\omega_{IK} \dot{z}^K - H(z,J),\tag{5} $$
and the Hamiltonian is quadratic
$$ \begin{align} H(z,J)~=~&H_0(z) -J_I z^I, \cr
H_0(z)~:=~& \frac{1}{2} z^I h_{IK} z^K. \end{align}\tag{6} $$
The Euler-Lagrange derivatives (with an index raised by the symplectic metric $\omega^{IL}$) correspond to Hamilton's eom
$$\begin{align} 0~\approx~&\omega^{IK}\frac{\delta S_H[z;J]}{\delta z^K(t)}\cr
~=~&\dot{z}^I(t) - \{z^I(t),H(z(t),J(t))\}_{PB} \cr ~=~&\left(\delta^I_K\frac{d}{dt}- h^I{}_K\right)z^K(t) + \omega^{IK} J_K(t). \end{align}\tag{7} $$
(Here the $\approx$ sign means equality modulo classical equations of motion.) Also we have defined the matrix
$$ h^I{}_L~:=~\omega^{IK} h_{KL}. \tag{8} $$
In the corresponding operator formalism, the Hamilton's eom (7) turns into Heisenberg's eom, which is an operator identity.
III) The Hessian reads
$$ \begin{align} {\cal H}_{IK}(t,t^{\prime})~:=~&\frac{\delta^2 S_H[z;J]}{\delta z^I(t)\delta z^K(t^{\prime})} \cr
~=~&\omega_{IK}\delta^{\prime}(t-t^{\prime})
-h_{IK}\delta(t-t^{\prime})\cr
~=~&\omega_{IL}\left(\delta^L_K\frac{d}{dt}- h^L{}_K\right)\delta(t-t^{\prime}).\end{align}\tag{9} $$
The corresponding Green's function $G={\cal H}^{-1}$ is the inverse of the Hessian (9):
$$ G^{IK}(t,t^{\prime})~=~\frac{1}{2} {\rm sgn}(t-t^{\prime})\left(e^{(t-t^{\prime})h}\right)^I{}_L ~\omega^{LK}, \tag{10} $$
$$ \left(\delta^I_L\frac{d}{dt}- h^I{}_L\right)G^{LK}(t,t^{\prime})~=~\omega^{IK}\delta(t-t^{\prime}). \tag{11} $$
IV) We define the partition function
$$\begin{align}
&Z[J]\cr
~:=~ &\int\![dz] e^{\frac{i}{\hbar}S_H[z;J]} \cr
~=~& e^{\frac{i}{\hbar}W_c[J]} \cr
~=~& \int\![dz] \exp\left[\frac{i}{2\hbar}\iint \! dt~dt^{\prime}~z^I(t) ~{\cal H}_{IK}(t,t^{\prime})~ z^K(t^{\prime})
+ \frac{i}{\hbar}\int \! dt~J_I(t) z^I(t)\right] \cr
~=~& Z[0]\exp\left[-\frac{i}{2\hbar}
\iint \! dt~dt^{\prime}~J_I(t) ~G^{IK}(t,t^{\prime})~ J_K(t^{\prime})\right] ,\end{align}\tag{12} $$
and the generator of connected Feynman diagrams
$$\begin{align} W_c[J]~:=~&\frac{\hbar}{i}\ln Z[J]\cr
~=~& W_c[0]- \frac{1}{2} \iint \! dt~dt^{\prime}~J_I(t) ~G^{IK}(t,t^{\prime})~ J_K(t^{\prime}) \end{align} ,\tag{13} $$
whose explicit form follows from Gaussian integration. The quantum-average/expectation-value in the Heisenberg picture is defined as
$$\begin{align}\left< T_{\rm cov}\{F[\widehat{z}]\} \right>_J
~=~& \frac{1}{Z[J]}\int\![dz]~F[z]~e^{\frac{i}{\hbar}S_H[z;J]}\cr
~=~&\frac{1}{Z[J]} F\left[ \frac{\hbar}{i} \frac{\delta}{\delta J} \right] Z[J] .\end{align}\tag{14} $$
Here $T_{\rm cov}$ is covariant time-ordering, i.e. time-differentiations inside its argument should be taken after/outside the usual time ordering $T$. Here time-ordering $T$ is defined as
$$\begin{align}T\left\{\widehat{A}(t)\widehat{B}(t^{\prime})\right\}~:=~& \theta(t-t^{\prime})\widehat{A}(t)\widehat{B}(t^{\prime})\cr
&+\theta(t^{\prime}-t)\widehat{B}(t^{\prime})\widehat{A}(t). \end{align}\tag{15} $$
It is crucial that time-differentiation and time-ordering do not commute:
$$\begin{align}T_{\rm cov}&\left\{\frac{d\widehat{A}(t)}{dt}\widehat{B}(t^{\prime})\right\} \cr
~:=~&\frac{d}{dt}T\left\{\widehat{A}(t)\widehat{B}(t^{\prime})\right\}\cr
~=~& T\left\{\frac{d\widehat{A}(t)}{dt}\widehat{B}(t^{\prime})\right\}+\delta(t-t^{\prime})[\widehat{A}(t),\widehat{B}(t^{\prime})], \end{align}\tag{16}$$
but instead give rise to equal-time commutator (contact) terms.
It is important to realize that the time derivatives inside the Boltzmann factor $e^{\frac{i}{\hbar}S_H[z;J]}$ in the path integral should respect the underlying time slicing procedure. See e.g. this and this Phys.SE answer. This induces the covariant time-ordering $T_{\rm cov}$ in eq. (14).
The 1-point function reads
$$\begin{align} \left< \widehat{z}^I(t)\right>_J
~=~& \frac{\delta W_c[J]}{\delta J_I(t)}\cr
~=~& -\int \!dt^{\prime} ~G^{IK}(t,t^{\prime})~J_K(t^{\prime}), \end{align}\tag{17} $$
while the time-ordered 2-point function reads
$$\begin{align} &\left< T_{\rm cov}\{ \widehat{z}^I(t)\widehat{z}^K(t^{\prime})
\}\right>_J\cr
~=~& -\frac{\hbar^2}{Z[J]}
\frac{\delta^2 Z[J]}{\delta J_I(t)\delta J_K(t^{\prime})}\cr
~=~&\frac{\delta W_c[J]}{\delta J_I(t)}\frac{\delta W_c[J]}{\delta J_K(t^{\prime})}+\frac{\hbar}{i}\frac{\delta^2 W_c[J]}{\delta J_I(t)\delta J_K(t^{\prime})}\cr
~=~&\left< \widehat{z}^I(t)\right>_J
\left< \widehat{z}^K(t^{\prime})\right>_J
+i\hbar G^{IK}(t,t^{\prime}). \end{align}\tag{18}$$
The corresponding 2-point function without time-ordering reads:
$$\begin{align} \left< \widehat{z}^I(t)\widehat{z}^K(t^{\prime}) \right>_J
~=~&\left< \widehat{z}^I(t)\right>_J
\left< \widehat{z}^K(t^{\prime})\right>_J\cr
&+\frac{i\hbar}{2} \left(e^{(t-t^{\prime})h}\right)^I{}_L ~\omega^{LK}.\end{align} \tag{19} $$
V) The Schwinger-Dyson (SD) equations read
$$ i\hbar\left< T_{\rm cov}\left\{\frac{\delta F[\widehat{z}]}{\delta z^I(t)} \right\}\right>_J
~=~\left< T_{\rm cov}\left\{ F[\widehat{z}]\frac{\delta S_H[\widehat{z};J]}{\delta z^I(t)}\right\}\right>_J.\tag{20} $$
The SD eqs. (20) are here formally written in the operator language, but their justification is most easily argued via the path integral formalism, cf. e.g. this Phys.SE post. The SD eqs. (20) simply reflect the fact that a path integral of a total derivative vanishes if the boundary contributions are zero
$$ 0~=~\int [dz]\frac{\delta}{\delta z^I(t)}\left\{F[z] e^{\frac{i}{\hbar}S_H[z;J]}\right\}, \tag{21} $$
cf. this Phys.SE post.
VI) Naively the rhs. of the SD-eqs. (20) is proportional to the Heisenberg eom (7), which is an operator expression that vanishes identically, so why then there is a non-zero quantum correction on the lhs. of the SD-eqs. (20)? The resolution to this apparent paradox is hidden in the fact that time-differentiation and time-ordering do not commute, cf. eq. (16). To see how this works, pick for simplicity the functional $F[z]=z^{K}(t^{\prime})$ of a single variable. (Then it is enough to use the Poisson bracket rather than the Moyal-Groenewold $\star$-product.) After raising an index by the symplectic metric $\omega^{IL}$, the SD-eqs. (20) become
$$\begin{align} i\hbar\omega^{IK} &\delta(t-t^{\prime}) \cr
~\stackrel{(20)}{=}~&\left< T_{\rm cov}\left\{\omega^{IL}\frac{\delta S_H[\widehat{z};J]}{\delta z^L(t)}\widehat{z}^K(t^{\prime})\right\} \right>_J\cr
~\stackrel{(7)}{=}~&\left< \frac{d}{dt}T\left\{\widehat{z}^I(t)
\widehat{z}^K(t^{\prime})\right\}\right>_J \cr
&- \left< T\left\{\{\widehat{z}^I(t),H(\widehat{z}(t),J(t))\}_{PB} \widehat{z}^K(t^{\prime})\right\}\right>_J\cr
~=~&\left(\delta^I_L\frac{d}{dt}- h^I{}_L\right)\left< T\left\{\widehat{z}^L(t)
\widehat{z}^K(t^{\prime})\right\}\right>_J \cr
&+\omega^{IL} J_L(t)\left< \widehat{z}^K(t^{\prime})\right>_J \cr
~\stackrel{(16)}{=}~& i\hbar\omega^{IK} \delta(t-t^{\prime}) \cr
&+\left< T\left\{ \underbrace{\left(\frac{d\widehat{z}^I(t)}{dt} - \{\widehat{z}^I(t),H(\widehat{z}(t),J(t))\}_{PB} \right)}_{=0}\widehat{z}^K(t^{\prime})\right\}\right>_J\cr
~\stackrel{(7)}{=}~&i\hbar\omega^{IK} \delta(t-t^{\prime}).\end{align}\tag{22} $$
Eq. (22) shows how SD-eqs. (20) and Heisenberg's eom (7) can co-exist.
References:
- F. Bastianelli and P. van Nieuwenhuizen, Path Integrals and Anomalies in Curved Space, 2006.
Good question; I remember spending hours trying to understand this when I first learned QFT. Let's address your two main points in turn. First, you say
I don't understand how rhyme these two different pictures.
Let's outline how to connect the two pictures in steps. It's a good exercise to try and work through all of the gory details yourself, so I encourage you to try!
- For each admissible classical field configuration $\varphi:\mathbb R^3\to \mathbb R$, let $|\varphi,t\rangle$ denote a field configuration eigenstate at time $t$. Namely,
\begin{align}
\hat \phi(t,\mathbf x)|\varphi,t\rangle = \varphi( \mathbf x)|\varphi,t\rangle.
\end{align}
Take special note of the fact that $\hat\phi$ and $\varphi$ are different. The former is a operator valued distribution defined on spacetime, while the latter is a classical field configuration defined on space only.
Show that given admissible classical field configurations $\varphi_a,\varphi_b:\mathbb R^3\to \mathbb R$, there is a simple functional integral expression for the time-ordered expectation value from $|\varphi_a, t_a\rangle$ to $|\varphi_b, t_b\rangle$ of the product of a finite sequence of field operators:
\begin{align}
\langle \varphi_b, t_b| T\big[\hat \phi(x_1)\cdots \hat \phi(x_n)\big]&|\varphi_a, t_a\rangle \\
&= \int\limits_{\phi(t_a,\mathbf x) = \varphi_a(\mathbf x)}^{\phi(t_b,\mathbf x) = \varphi_b(\mathbf x)} \mathscr D\phi \,\phi(x_1)\cdots \phi(x_n)\,e^{iS_{t_a,t_b}[\phi]} \tag{$\star$}
\end{align}
where we have defined
\begin{align}
S_{t_a,t_b}[\phi] = \int_{t_a}^{t_b}dt\int d^3\mathbf x \,\mathscr L_\phi(t)
\end{align}
and $\mathscr L_\phi$ is the Lagrangian density of the theory.
Show that the expectation value on the left hand side of $(\star)$ can be used to compute a corresponding vacuum expectation value (vev);
\begin{align}
\lim_{t\to(1-i\epsilon)\infty} \frac{\langle \varphi_b, t| T\big[\hat \phi(x_1)\cdots \hat \phi(x_n)\big]|\varphi_a, -t\rangle}{\langle \varphi_b, t |\varphi_a, -t\rangle}
&= \langle 0|T\big[\hat \phi(x_1)\cdots \hat \phi(x_n)\big]|0\rangle
\end{align}
where $\epsilon$ is a "positive infinitesimal" (namely you take the $\epsilon\to 0$ limit at the end). This is often called the $i\epsilon$ prescription; notice it's basically a clever trick for projecting out the ground state from a general expectation value.
Notice that the functional integration on the right hand side of $(\star)$ can be written as
\begin{align}
\int\limits_{\phi(t_a,\mathbf x) = \varphi_a(\mathbf x)}^{\phi(t_b,\mathbf x) = \varphi_b(\mathbf x)} \mathscr D\phi \,\phi(x_1)\cdots \phi(x_n)\,e^{iS_{t_a,t_b}[\phi] }=\left(\frac{1}{i}\right)^n \frac{\delta^n Z_{t_a, \varphi_a, t_b, \varphi_b}[J]}{\delta J(x_1)\cdots \delta J(x_n)}\bigg|_{J=0}
\end{align}
where we have defined
\begin{align}
Z_{t_a, \varphi_a, t_b, \varphi_b}[J] = \int\limits_{\phi(t_a,\mathbf x) = \varphi_a(\mathbf x)}^{\phi(t_b,\mathbf x) = \varphi_b(\mathbf x)} \mathscr D\phi \,e^{iS_{t_a,t_b}[\phi]+ i\int_{t_a}^{t_b}\int d^3\mathbf x\,J(x)\phi(x)}
\end{align}
Combine steps 2-4 to show that if we define
\begin{align}
W[J] = \lim_{t\to(1-i\epsilon)\infty}\frac{Z_{t_a, \varphi_a, t_b, \varphi_b}[J]}{Z_{t_a, \varphi_a, t_b, \varphi_b}[0]},
\end{align}
then we obtain our desired expression which gives vacuum expectation values in terms of path integrals:
\begin{align}
\langle 0|T\big[\hat \phi(x_1)\cdots \hat \phi(x_n)\big]|0\rangle
&= \left(\frac{1}{i}\right)^n \frac{\delta^n W[J]}{\delta J(x_1)\cdots \delta J(x_n)}\bigg|_{J=0}
\end{align}
Notice that
\begin{align}
G^{(n)}(x_1, \dots, x_n) = \langle 0|T\big[\hat \phi(x_1)\cdots \hat \phi(x_n)\big]|0\rangle
\end{align}
is just a suggestive definition which makes us think of Green's functions. It's suggestive because, for example, $G^{(2)}(x_1, x_2)$, the so called "two-point function," is the Green's function for the corresponding classical field theory.
Second, you ask
Do these Feynman diagram for the two different approaches somehow represent the same scattering amplitude?
The LSZ reduction formula is the answer to the question of how vevs, or equivalently Green's functions, are related to the $S$-matrix and scattering amplitudes, and above we have argued how the canonical formalism (which is formulated in terms of vevs) is related to the functional integral formalism, so we have found how the functional integral formalism allows us to compute the $S$-matrix. In practice, it's true, that you don't see people explicitly using the LSZ reduction formula, but that's because although it conceptually underlies the connection between Green's functions and the $S$-matrix, in practice people have already used LSZ to justify codified rules, namely Feynman rules, that allow one to go directly from Feynman diagrams (which simply represent terms in the perturbative expansions of Feynman integrals) to scattering amplitudes.
Best Answer
EDIT: I'm leaving this up as background reading to @drake's answer. (The point of the following is that the path integral does indeed give the correct time ordering, so it is producing the correct $\theta$-function weighted, time-ordered sums, which must be accounted for when differentiating its output.)
The two formalisms are equivalent; if they don't give the same result, something is wrong in the calculation. To see this you have to understand a subtlety which is not usually well-explained in textbooks, namely that the path integral is not defined merely by taking the limit of a bunch of integrals of the form $\int_{\mbox{lattice fields}} e^{iS(\phi)} d\phi$.
The problem is that these finite-dimensional integrals are not absolutely convergent, because $|e^{iS(\phi)}| = 1$. To define even the lattice path integral in Minkowski signature, you have to specify some additional information, to say exactly what is meant by the integral.
In QFT, the additional information you want is that the path integral should be calculating the kernel of the time evolution operator $e^{iH\delta t}$, which is an analytic function of $\delta t$. This fact is usually expressed by saying that the Minkowski signature path integral is the analytic continuation of a Euclidean signature path integral: The Euclidean $n$-point functions $E(y_1,...,y_n)$ defined by
$E(y_1,...,y_n) = \int \phi(y_1)...\phi(y_n) e^{-S_E(\phi)} d\phi$
are analytic functions of the Euclidean points $y_i \in \mathbb{R}^d$. This function $E$ can be continued to a function $A(z_1,...,z_n)$ of $n$ complex variables $z_i \in \mathbb{C}^d$. This analytic function $A$ does not extend to the entire plane; it has singularities, and several different branches. Each branch corresponds to a different choice of time-ordering. One branch is the correct choice, another choice is the 'wrong sign' time-ordering. Other choices have wrong signs on only some subsets of the points. If you restrict $A$ to the set $B$ of boundary points of the correct branch, you'll get the Minkowski-signature $n$-point functions $A|_B = M$, where $M(x_1,...,x_n) = \langle \hat{\phi}(x_1)...\hat{\phi}(x_n)\rangle_{op}$ and the $x_i$ are points in Minkowski space.
In perturbation theory, most of this detail is hidden, and the only thing you need to remember is that the $+i\epsilon$ prescription selects out the correct time-ordering.