You've motivated me to read the nice development in Mukhanov and think a little about this issue. Here is how I rationalized it to myself:
First, how is this equivalent to the usual definition of the Feynman propagator, involving a particular contour choice?
I think you should consider the fundamental definition of the Feynman propagator not as a propagator with a particular contour choice, but as an propagator that is symmetric under exchange of spacetime coordinates. In this zero-dimensional context, that means exchange of $t$ and $t'$. In QFT language this leads to a structure like:
$$ -i \langle 0 | \Theta(t-t')\psi(t)\psi(t')+ \Theta(t'-t)\psi(t')\psi(t)|0\rangle $$
While in the driven oscillator case the form is:
$$\frac{i}{2\omega} \left(e^{-i\omega(t-t')}\Theta(t-t')+e^{-i\omega(t'-t)}\Theta(t'-t)\right)$$
In either case, you can Fourier transform this and get something with one pole above and one pole below the complex $\omega$ plane, from the two Heaviside functions. But since we are supposed to be looking for a physical intuition, I suggest you think of this as a consequence of demanding that the propagator be symmetric under exchange of times.
Second, are there intuitive words one can drape around this definition? Does it provide any additional physical insight?
As far as a physical interpretation, I think you will just have to accept that the interpretation isn't going to be as straightforward in the case of the Feynman propagator as it is for the retarded propagator. Your question already sets up why this is the case: one is an expectation value, which is something that appears classically and that we have a natural feel for, while the other is an off-diagonal matrix element. The good news is that by this point in our study of quantum mechanics, we've both seen enough of these to have some sense of what they mean. For example, an amplitude of the form $\langle 0_{out}|\hat{q} |0_{in}\rangle$ is well-known in atomic physics as the dipole operator. Roughly speaking, it represents the degree to which the states $|0_{out}\rangle$ and $|0_{in}\rangle$ are coupled by the operator $J \hat{q}$. Summarizing, then, the best interpretation I can offer for the Feynman propagator in this situation is that it is the object which tells you, for a given transient current, the extent to which the resulting output state is coupled to the initial state. So it is a particular, and calculationally important, way of characterizing how much the states were changed by the driving force. Since it is not itself an observable, it is not clear to me that a more intuitive description should exist or what it might entail.
Best Answer
If the operators $X_i$ can be written as a sum of an annihilation and a creation part$^1$
$$X_i~=~A_i + A^{\dagger}_i, \qquad i~\in ~I, \tag{1}$$
$$\begin{align} A_i|\Omega\rangle~=~0, \qquad & \langle \Omega |A^{\dagger}_i~=~0, \cr \qquad i~\in~&I,\end{align}\tag{2}$$
where
$$\begin{align} [A_i(t),A_j(t^{\prime})] ~=~& 0, \cr [A^{\dagger}_i(t),A^{\dagger}_j(t^{\prime})] ~=~& 0, \cr i,j~\in ~&I,\end{align}\tag{3} $$
and
$$\begin{align} [A_i(t),A_j^\dagger(t^{\prime})] ~=~& (c~{\rm number}) \times {\bf 1},\cr i,j~\in~&I,\end{align}\tag{4} $$
i.e. proportional to the identity operator ${\bf 1}$, then one may prove that
Proof of eq. (5): On one hand, the time ordering $T$ is defined as
$$\begin{align} T(&X_i(t)X_j(t^{\prime}))\cr ~=~& \Theta(t-t^{\prime}) X_i(t)X_j(t^{\prime}) +\Theta(t^{\prime}-t) X_j(t^{\prime})X_i(t)\cr ~=~&X_i(t)X_j(t^{\prime}) -\Theta(t^{\prime}-t) [X_i(t),X_j(t^{\prime})]\cr ~\stackrel{(1)+(3)}{=}&~X_i(t)X_j(t^{\prime})\cr &-\Theta(t^{\prime}-t) \left([A_i(t),A^{\dagger}_j(t^{\prime})]+[A^{\dagger}_i(t),A_j(t^{\prime})]\right).\end{align} \tag{6}$$
On the other hand, the normal ordering $::$ moves by definition the creation part to the left of the annihilation part, so that
$$\begin{align}:X_i(t)X_j(t^\prime):~\stackrel{(1)}{=}~& X_i(t)X_j(t^{\prime}) \cr ~-~& [A_i(t),A^{\dagger}_j(t^{\prime})], \end{align}\tag{7}$$
$$ \langle \Omega | :X_i(t)X_j(t^{\prime}):|\Omega\rangle~\stackrel{(1)+(2)}{=}~0.\tag{8}$$
The difference of eqs. (6) and (7) is the lhs. of eq. (5):
$$\begin{align} T(& X_i(t)X_j(t^{\prime})) ~-~:X_i(t)X_j(t^{\prime}): \cr ~\stackrel{(6)+(7)}{=}& \Theta(t-t^{\prime})[A_i(t),A^{\dagger}_j(t^{\prime})] \cr ~+~& \Theta(t^{\prime}-t)[A_j(t^{\prime}),A^{\dagger}_i(t)],\end{align}\tag{9}$$
which is proportional to the identity operator ${\bf 1}$ by assumption (4). Now sandwich eq. (9) between the bra $\langle \Omega |$ and the ket $|\Omega\rangle $. Since the rhs. of eq. (9) is proportional to the identity operator ${\bf 1}$, the unsandwiched rhs. must be equal to the sandwiched rhs. times the identity operator ${\bf 1}$. Hence also the unsandwiched lhs. of eq. (9) must also be equal to the sandwiched lhs. times the identity operator ${\bf 1}$. This yields eq. (5). $\Box$
A similar argument applied to eq. (7) yields that
i.e. a version of eq. (5) without the time-order $T$.
--
$^1$ The operators $A_i$ and $A^{\dagger}_i$ need not be Hermitian conjugates in what follows. We implicitly assume that the vacuum $|\Omega\rangle$ is normalized: $\langle \Omega | \Omega\rangle=1$.