> In essence, what puzzles me is that I would be happy to use the expression
> $$p(A\rightarrow B) \propto \exp(-\mathrm{const}\cdot\mathcal{S}[A\rightarrow B])$$
This is not quite accurate: what you should link is the partition function of a classical system with the path integral of a quantum system. The correct expression is:
$$Z \;\underset{\text{up to Wick rotation}}{\propto}\; \sum_{\text{paths}} \exp\left(\frac{i}{\hbar}S[\text{path}]\right)$$
How do we obtain such a result?
Let us start from the partition function of some system.
$$Z=\sum_{\text{states}}\exp\left(-\beta E(\text{state})\right)$$
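As a quick sanity check (my own illustration, not part of the original answer), one can evaluate this sum for the harmonic oscillator, $E_n = (n+\tfrac{1}{2})\hbar\omega$, and compare with the geometric-series closed form (units $\hbar=\omega=1$):

```python
import math

# Hypothetical check: the sum over states for a harmonic oscillator,
# E_n = n + 1/2 (hbar = omega = 1), truncated at n_max, converges to
# the closed form Z = e^{-beta/2} / (1 - e^{-beta}) = 1 / (2 sinh(beta/2)).
def Z_sum(beta, n_max=200):
    """Partition function as an explicit sum over energy eigenstates."""
    return sum(math.exp(-beta * (n + 0.5)) for n in range(n_max))

def Z_exact(beta):
    """Geometric-series closed form of the same sum."""
    return math.exp(-beta / 2) / (1 - math.exp(-beta))

beta = 1.0  # inverse temperature
print(Z_sum(beta), Z_exact(beta))
```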
If we introduce some complete basis of states $|\psi\rangle$, then this can be rewritten as:
$$Z = \sum_{|\psi\rangle}\langle\psi|e^{-\beta H}|\psi\rangle = \int d|\psi\rangle\, \langle\psi|e^{-\beta H}|\psi\rangle$$
Now we perform the Wick rotation. If we close our eyes to the details, this simply means substituting $\beta = \frac{1}{kT} \rightarrow it$, which is just a change of variables. We find:
$$Z = \int d|\psi\rangle\,\langle\psi|e^{-iHt}|\psi\rangle$$
But Feynman's sum-over-histories path integral tells us that $\langle x|e^{-iHt}|x'\rangle = \int_{x'}^{x}[dx]\,e^{i\int_0^t dt'\, L(t')}$, so we can rewrite this as:
$$Z = \int d|\psi\rangle \int_{|\psi\rangle}^{|\psi\rangle}[d|\psi\rangle]\; e^{i\int_0^{\beta} dt'\, L(t')}$$
where the first integral runs over all initial conditions $|\psi\rangle$ and the second over all full configurations beginning and ending at $|\psi\rangle$.
The last step is some interpretation.
We see that the configurations at $t=0$ and $t = \beta$ must be identical, as we evolved from $|\psi\rangle$ at $t=0$ to $|\psi\rangle$ at $t=\beta$. This is implemented by requiring time to be a periodic (i.e. compactified) dimension with period $\beta$.
We also see that we are integrating over all possible initial conditions; this is typically absorbed into the path-integral notation as $\int d|\psi\rangle\int_{|\psi\rangle}^{|\psi\rangle}[d|\psi\rangle] = \int [d|\psi\rangle]$.
We also replace $it \rightarrow t_E$, which indeed corresponds to changing to Euclidean signature, hence the subscript $E$.
$$Z = \int[d|\psi\rangle]\, e^{-\int_0^\beta L_E\, dt_E}$$
where $L_E$ is the Euclidean Lagrangian obtained via the substitution $t=-it_E$, for example:
$$T = \frac{dx}{dt}\frac{dx}{dt} \rightarrow T_E = \frac{dx}{d(-it_E)}\frac{dx}{d(-it_E)} = -\frac{dx}{dt_E}\frac{dx}{dt_E}$$
$V \rightarrow V_E = V$
Such that, as you mentioned: $L_E = -H_{\text{Minkowski}}$.
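The construction above can be checked numerically for the harmonic oscillator (a sketch of my own, not part of the original answer; units $\hbar = m = 1$, lattice parameters are my choices): discretising the compact Euclidean time into $N$ slices with periodic boundary conditions turns $e^{-S_E}$ into a Gaussian weight, whose two-point function must reproduce the exact thermal average $\langle x^2\rangle = \coth(\beta\omega/2)/(2\omega)$.

```python
import numpy as np

# Discretised Euclidean action for the harmonic oscillator (hbar = m = 1):
# S_E = sum_i [ (x_{i+1} - x_i)^2 / (2a) + a * omega^2 * x_i^2 / 2 ]
# with x_N = x_0: Euclidean time is periodic with period beta, as argued above.
# For this Gaussian weight exp(-S_E) = exp(-x^T A x / 2), the two-point
# function of the path integral is <x_i x_j> = (A^{-1})_{ij}.
beta, omega, N = 4.0, 1.0, 400
a = beta / N  # lattice spacing in Euclidean time

A = np.zeros((N, N))
for i in range(N):
    A[i, i] = 2.0 / a + a * omega**2
    A[i, (i + 1) % N] = -1.0 / a  # periodic boundary conditions
    A[i, (i - 1) % N] = -1.0 / a

x2_lattice = np.linalg.inv(A)[0, 0]                        # <x^2> from the path integral
x2_exact = 1.0 / (2 * omega * np.tanh(beta * omega / 2))   # canonical thermal average

print(x2_lattice, x2_exact)
```

The two numbers agree up to discretisation errors that vanish as the lattice spacing $a = \beta/N \to 0$.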
Disclaimer: I might have missed some signs here and there, but the above explanation should definitely help you understand what is going on!
Best Answer
I will not directly answer your question; rather, I will try to make plausible the connection between QFT and statistical physics. To my mind the mathematical details are somewhat obscure and confusing, whereas using the theory pays off and gives interesting results, especially in condensed-matter and nuclear-matter problems. For more details you can have a look at the historical references, some modern accounts (including the path integral, if that is all you want), as well as the intermediary book.
The basic idea of the connection between quantum field theory and statistical physics is the resemblance between the Schrödinger evolution operator $e^{-\mathbf{i}Ht/\hbar}$ and the Gibbs weight $e^{-\beta H}$, with $\beta=\left(k_{B}T\right)^{-1}$ the inverse temperature.
In many-body problems at zero temperature, we are mostly interested in the ground state, so we want to calculate things like the expectation value of an observable $O$, defined as $$\left\langle O\right\rangle =\left\langle \emptyset\right|O\left(t\right)\left|\emptyset\right\rangle =\left\langle \emptyset\right|e^{\mathbf{i}Ht/\hbar}Oe^{-\mathbf{i}Ht/\hbar}\left|\emptyset\right\rangle $$ with $\left|\emptyset\right\rangle$ the ground state. Its time dependency comes from the Heisenberg equation, and for simplicity I assumed a time-independent Hamiltonian $H$. Generalisations are not complicated, and read $$\left\langle O\right\rangle =\left\langle \emptyset\right|U\left(-\infty,t\right)OU\left(t,-\infty\right)\left|\emptyset\right\rangle $$ with the evolution operator $U$ verifying the Schrödinger equation $\mathbf{i}\hbar\, dU/dt=HU$ for any time-dependent Hamiltonian $H$. We suppose that at time $t\rightarrow-\infty$ the system was in its ground state. The above formula is the basis of perturbation theory (see below).
In statistical physics, we are interested in thermal averages at finite temperature. So we just want to know something like $$\left\langle O\right\rangle =\text{Tr}\left\{ \rho O\right\} \rightarrow\frac{1}{Z}\text{Tr}\left\{ e^{-\beta H}O\right\} $$ where you recognise the quantum average on the left of the arrow, with $\rho$ the density matrix. On the right of the arrow you recognise the statistical average, when the density matrix becomes the Gibbs weight, with $\beta=\left(k_{B}T\right)^{-1}$ ($k_{B}$ is the Boltzmann constant and $T$ is the temperature) and $Z=\text{Tr}\, e^{-\beta H}$ the partition function. The chemical potential is discarded for simplicity.
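A minimal numerical sketch of this statistical average (my own hypothetical two-level example, not from the answer; units $k_B = 1$): for $H = -(\Delta/2)\sigma_z$, the Gibbs average of $\sigma_z$ is $\tanh(\beta\Delta/2)$.

```python
import numpy as np

# Hypothetical two-level illustration: H = -(Delta/2) * sigma_z.
# The statistical average is normalised by Z = Tr exp(-beta H):
# <sigma_z> = Tr[exp(-beta H) sigma_z] / Z = tanh(beta * Delta / 2).
sigma_z = np.diag([1.0, -1.0])

def thermal_average(H, O, beta):
    """Gibbs average Tr(e^{-beta H} O) / Tr(e^{-beta H}) via diagonalisation."""
    E, V = np.linalg.eigh(H)
    w = np.exp(-beta * E)        # Boltzmann weights of the energy eigenstates
    O_eig = V.conj().T @ O @ V   # observable in the energy eigenbasis
    return np.sum(w * np.diag(O_eig)) / np.sum(w)

beta, Delta = 2.0, 1.0
H = -0.5 * Delta * sigma_z
print(thermal_average(H, sigma_z, beta), np.tanh(beta * Delta / 2))
```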
Now the magic trick: thanks to the resemblance between the zero- and finite-temperature expressions, we define an operator $\tilde{O}\left(\tau\right)=e^{H\tau}Oe^{-H\tau}$ which looks the same as the previous $e^{\mathbf{i}Ht/\hbar}Oe^{-\mathbf{i}Ht/\hbar}$ except for the redefinition of the time $\tau=\mathbf{i}t/\hbar$, which in practice plays the role of an inverse temperature. If you wish, you can see this rewriting as a Heisenberg representation of the operator $O$ (in imaginary time, though). The important point is that one can adapt the perturbation techniques to temperature problems. For instance, suppose as usual that $H=H_{0}+H_{\text{int}}$ with a known unperturbed $H_{0}$ and a perturbative $H_{\text{int}}$. Then we define the $S$ matrix as $e^{-H\tau}=e^{-H_{0}\tau}S\left(\tau\right)$, which gives, after differentiation, $$-\frac{dS}{d\tau}=\tilde{H}_{\text{int}}\left(\tau\right)S\left(\tau\right)\Rightarrow S\left(\tau\right)=T_{\tau}\exp\left[-\int_{0}^{\tau}\tilde{H}_{\text{int}}\left(\tau'\right)d\tau'\right]$$ with $\tilde{H}_{\text{int}}\left(\tau\right)=e^{H_{0}\tau}H_{\text{int}}e^{-H_{0}\tau}$ the interaction representation of the interaction Hamiltonian. We recognise the Schrödinger-like equation verified by the $S$ operator, and $T_{\tau}$ is the chronological ordering operator in the $\tau$ variable. So in principle everything you know about the Schrödinger equation, the evolution operator, or more generally quantum physics applies straightforwardly to statistical physics.
Using this trick we have $e^{-\beta H}=e^{-H_{0}\beta}S\left(\beta\right)$ in the statistical average of the $O$ observable. Now perturbation theory for $S$ helps you find $\left\langle O\right\rangle$. It is no more complicated than that.
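As an illustration (my own construction, not the answer's; the Hamiltonian and parameters are arbitrary), one can verify $e^{-\beta H}=e^{-H_{0}\beta}S\left(\beta\right)$ to first order in $H_{\text{int}}$, where $S(\beta)\approx 1-\int_0^\beta \tilde{H}_{\text{int}}(\tau)\,d\tau$, for a small matrix Hamiltonian:

```python
import numpy as np

# Check e^{-beta H} = e^{-beta H0} S(beta) to first order in H_int, with
# S(beta) ~ 1 - int_0^beta Htilde(tau) dtau and
# Htilde(tau) = e^{H0 tau} H_int e^{-H0 tau} (interaction representation).
E = np.array([0.0, 1.0])   # eigenvalues of the unperturbed H0
H0 = np.diag(E)
lam = 0.05                 # small coupling
H_int = lam * np.array([[0.0, 1.0], [1.0, 0.0]])
beta = 1.0

def integral_tilde_Hint(E, H_int, beta):
    """int_0^beta e^{(E_i - E_j) tau} dtau, element-wise in the H0 eigenbasis."""
    n = len(E)
    I = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = E[i] - E[j]
            I[i, j] = H_int[i, j] * (beta if d == 0 else (np.exp(d * beta) - 1) / d)
    return I

def expm_sym(M):
    """Matrix exponential of a symmetric matrix via diagonalisation."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.exp(w)) @ V.T

exact = expm_sym(-beta * (H0 + H_int))
first_order = expm_sym(-beta * H0) @ (np.eye(2) - integral_tilde_Hint(E, H_int, beta))

print(np.max(np.abs(exact - first_order)))  # residual is O(lam^2), small
```

The first-order result is much closer to the exact $e^{-\beta H}$ than the unperturbed $e^{-\beta H_0}$, with a residual of order $\lambda^2$, as expected from the Dyson series.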
Now the connection with the path integral: since we want to discuss statistical physics (or many-body systems), one needs a way to quantise the fermionic and bosonic fields. That is where path integrals come in.
And finally the connection to phase transitions: a phase transition appears when the ground state of the full Hamiltonian does not share all the symmetries of $H$. Perturbation theory can then help you find the new ground state. You can use, for instance, a Hubbard–Stratonovich decoupling to get the new ground state.
Final remark: sometimes you cannot get the new ground state perturbatively, so you need more involved techniques. They all come from the path-integral rewriting of what I said above.