Something important is missing from your presentation of the Fermi rule. The Schroedinger eq. you mention, $i\hbar(d\psi/dt) = H\psi$, is never going to produce any transitions between eigenstates of $H$ itself: by definition eigenstates are stationary states.
What you probably refer to is something like
$$
i\hbar\frac{\partial \psi_I}{\partial t} = V_I(t) \psi_I
$$
which is the interaction picture form of the Schroedinger eq. in the presence of a perturbation $V$, $i\hbar(d\psi/dt) = (H + V)\psi$. Here $\psi_I(t) = e^{(i/\hbar)Ht}\psi(t)$ and $V_I(t) = e^{(i/\hbar) H t} V e^{-(i/\hbar) H t}$. If $\psi_{i(j)}$ are eigenstates of $H$, then Fermi's golden rule indeed gives the transition rate between them to 2nd order in the (small) perturbation $V$:
$$
J_{i\rightarrow j} \sim |\langle \psi_j|V|\psi_i\rangle|^2
$$
In other words, Fermi's rule concerns an open system undergoing a weak interaction with its environment, usually represented by an electromagnetic field or, more generally, by an external thermodynamic "bath".
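For reference, the proportionality above is usually written out in full as follows. The symbol $\rho(E_j)$, the density of final states at energy $E_j = E_i$, is notation introduced here and does not appear in the question:
$$
J_{i\rightarrow j} = \frac{2\pi}{\hbar}\,|\langle \psi_j|V|\psi_i\rangle|^2\,\rho(E_j)
$$
The density of states is where the environment enters: without a (quasi-)continuum of final states there is no constant rate, only oscillations between discrete levels.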
That said, Fermi's rule is known to be equivalent to the Markov approximation for open systems; see R. Alicki, "The Markov master equations and the Fermi golden rule", Int. J. Theor. Phys. 16(5), 351-355 (1977). A very important consequence of the Markov approximation is that the dynamics is no longer time reversible: under the original Hamiltonian dynamics the entropy is conserved, whereas under the Markov approximation it is not. In fact, as you point out, it is possible to justify the H-theorem, and under some additional conditions it can be shown that the dynamics is governed by a dynamical semigroup with a Lindblad-type generator (see for instance Sec. IIC on the "Secular Approximation" here).
Now to the actual question: Fermi's rule is definitely not a principle. The Markov approximation, and therefore Fermi's rule, holds provided the relaxation time $\tau$ it describes for the system is much longer than the bath relaxation time, $\tau \gg \tau_{\text{bath}}$. When the system-bath interaction is too strong and/or the time scale for the system's relaxation becomes comparable to that of the bath, both the Markov approximation and Fermi's rule cease to apply. This means that the dynamics is no longer memoryless, but depends on the past history of the system. The transition from Markovian to non-Markovian dynamics can be seen even in a system as simple as a qubit in a dissipative environment, which makes it important for entanglement and decoherence problems. For instance, a qubit undergoing a revival of coherence driven by a dissipative bath no longer follows Markovian dynamics. See for instance this recent review on "Non-Markovian dynamics in open quantum systems".
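The condition $\tau \gg \tau_{\text{bath}}$ can be checked numerically on a toy model. The sketch below is an illustration only, not taken from the references above: it assumes a single level at zero energy coupled with constant strength `g` to a flat band of equally spaced levels, with $\hbar = 1$, and takes the bath time scale to be roughly the inverse bandwidth. The function name `survival_probability` and all parameter values are hypothetical choices. It compares the exact survival probability with the exponential decay predicted by the golden rule.

```python
import numpy as np

def survival_probability(gamma_fgr, times, n_levels=1001, bandwidth=100.0):
    """Exact |<0|exp(-iHt)|0>|^2 for one level coupled to a flat band (hbar = 1)."""
    energies = np.linspace(-bandwidth / 2, bandwidth / 2, n_levels)
    spacing = energies[1] - energies[0]
    # Pick g so that the golden-rule rate 2*pi*g^2*rho, with rho = 1/spacing,
    # equals the requested gamma_fgr.
    g = np.sqrt(gamma_fgr * spacing / (2 * np.pi))
    h = np.zeros((n_levels + 1, n_levels + 1))
    h[1:, 1:] = np.diag(energies)   # band levels
    h[0, 1:] = g                    # coupling of level 0 (energy 0) to the band
    h[1:, 0] = g
    evals, evecs = np.linalg.eigh(h)
    weights = evecs[0, :] ** 2      # overlap of |0> with each exact eigenstate
    amplitude = np.exp(-1j * np.outer(times, evals)) @ weights
    return np.abs(amplitude) ** 2

times = np.linspace(0.0, 3.0, 301)

# Weak coupling: tau = 1/Gamma = 1, much longer than tau_bath ~ 1/bandwidth = 0.01.
weak = survival_probability(1.0, times)
dev_weak = np.max(np.abs(weak - np.exp(-1.0 * times)))

# Strong coupling: tau = 1/Gamma = 0.01, comparable to tau_bath.
strong = survival_probability(100.0, times)
dev_strong = np.max(np.abs(strong - np.exp(-100.0 * times)))

print(f"max deviation from exp(-Gamma t), weak coupling:   {dev_weak:.3f}")
print(f"max deviation from exp(-Gamma t), strong coupling: {dev_strong:.3f}")
```

In the weak-coupling run the exact curve should stay close to the golden-rule exponential, while in the strong-coupling run it should depart from it visibly (oscillations and partial revivals), which is the memory effect described above. The time window is kept well below the recurrence time $2\pi/\text{spacing}$ of the discretized band, so the finite number of levels is not what causes the difference.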
Best Answer
Your second expression is really the more fundamental one, in the sense that Fermi's Golden Rule (FGR) is just an approximation. You can't really go from FGR to the field theoretic result, but you can see why they're equivalent in a certain limit:
As you said, we are using first-order perturbation theory to derive FGR, which means the timescales involved are short compared to the inverse transition rate $1/\Gamma$. In addition, our perturbation is constant. So, if we turn on our perturbation at some time and wait a short time $\Delta t$, we just expand the operator you wrote to get (in units with $\hbar = 1$) $$1-iH'\Delta t,$$ throwing away higher-order terms. Now, compare this with $$S \sim 1 + i \mathcal{M},$$ which makes sense if we think of $\mathcal{M}$ as corresponding to something non-trivial happening. The probability is proportional to $|\mathcal{M}|^2$, by the normal Born rule, and so putting these two expressions together we find ourselves back with FGR.
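To spell out the last step, here is the standard first-order calculation, written with $\hbar$ restored (setting $\hbar = 1$ recovers the form used above). The labels $|i\rangle$, $|j\rangle$ for the unperturbed initial and final states and the symbol $\omega_{ji}$ are notation introduced here. The first-order amplitude to be in $|j\rangle$ at time $t$ after a constant perturbation $H'$ is switched on at $t = 0$ is
$$
c_j^{(1)}(t) = -\frac{i}{\hbar}\int_0^{t} dt'\, e^{i\omega_{ji}t'}\,\langle j|H'|i\rangle, \qquad \omega_{ji} = \frac{E_j - E_i}{\hbar},
$$
so that
$$
|c_j^{(1)}(t)|^2 = \frac{|\langle j|H'|i\rangle|^2}{\hbar^2}\,\frac{\sin^2(\omega_{ji}t/2)}{(\omega_{ji}/2)^2} \;\longrightarrow\; \frac{2\pi t}{\hbar}\,|\langle j|H'|i\rangle|^2\,\delta(E_j - E_i).
$$
The limit holds once $t$ is long enough for the $\sin^2$ factor to act as a delta function on the final-state energies, while still short compared to $1/\Gamma$ so that first order suffices. The probability grows linearly in $t$, and its time derivative is the constant rate of FGR.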
Seeing why your second expression is valid in the first place is a bit more involved, but you essentially just repeatedly evolve your state infinitesimally according to the normal Schroedinger equation, and then stick it all together in the correct order (hence the $T$ and integral). This is just the Dyson series.
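For concreteness, and assuming the expression in question is the usual time-ordered exponential of the interaction-picture Hamiltonian (written $H_I$ here, a symbol not taken from the question), the series reads
$$
T\exp\!\left(-\frac{i}{\hbar}\int_{t_0}^{t} dt'\,H_I(t')\right) = 1 - \frac{i}{\hbar}\int_{t_0}^{t} dt_1\,H_I(t_1) + \left(-\frac{i}{\hbar}\right)^{2}\int_{t_0}^{t} dt_1\int_{t_0}^{t_1} dt_2\,H_I(t_1)H_I(t_2) + \dots
$$
Truncating after the term linear in $H_I$ gives the first-order amplitude that FGR is built on; the nested integration limits are what the time-ordering symbol $T$ enforces.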