First, a correction. The first formula is the probability, not probability amplitude.
And it's computed at the leading order only, "linearized" in a sense, so of course it is only a good approximation for $P_{f\leftarrow i}\ll 1$. When the probability becomes comparable to one, subleading and higher-order corrections become important because one must also study how the newly created coefficients in front of other states – states absent in the initial state – change by the time evolution.
The perturbation theory always becomes inadequate when the perturbation, in this case the matrix element $\langle f |V|i\rangle$, is too large. But one must properly understand what "too large" means. And it means $P_{f\leftarrow i} \geq O(1)$ which is equivalent to $\langle f |V|i\rangle \cdot \Delta t \geq O(\hbar)$. For transitions at $\omega_{fi}\to 0$, the requirement for "how small the perturbation matrix element has to be" simply gets tougher, the upper limit becomes smaller. One more equivalent way to say it: for the perturbation theory to be OK, you need to have $\Delta t\ll \hbar / \langle f |V|i\rangle$.
However, your treatment has one more problem. Well, one of two problems. If you consider the transition to a discrete final state that just happens to have a finite energy, you are dealing with degenerate perturbation theory and you should first rediagonalize $H_0+V$ in this Hilbert subspace, to find out that the actual energy eigenstates differ from the original initial state and their energies actually differ.
If you consider a transition to a final state that belongs to a continuum, then you're interested in the integrated probability over $\omega_f$, anyway, and in that case, $\sin^2 Y / Y^2$ may be approximated by a multiple of the delta-function which imposes the "naive" energy conservation law. See e.g. this document for some intro to the method. My inequality appears as (11.40) on page 104.
The basis of the Hilbert space in Schrödinger's picture is assumed to be time-independent regardless of any properties of the Hamiltonian. The Hamiltonian is just another operator. If the Hamiltonian is time-dependent, its eigenstates and eigenvalues are obviously time-dependent, too.
Both equations you write down only express the fact that the basis of eigenstates of $H(t)$ is still a basis, so a general ket vector, including the actual state vector of the system, may be expanded as a linear superposition of these basic vectors with some general complex coefficients $c_n(t)$. The two expansions only differ by the phase one includes into the coefficients $c_n(t)$ or into the basis vectors $|n;t\rangle$. One convention includes the phase $\exp(i\theta_n(t))$, another one doesn't, and so on. Obviously, there is no "universally mandatory" rule that would dictate the right phase of these vectors so there's some freedom about the notation. Note that a phase factor times an eigenstate is still an eigenstate.
Whatever your convention for the phases is, if you carefully follow the maths and remember what the symbols mean – the defining equations – you will be able to derive the invariant claims about the adiabatic theorem. The Wikipedia-Sakurai conventions treat the phases wisely and naturally, to speed up the derivations.
Best Answer
As you indicated, it is only an approximation that $P(i\rightarrow n) \approx |c_n^{(1)}|^2$. It only holds, if $ |\sum_j \lambda^j c_n^{(j)}|^2 = |c_n|^2 \ll 1$. Since you have $|c_n^{(1)}|^2 = 1$, you have to include higher order terms for the approximation to be valid (also might be not valid at all; depends on the perturbation).
A quick note: I went through my lecture notes because I never saw the $\delta_{ni}$-term in the expression for $c_n^{(1)}$, and I also didn't have that there. I'm also a bit confused what the $U_I(t)$ is here. Are you confusing Interaction picture with the Schrödinger picture?
The correct expression for for the $c_n$'s can be derived from their differential equations: $\frac{d}{dt}c_n^{(j)} = \frac{1}{i\hbar}\sum_m V_{nm}(t)\ c_m^{(j-1)} e^{i\omega_{nm}t}$ So for a given initial state $|i\rangle$: $c_n^{(0)} = \delta_{ni}$, and therefore $c_f^{(1)}(t) = \frac{1}{i\hbar} \int_{t_0}^t V_{fi} e^{i\omega_{fi}t'}dt'$