The non-effectivity, as far as I understand, is already present in Thue's Theorem, thus to understand it, one can look a the proof of the latter. The issue is roughly that, to show that there are not many "close rational approximations" $p/q$, one starts with the assumption that there exists one very close one $p_0/q_0$, and show that this very good approximation "repulses" or excludes other similarly or better ones. This of course doesn't work if the first $p_0/q_0$ doesn't exist... but such an assumption also gives the result! The ineffectivity is that we have no way of knowing which of the two alternatives has led to the conclusion.
There is a well-known analogy with the Siegel (or Landau-Siegel) zero question in the theory of Dirichlet $L$-functions. Siegel -- and it is certainly not coincidental that this is the same Siegel as in Thue--Siegel--Roth, though Landau did also have crucial ideas in that case -- proved an upper bound for real-zeros of quadratic Dirichlet $L$-functions by (1) showing that if there is one such $L$-function with a zero very close to $1$, then this "repulses" the zeros of all other quadratic Dirichet $L$-functions (this phenomenon is fairly well-understood under the name of Deuring-Heilbronn phenomenon), thus obtaining the desired bound; (2) arguing that if the "bad" $L$-function of (1) did not exist, then one is done anyway.
Here the ineffectivity is clear as day: the "bad" character of (1) is almost certainly non-existent, because it would violate rather badly the Generalized Riemann Hypothesis. But as far as we know today, we have to take into account the possibility of the existence of these bad characters... a possibility which however does have positive consequences, like Siegel's Theorem...
(There's much more to this second story; an entertaining account appeared in an article in the Notices of the AMS one or two years ago, written by J. Friedlander and H. Iwaniec.)
As Terry mentions in the comments, the reason for the $\sqrt{5}$ is that the limiting case, the golden ratio, forces it. There is a very neat explanation of all of this in the classic number theory book by Hardy and Wright, pages 209 to 212. I give a brief sketch of the ideas.
- Why $\phi$ is the worst case.
As Hardy and Wright put it, "from the point of view of rational approximation, the simplest numbers are the worst. The "simplest" of all irrationals, from this point of view, is the number $\phi$."
The reason for this is that if we consider the best approximation for a given $\alpha$,
$$\left|\alpha - \frac{p_n}{q_n} \right| = \frac{1}{q_nq'_{n+1}} < \frac{1}{a_{n+1}q^2_n}$$
it is best when $a_{n+1}$ is large. But in the case of $\phi$, every $a_{n+1}$ is as small as possible.
- Why it leads to $\sqrt{5}$
The idea is to simply see what happens when we approximate $\phi$. It roughly goes like this:
$$\left|\phi - \frac{p_n}{q_n} \right| = \frac{1}{q_nq'_{n+1}} \sim \frac{1}{q^2_n}\frac{1}{1+2\phi}=\frac{1}{q^2_n\sqrt{5}}$$
- $\sqrt{5}$ is best possible
This follows easily by contradiction. There are no infinitely many $p$, $q$ such that
$$\alpha=\frac{p}{q}+\frac{\delta}{q^2}$$ and $$|\delta|<\frac{1}{\sqrt{5}}$$
Now any proof of the theorem should look convincing enough, knowing where the $\sqrt{5}$ it presupposes comes from.
EDIT. I include for completeness a nice alternative proof brought up by Marty and Halbort in the comments.
L. R. Ford, "Fractions" (Amer. Math. Monthly, Vol 45, No 9 (Nov 1938))
Best Answer
Let's assume $\alpha > 0$. If $\alpha$ is a quadratic irrational, its simple continued fraction $a_0 + \dfrac{1}{a_1 + \frac{1}{a_2+\ldots}}$ is eventually periodic. Every $p/q$ (in lowest terms) with $\left|\alpha - \dfrac{p}{q}\right| < \dfrac{1}{2q^2}$ is a convergent of $\alpha$, and for the $n$'th convergent $$ \dfrac{1}{q_n^2 (a_{n+1}+2)} < \left| \alpha - \dfrac{p_n}{q_n} \right| \le \dfrac{1}{q_n^2 a_{n+1}}$$ Thus $\dfrac{1}{a_M+2} \le c(\alpha) \le \dfrac{1}{a_M}$ where $a_M$ is the largest element in the continued fraction of $\alpha$.