This result is due to Gelfand, Graev, and Piatetski-Shapiro, and it has a short proof. I suggest you read Bump: Automorphic Forms and Representations, Prop. 3.2.3, pp. 285-289.
Let me switch to $G=\mathrm{PGL}_2(\mathbb{R})$ and $\Gamma=\mathrm{PGL}_2(\mathbb{Z})$ for simplicity.
The idea is to consider right convolutions $\rho(\phi)$ by smooth, compactly supported functions $\phi:G\to\mathbb{C}$, given by $(\rho(\phi)f)(g)=\int_G f(gu)\phi(u)\,du$, which act on the space of automorphic functions $L^2(\Gamma\backslash G)$. This family of operators is sufficiently rich to distinguish automorphic functions: if $f_1\neq f_2$, then there is $\phi$ such that $\rho(\phi)f_1\neq\rho(\phi)f_2$. Thinking of $\rho(\phi)$ as an operator on $(\Gamma\cap N)\backslash G$, its kernel is given by
$$K(g,h)=\sum_{\gamma\in\Gamma\cap N}\phi(g^{-1}\gamma h).$$
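To see where this kernel comes from, unfold the convolution:
$$(\rho(\phi)f)(g)=\int_G f(gu)\phi(u)\,du=\int_G f(u)\phi(g^{-1}u)\,du=\int_{(\Gamma\cap N)\backslash G}\Big(\sum_{\gamma\in\Gamma\cap N}\phi(g^{-1}\gamma u)\Big)f(u)\,du,$$
where the middle equality is the substitution $u\mapsto g^{-1}u$, and the last one folds the integral over $G$ into one over $(\Gamma\cap N)\backslash G$ using the left $(\Gamma\cap N)$-invariance of $f$.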
As $N\cong\mathbb{R}$ and $\Gamma\cap N\cong\mathbb{Z}$, the sum can be analyzed by Poisson summation, using the characters of $N$. It turns out that by subtracting the term corresponding to the trivial character, i.e.
$$K_0(g,h)=\int_{N}\phi(g^{-1} n h) \ dn,$$
the rest of the kernel decays rapidly at infinity, hence defines a compact operator. However, on the cuspidal space the operator with kernel $K_0(g,h)$ acts by zero, hence the operator $\rho(\phi)$ is compact on the cuspidal space. In particular, for suitable $\phi$, the operator $\rho(\phi)$ has a nonzero eigenvalue, necessarily of finite multiplicity, on any nonzero closed right $G$-invariant subspace of cuspidal functions, and from here the result follows by a standard linear algebra argument.
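To make the Poisson summation step explicit, write $n(x)=\left(\begin{array}{cc} 1 & x \\ 0 & 1 \end{array}\right)$, so that $\Gamma\cap N=\{n(m):m\in\mathbb{Z}\}$, and for fixed $g,h$ set $F(x)=\phi(g^{-1}n(x)h)$, a smooth compactly supported function on $\mathbb{R}$. Poisson summation gives
$$K(g,h)=\sum_{m\in\mathbb{Z}}F(m)=\sum_{m\in\mathbb{Z}}\widehat{F}(m),\qquad \widehat{F}(\xi)=\int_{\mathbb{R}}F(x)\,e^{-2\pi i\xi x}\,dx,$$
and the term $\widehat{F}(0)=\int_{\mathbb{R}}F(x)\,dx$ is exactly $K_0(g,h)$. Hence $K-K_0=\sum_{m\neq 0}\widehat{F}(m)$; roughly speaking, high in the cusp $F$ varies on a long scale, so $\widehat{F}$ concentrates near $0$ and the nonzero frequencies become negligible, which is the source of the rapid decay above.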
In an 1845 paper, On a New Way of Solving the Linear Equations that Arise in the Method of Least Squares (http://www.ima.umn.edu/preprints/April92/951.pdf), Jacobi introduced a new iterative method to solve the matrix equation $Ax=b$ when the matrix $A$ is diagonally dominant. As an example, he considers the case $$A=\left(\begin{array}{ccc} 27 & 6 & 0 \\ 6 & 15 & 1 \\ 0 & 1 & 54\end{array} \right),\;\;\;b=\left(\begin{array}{c} 88 \\ 70 \\ 107\end{array}\right).$$ Jacobi uses a rotation to eliminate the largest off-diagonal element $A_{12}=A_{21}=6$ and then solves the transformed system in three iterations, each adding about one digit of accuracy. Perhaps this was a starting point for the theory of tridiagonal (Jacobi) matrices.
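To see the method in action, here is a minimal sketch in Python/NumPy (the modern matrix-splitting formulation of the Jacobi iteration, not Jacobi's own 1845 presentation, and without his preliminary rotation) applied to this very system:

```python
import numpy as np

# Jacobi's 1845 example: a strictly diagonally dominant system Ax = b.
A = np.array([[27.0,  6.0,  0.0],
              [ 6.0, 15.0,  1.0],
              [ 0.0,  1.0, 54.0]])
b = np.array([88.0, 70.0, 107.0])

# Split A = D + R with D the diagonal part, and iterate
# x <- D^{-1} (b - R x); strict diagonal dominance guarantees convergence.
D = np.diag(A)        # vector of diagonal entries
R = A - np.diag(D)    # off-diagonal part

x = np.zeros(3)
for _ in range(30):
    x = (b - R @ x) / D

print(x, np.linalg.norm(A @ x - b))  # approximate solution and residual
```

The rotation Jacobi applied first removes the largest off-diagonal entry, which strengthens the diagonal dominance; the plain iteration above still converges, only more slowly.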
In 1950, Lanczos introduced a method for the successive transformation of a given matrix to a tridiagonal one, which turned out to be very important for solving linear systems of equations and eigenvalue problems. For a historical overview of these developments, see Some history of the conjugate gradient and Lanczos algorithms: 1948-1976 by Gene H. Golub and Dianne P. O'Leary (http://www.cs.umd.edu/~oleary/reprints/j28.pdf).
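By way of illustration, here is a bare-bones sketch of the Lanczos recurrence in the same Python/NumPy setting (a textbook formulation, without the reorthogonalization one needs in practice), which builds an orthonormal basis in which a symmetric matrix becomes tridiagonal:

```python
import numpy as np

def lanczos(A, v, k):
    """Basic three-term Lanczos recurrence for symmetric A.

    After k steps, V.T @ A @ V is (in exact arithmetic) the k-by-k
    tridiagonal matrix with diagonal alpha and off-diagonals beta.
    Assumes no breakdown (all beta > 0); in floating point the
    columns of V slowly lose orthogonality without reorthogonalization.
    """
    n = A.shape[0]
    V = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    return V, alpha, beta

# Eigenvalues of the small tridiagonal matrix (Ritz values) approximate
# the extreme eigenvalues of A after only a few steps.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = (M + M.T) / 2
V, alpha, beta = lanczos(A, rng.standard_normal(50), 10)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print(np.sort(np.linalg.eigvalsh(T))[-3:])  # top Ritz values
print(np.sort(np.linalg.eigvalsh(A))[-3:])  # top eigenvalues of A
```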
Some historically important references on the spectral theory of Jacobi operators can be found in Sum rules for Jacobi matrices and their applications to spectral theory by Rowan Killip and Barry Simon (http://annals.math.princeton.edu/wp-content/uploads/annals-v158-n1-p05.pdf).