Hi Anweshi,
Since Emerton answered your third grey-boxed question very nicely, let me have a go at the first two. Suppose $L(s,f)$ is one of the L-functions that you listed (including the first two, which we might as well call L-functions too). (For simplicity we always normalize so the functional equation is induced by $s\to 1-s$.) This guy has an expansion $L(s,f)=\sum_{n}a_f(n)n^{-s}$ as a Dirichlet series, and the most general prime number theorem reads
$\sum_{p\leq x}a_f(p)=r_f \mathrm{Li}(x)+O(x \exp(-(\log{x})^{\frac{1}{2}-\varepsilon}))$.
Here $\mathrm{Li}(x)$ is the logarithmic integral, $r_f$ is the order of the pole of $L(s,f)$ at the point $s=1$, and the implied constant depends on $f$ and $\varepsilon$.
Let's unwind this for your examples.
1) The Riemann zeta function has a simple pole at $s=1$ and $a_f(p)=1$ for all $p$, so this is the classical prime number theorem.
2) The Dedekind zeta function (say of a Galois extension $K/\mathbb{Q}$ of degree $d$) is a little different. It also has a simple pole at $s=1$, but for unramified $p$ the coefficients are determined by the rule: $a(p)=d$ if $p$ splits completely in $\mathcal{O}_K$, and $a(p)=0$ otherwise. Hence the prime number theorem in this case reads
$|\{p\leq x : p\;\mathrm{totally\;split\;in}\;\mathcal{O}_K\}|=d^{-1}\mathrm{Li}(x)+O(x \exp(-(\log{x})^{\frac{1}{2}-\varepsilon}))$.
This already has very interesting applications: the fact that the proportion of primes splitting totally is $1/d$ was very important in the first proofs of the main general results of class field theory.
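If you want to see examples 1) and 2) numerically, here is a minimal Python sketch. The field $K=\mathbb{Q}(\zeta_5)$ is my own choice of illustration (it is Galois of degree $d=4$, and a prime $p\neq 5$ splits completely exactly when $p\equiv 1 \pmod 5$). The helper names `primes_up_to` and `Li` and the cutoff $x=10^5$ are likewise assumptions, and `Li` here is only a crude midpoint-rule approximation of $\int_2^x dt/\log t$.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # cross out multiples of i, starting from i*i
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def Li(x, steps=200_000):
    """Midpoint-rule approximation of the integral of dt/log(t) from 2 to x."""
    h = (x - 2) / steps
    return h * sum(1 / math.log(2 + (k + 0.5) * h) for k in range(steps))

x = 100_000
primes = primes_up_to(x)
li = Li(x)

# Example 1: a(p) = 1 for every p, so the sum is just the prime count.
pi_x = len(primes)  # pi(10^5) = 9592

# Example 2 (illustrative choice): K = Q(zeta_5), degree d = 4;
# p splits completely in K exactly when p % 5 == 1.
d = 4
split = sum(1 for p in primes if p % 5 == 1)

print("pi(x) =", pi_x, "  Li(x) ~", round(li, 1))
print("totally split primes =", split, "  Li(x)/d ~", round(li / d, 1))
```

At this height the two columns should already agree to within about a percent, which is far better than the error term in the theorem guarantees.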
3) If $\rho:\mathcal{G}_{\mathbb{Q}}\to \mathrm{GL}_n(\mathbb{C})$ is an Artin representation then $a(p)=\mathrm{tr}\rho(\mathrm{Fr}_p)$. If $\rho$ does not contain the trivial representation, then $L(s,\rho)$ has no pole in a neighborhood of the half-plane $\mathrm{Re}(s)\geq 1$, so we get
$\sum_{p\leq x}\mathrm{tr}\rho(\mathrm{Fr}_p)=O(x \exp(-(\log{x})^{\frac{1}{2}-\varepsilon}))$.
The absence of a pole is not a problem: it just means there's no main term! In this particular case, you could interpret the above equation as saying that "$\mathrm{tr}\rho(\mathrm{Fr}_p)$ has mean value zero."
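Here is a small Python sketch of the "mean value zero" statement for one concrete Artin representation. The example is my own choice, not something from the question: take the splitting field of $x^3-2$, whose Galois group is $S_3$, and let $\rho$ be its 2-dimensional irreducible representation. Since the permutation representation on the three roots is trivial plus $\rho$, for $p\neq 2,3$ one has $\mathrm{tr}\rho(\mathrm{Fr}_p)=\#\{\text{roots of } x^3-2 \bmod p\}-1$. The names `trace_std` and `primes_up_to` and the cutoff 5000 are assumptions for illustration.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def trace_std(p):
    """tr rho(Fr_p) for the 2-dim representation of S_3 attached to x^3 - 2.

    Valid for unramified p (p != 2, 3): number of roots mod p, minus 1.
    """
    roots = sum(1 for x in range(p) if pow(x, 3, p) == 2 % p)
    return roots - 1

# Skip the ramified primes 2 and 3.
primes = [p for p in primes_up_to(5000) if p not in (2, 3)]
traces = [trace_std(p) for p in primes]
mean = sum(traces) / len(traces)

print("values taken:", sorted(set(traces)))
print("mean of tr rho(Fr_p) over p <= 5000:", round(mean, 4))
```

The traces take the values $2$, $0$ and $-1$, and the average should come out close to zero, as the displayed estimate predicts.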
4) For an elliptic curve, the same phenomenon occurs. Here again there is no pole, and $a(p)=\frac{p+1-|E(\mathbb{F}_p)|}{\sqrt{p}}$. By a theorem of Hasse these numbers satisfy $|a(p)|\leq 2$, so you could think of them as the (scaled) deviation of $|E(\mathbb{F}_p)|$ from its "expected value" of $p+1$. In this case the prime number theorem reads
$\sum_{p\leq x}a(p)=O(x \exp(-(\log{x})^{\frac{1}{2}-\varepsilon}))$,
so you could say that "the average deviation of $|E(\mathbb{F}_p)|$ from $p+1$ is zero."
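To make the elliptic curve case concrete, here is a hedged Python sketch. The curve $y^2=x^3+x+1$ is a hypothetical example of my own choosing (it has bad reduction only at $2$ and $31$), and the function names and the cutoff 2000 are assumptions. The point count uses the identity $|E(\mathbb{F}_p)|=p+1+\sum_{x \bmod p}\left(\frac{x^3+x+1}{p}\right)$, with the Legendre symbol computed by Euler's criterion.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

A, B = 1, 1  # illustrative curve y^2 = x^3 + A*x + B (an assumption)

def frobenius_trace(p):
    """Return p + 1 - |E(F_p)| for an odd prime p of good reduction."""
    total = 0
    for x in range(p):
        v = (x * x * x + A * x + B) % p
        if v:
            # Euler's criterion: v^((p-1)/2) is 1 for squares, p-1 otherwise
            total += 1 if pow(v, (p - 1) // 2, p) == 1 else -1
    return -total

# Skip the primes of bad reduction for this particular curve.
primes = [p for p in primes_up_to(2000) if p not in (2, 31)]
normalized = [frobenius_trace(p) / math.sqrt(p) for p in primes]
mean = sum(normalized) / len(normalized)

print("max |a(p)| =", round(max(abs(a) for a in normalized), 4))
print("mean a(p) over p <= 2000:", round(mean, 4))
```

Every normalized value lands in $[-2,2]$ as Hasse's theorem demands, and the mean should be small compared with the typical size of an individual $a(p)$.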
Now, how do you prove generalizations of the prime number theorem? There are two main steps in this, one of which is easily lifted from the case of the Riemann zeta function.
First, prove that the prime number theorem for $L(s,f)$ is a consequence of the nonvanishing of $L(s,f)$ in a region of the form $s=\sigma+it,\;\sigma \geq 1-\psi(t)$ with $\psi(t)$ positive and tending to zero as $t\to \infty$. So this is some region which is a very slight widening of $\mathrm{Re}(s)>1$. The proof of this step is essentially contour integration and goes exactly as in the case of the $\zeta$-function.
Second, actually produce a zero-free region of the type I just described. The key to this is the existence of an auxiliary L-function (or product thereof) which has positive coefficients in its Dirichlet series. In the case of the Riemann zeta function, Hadamard worked with the auxiliary function $A(s)=\zeta(s)^3\zeta(s+it)^2 \zeta(s-it)^2 \zeta(s+2it) \zeta(s-2it)$. Note the pole of order $3$ at $s=1$; on the other hand, if $\zeta(\sigma+it)$ vanished then $A(s)$ would vanish at $s=\sigma$ to order $4$. The inequality $3<4$ between the order of the pole and the order of the nearby zero leads, via some analysis, to the absence of any zero in the range $s=\sigma+it,\;\sigma \geq 1-\frac{c}{\log(|t|+3)}.$ In the general case the construction of the relevant auxiliary functions is more complicated. For the case of an Artin representation, for example, you can take $B(s)=\zeta(s)^3 L(s+it,\rho)^2 L(s-it,\widetilde{\rho})^2 L(s,\rho \otimes \widetilde {\rho})^2 L(s+2it,\rho \otimes \rho) L(s-2it,\widetilde{\rho} \otimes \widetilde{\rho})$. The general key is the Rankin-Selberg L-functions, or more complicated L-functions whose analytic properties can be controlled by known instances of Langlands functoriality.
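In case it helps, here is a sketch of where the positivity in Hadamard's $A(s)$ comes from. The angle $\theta$ is notation I am introducing here. For $\mathrm{Re}(s)>1$ the Euler product gives $\log\zeta(s)=\sum_{p}\sum_{k\geq 1}k^{-1}p^{-ks}$, so adding up the five factors of $A(s)$ with their exponents gives
$\log A(s)=\sum_{p}\sum_{k\geq 1}k^{-1}p^{-ks}\left(3+2p^{-ikt}+2p^{ikt}+p^{-2ikt}+p^{2ikt}\right)$.
Writing $\theta=kt\log p$, the bracket is
$3+4\cos\theta+2\cos 2\theta=1+4\cos\theta+4\cos^2\theta=(1+2\cos\theta)^2\geq 0$.
Hence $\log A(\sigma)\geq 0$, i.e. $A(\sigma)\geq 1$, for real $\sigma>1$. If $\zeta(1+it)$ vanished, the zero of order $4$ would beat the pole of order $3$ and force $A(\sigma)\to 0$ as $\sigma\to 1^+$, a contradiction. Pushing the same inequality slightly to the left of the line $\mathrm{Re}(s)=1$ is the "some analysis" referred to above.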
If you'd like to see everything I just said carried out elegantly and in crystalline detail, I can do no better than to recommend Chapter 5 of Iwaniec and Kowalski's book "Analytic Number Theory."
The Riemann zeta function $\zeta(s)$ at complex $s$ has the statistical physics interpretation of a partition function at complex temperature. This has no direct physical meaning in general, but for certain models it does. A notable example is the Ising model, where the real and imaginary temperature axes are related by a transformation from a hexagonal to a triangular lattice.
Quite generally, the zeroes of the partition function in the complex plane fall on lines rather than in areas. For ferromagnetic models this is the content of the Yang-Lee theorem. It is therefore natural to expect the Riemann hypothesis to hold, although the Yang-Lee theorem does not cover this case.
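As a toy illustration of zeros falling on a line, here is a Python sketch in the setting where the Yang-Lee theorem does apply: a ferromagnetic Ising ring, viewed as a polynomial in the complex fugacity $z=e^{-2\beta h}$ (complex field rather than complex temperature). The ring size $n=6$, the value $t=e^{-2\beta J}=0.5$ and all function names are assumptions chosen for illustration. Because the coefficients are palindromic, $Z(e^{i\varphi})e^{-in\varphi/2}$ is real, so counting its sign changes counts the zeros lying on the unit circle $|z|=1$.

```python
import math
from itertools import product

def ring_coefficients(n, t):
    """Coefficients c[m] of Z(z) = sum_m c[m] * z**m for an Ising ring.

    z marks down spins, t = exp(-2*beta*J) marks broken bonds
    (0 < t < 1 is the ferromagnetic case).
    """
    c = [0.0] * (n + 1)
    for spins in product((0, 1), repeat=n):
        broken = sum(spins[i] != spins[(i + 1) % n] for i in range(n))
        c[sum(spins)] += t ** broken
    return c

def on_circle(c, phi):
    """Z(e^{i phi}) * e^{-i n phi / 2}, which is real since c[m] == c[n-m]."""
    n = len(c) - 1
    return sum(cm * math.cos((m - n / 2) * phi) for m, cm in enumerate(c))

def count_unit_circle_zeros(c, grid=3600):
    """Count sign changes of on_circle over a grid of angles in (0, 2*pi)."""
    values = [on_circle(c, 2 * math.pi * (j + 0.5) / grid) for j in range(grid)]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

n, t = 6, 0.5
coeffs = ring_coefficients(n, t)
zeros_found = count_unit_circle_zeros(coeffs)
print("degree:", n, " zeros found on |z| = 1:", zeros_found)
```

A polynomial of degree $n$ has only $n$ zeros, so finding $n$ sign changes on the circle means none are left over elsewhere in the plane.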
An overview of the older literature on complex temperature partition functions is:
"Location of zeros in the complex temperature plane: Absence of Lee-Yang theorem", W. van Saarloos and D. A. Kurtze, J. Phys. A: Math. Gen. 17 (1984) 1301-1311.
A more recent paper is
"Complex-temperature partition function zeros of the Potts model on the honeycomb and kagomé lattices", H. Feldmann, R. Shrock, and S.-H. Tsai, Phys. Rev. E 57, 1335 (1998).
There are many more papers; it is quite an active field of study.
A very recent paper is http://arxiv.org/pdf/1110.0942
Best Answer
Since, I believe, Jonas Meyer provided an answer to Q1, let me just address the other questions. The concept of universality is much older. It was in fact introduced by Birkhoff in 1929, in the case of entire functions, in his paper "Démonstration d'un théorème élémentaire sur les fonctions entières" (which is why universal functions are sometimes called Birkhoff functions), and by Heins in 1955, in the case of bounded holomorphic functions in the unit disk.
A possible reference is "Universal functions in several complex variables" by P.S. Chee.