Caveat: in order to give you an overview, I've been vague/sloppy in several places.
Well, the basic link to representation theory is that modular forms (and automorphic forms) can be viewed as functions in representation spaces of reductive groups. What I mean is the following: take for example a modular form, i.e. a function $f$ on the upper-half plane satisfying certain conditions. Since the upper-half plane is a quotient of $G=\mathrm{GL}(2,\mathbf{R})$, you can pull $f$ back to a function on $G$ (technically you massage it a bit, but this is the main idea) which will be invariant under a discrete subgroup $\Gamma$. Functions that look like this are called automorphic forms on $G$. The space of all automorphic forms on $G$ is a representation of $G$ (via the right regular representation, i.e. $(gf)(x)=f(xg)$). Basically, any irreducible subrepresentation of the space of automorphic forms is what is called an automorphic representation of $G$. So, modular forms can be viewed as certain vectors in certain (generally infinite-dimensional) representations of $G$. In this context, one can define the Hecke algebra of $G$ as the complex-valued $C^\infty$ functions on $G$ with compact support, viewed as a ring under convolution. This is a substitute for the group ring that occurs in the representation theory of finite groups, i.e. the (possibly infinite-dimensional) group representations of $G$ should correspond to the (possibly infinite-dimensional) algebra representations of its Hecke algebra. This type of stuff is the basic connection of modular forms to representation theory and it goes back at least to Gelfand–Graev–Piatetski-Shapiro's Representation theory and automorphic functions. You can replace $G$ with a general reductive group.
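To make the "pull back and massage" step a bit more concrete, here is a sketch in one common normalization (an assumption on my part; conventions differ between references). Take $f$ of weight $k$ for a discrete subgroup $\Gamma\subset\mathrm{SL}(2,\mathbf{Z})$ and restrict to matrices $g=\begin{pmatrix}a&b\\c&d\end{pmatrix}$ with $\det g>0$, which act on the upper-half plane by $g\cdot z=\frac{az+b}{cz+d}$:

$$\phi_f(g) \;=\; (\det g)^{k/2}\,(ci+d)^{-k}\, f\!\left(\frac{ai+b}{ci+d}\right).$$

With this choice the modularity of $f$ becomes plain left invariance, and the weight shows up in how the rotations $r_\theta=\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}$ (the stabilizer of $i$ in $\mathrm{SL}(2,\mathbf{R})$) act on the right:

$$\phi_f(\gamma g)=\phi_f(g)\quad(\gamma\in\Gamma),\qquad \phi_f(g\,r_\theta)=e^{-ik\theta}\,\phi_f(g).$$

Both identities follow from the cocycle relation $j(gh,z)=j(g,h\cdot z)\,j(h,z)$ for $j(g,z)=cz+d$, together with $r_\theta\cdot i=i$ and $j(r_\theta,i)=e^{i\theta}$.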
To get to more advanced stuff, you need to start viewing modular forms not just as functions on $\mathrm{GL}(2,\mathbf{R})$ but rather on $\mathrm{GL}(2,\mathbf{A})$, where $\mathbf{A}$ are the adeles of $\mathbf{Q}$. This is a "restricted direct product" of $\mathrm{GL}(2,\mathbf{R})$ and $\mathrm{GL}(2,\mathbf{Q}_p)$ for all primes $p$. Again you can define a Hecke algebra. It will break up into a "restricted tensor product" of the local Hecke algebras as $H=\otimes_v^\prime H_v$ where $v$ runs over all primes $p$ and $\infty$ ($\infty$ is the infinite prime and corresponds to $\mathbf{R}$). For a prime $p$, $H_p$ is the space of locally constant, compactly supported complex-valued functions on the double-coset space $K\backslash\mathrm{GL}(2,\mathbf{Q}_p)/K$ where $K$ is the maximal compact subgroup $\mathrm{GL}(2,\mathbf{Z}_p)$. If you take something like the characteristic function of the double coset $KA_pK$, where $A_p$ is the matrix with $p$ and $1$ down the diagonal, and look at how it acts on a modular form, you'll see that this is the Hecke operator $T_p$.
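For readers who want to see $T_p$ do something, here is a minimal sketch on the classical side. It assumes the standard formula for the action of $T_p$ on $q$-expansions of level-one forms of weight $k$, namely $a_n(T_pf)=a_{pn}+p^{k-1}a_{n/p}$ (second term only when $p\mid n$), which matches the double-coset operator only up to a normalizing power of $p$. The function names `delta_coefficients` and `hecke_tp` are made up for this illustration. The check is that the discriminant form $\Delta=q\prod_{n\ge1}(1-q^n)^{24}$ is an eigenvector of $T_2$ with eigenvalue $\tau(2)$.

```python
def delta_coefficients(n_max):
    """Return [0, tau(1), ..., tau(n_max)], the q-expansion of
    Delta = q * prod_{n>=1} (1 - q^n)^24, for n_max >= 1."""
    size = n_max                      # degrees 0 .. n_max-1 of the product
    series = [0] * size
    series[0] = 1
    for n in range(1, size):
        for _ in range(24):
            # Multiply in place by (1 - q^n): new[d] = old[d] - old[d-n].
            # Going downward in d keeps series[d-n] at its old value.
            for d in range(size - 1, n - 1, -1):
                series[d] -= series[d - n]
    return [0] + series               # the leading factor q shifts degrees by 1


def hecke_tp(coeffs, p, k):
    """Apply T_p in weight k to a q-expansion given as a coefficient list.

    Uses a_n(T_p f) = a_{pn} + p^(k-1) * a_{n/p}, where the second term
    is present only when p divides n. The output is shorter than the
    input because a_{pn} must be known.
    """
    n_max = (len(coeffs) - 1) // p
    out = []
    for n in range(n_max + 1):
        value = coeffs[p * n]
        if n % p == 0:
            value += p ** (k - 1) * coeffs[n // p]
        out.append(value)
    return out


tau = delta_coefficients(30)
t2_delta = hecke_tp(tau, 2, 12)
# Delta should be a T_2-eigenform with eigenvalue tau(2).
is_eigen = all(t2_delta[n] == tau[2] * tau[n] for n in range(len(t2_delta)))
```

Here `is_eigen` compares $T_2\Delta$ with $\tau(2)\Delta$ coefficient by coefficient, as far as the truncation allows. The same two functions can be used to test other primes by computing more coefficients.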
Then there's the connection with number theory. This is mostly encompassed under the phrase "Langlands program" and is a significantly more complicated beast than the above stuff. At least part of this started with Langlands' classification of the admissible representations of real reductive groups. He noticed that he could phrase the parametrization of the admissible representations, say of $\mathrm{GL}(n,\mathbf{R})$, in a way that made sense for $\mathrm{GL}(n,\mathbf{Q}_p)$. This sets up a (conjectural, though known now for $\mathrm{GL}(n)$) correspondence between admissible representations of $\mathrm{GL}(n,\mathbf{Q}_p)$ and certain $n$-dimensional representations of a group that's related to the absolute Galois group of $\mathbf{Q}_p$ (the Weil–Deligne group). This is called the Local Langlands Correspondence. The Global Langlands Correspondence is that a similar kind of relation holds between automorphic representations of $\mathrm{GL}(n,\mathbf{A})$ and $n$-dimensional representations of some group related to the Galois group (the conjectural Langlands group). These correspondences should be nice in that things that happen on one side should correspond to things happening on the other. This fits into another part of the Langlands program which is the functoriality conjectures (really the correspondences are special cases). Basically, if you have two reductive groups $G$ and $H$ and a certain type of map from one to the other, then you should be able to transfer automorphic representations from one to the other. From this viewpoint, the algebraic geometry side of the picture enters simply as the source for proving instances of the Langlands conjectures. Pretty much the only way to take an automorphic representation and prove that it has an associated Galois representation is to construct a geometric object whose cohomology has both an action of the Hecke algebra and the Galois group, decompose it into pieces, and pick out the one you want.
As for suggestions on what to read, I found Gelbart's book Automorphic forms on adele groups pretty readable. This will get you through some of what I've written in the first two paragraphs for the group $\mathrm{GL}(2)$. The most comprehensive reference is the Corvallis proceedings, available freely at ams.org. To get into the Langlands program there's the book An Introduction to the Langlands Program (google books) you could look at. It's really a vast subject and I didn't learn from any one or few sources. But hopefully what I've written has helped you out a bit. I think I need to go to bed now. G'night.
I'm not sure I understand the first paragraph, but regarding the second I think this goes by the name of "Gelfand's trick", used to prove e.g. commutativity of the spherical Hecke algebra attached to $G=GL_n(Q_p)$ and $B=GL_n(Z_p)$ --- namely we first discover that we can write down double coset representatives that are invariant under transposition (in this case diagonal matrices with powers of $p$ on the diagonal). Then we recall that transposition is an anti-automorphism of the group algebra. Thus the Hecke algebra is a subalgebra on which an anti-automorphism acts as the identity, hence it is in fact commutative. The same trick will work in your general situation.
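Spelled out, the argument is short. The notation is mine: write $K=GL_n(Z_p)$ for the compact subgroup called $B$ above, $\sigma$ for transposition, and $f^\sigma(g)=f(g^\sigma)$ for a function $f$ in the Hecke algebra of compactly supported bi-$K$-invariant functions, with convolution $(f_1*f_2)(g)=\int_G f_1(h)\,f_2(h^{-1}g)\,dh$. Because $\sigma$ reverses products and preserves the Haar measure of the unimodular group $G$, a change of variables gives

$$(f_1*f_2)^\sigma \;=\; f_2^\sigma * f_1^\sigma .$$

Every double coset has a diagonal representative, so $\sigma(KgK)=KgK$ and hence $f^\sigma=f$ for every $f$ in the Hecke algebra. Combining the two facts,

$$f_1*f_2 \;=\; (f_1*f_2)^\sigma \;=\; f_2^\sigma * f_1^\sigma \;=\; f_2*f_1 .$$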
Best Answer
Well the first thing to say is to look at the very enthusiastic and world-encompassing papers of Cherednik himself on DAHA as the center of the mathematical world (say his 1998 ICM). I'll mention a couple of more geometric aspects, but this is a huuuge area..
There are at least three distinct geometric appearances of DAHA, which you could classify by the number of loops (as in loop groups) that appear - two, one or zero. (BTW for those in the know I will mostly intentionally ignore the difference between DAHA and its spherical subalgebra.)
Double loop picture: See e.g. Kapranov's paper arXiv:math/9812021 (notes for lectures of his on it available on my webpage) and the related arXiv:math/0012155. The intuitive idea, very hard to make precise, is that DAHA is the double loop (or 2d local field, such as F_q((s,t)) ) analog of the finite (F_q) and affine (F_q((s)) ) Hecke algebras. In other words it appears as functions on double cosets for the double loop group and its "Borel" subgroup. (Of course you need to decide what "functions" or rather "measures" means and what "Borel" means..) This means in particular it controls principal series type reps of double loop groups, or the geometry of moduli of G-bundles on a surface, looked at near a "flag" (meaning a point inside a curve inside the surface). The rep theory over 2d local fields that you would need to have for this to make sense is studied in a series of papers of Kazhdan with Gaitsgory (arXiv:math/0302174, 0406282, 0409543), with Braverman (0510538) and most recently with Hrushovski (0510133 and 0609115). The latter is totally awesome IMHO, using ideas from logic to define definitively what measure theory on such local fields means.
Single loop picture: Affine Hecke algebras have two presentations, the "standard" one (having to do with abstract Kac-Moody groups) and the Bernstein one (having to do specifically with loop groups). These two appear on the two sides of Langlands duality (cf eg the intro to the book of Chriss and Ginzburg). Likewise there's a picture of DAHA that's dual to the above "standard" one. This is developed first in Garland-Grojnowski (arXiv:q-alg/9508019) and more thoroughly by Vasserot arXiv:math/0207127 and several papers of Varagnolo-Vasserot. The idea here is that DAHA appears as the K-group of coherent sheaves on G(O)\G(K)/G(O) - the loop group version of the Bruhat cells in the finite flag manifold (again ignoring Borels vs parabolics). Again this is hard to make very precise. This gives in particular a geometric picture for the reps of DAHA, analogous to that for AHA due to Kazhdan-Lusztig (see again Chriss-Ginzburg).
[EDIT: A new survey on this topic by Varagnolo-Vasserot has just appeared.]
Here is where geometric Langlands comes in: the above interpretation means that DAHA is the Hecke algebra that acts on (K-groups of) coherent sheaves on T^* Bun_G X for any Riemann surface X -- it's the coherent analog of the usual Hecke operators in geometric Langlands. Thus if you categorify DAHA (look at CATEGORIES of coherent sheaves) you get the Hecke functors for the so-called "classical limit of Langlands" (cotangent to Bun_G is the classical limit of diffops on Bun_G).
The Cherednik Fourier transform gives an identification between DAHA for G and for the dual group G'. In this picture it is an isomorphism between K-groups of coherent sheaves on Grassmannians for Langlands dual groups (the categorified version of this is conjectured in Bezrukavnikov-Finkelberg-Mirkovic arXiv:math/0306413). This is a natural part of the classical limit of Langlands: you're supposed to have an equivalence between coherent sheaves on cotangents of Langlands dual Bun_G's, and this is its local form, identifying the Hecke operators on the two sides!
In this picture DAHA appears recently in physics (since geometric Langlands in all its variants does), in the work of Kapustin (arXiv:hep-th/0612119 and with Saulina 0710.2097) as "Wilson-'t Hooft operators" --- the idea is that in SUSY gauge theory there's a full DAHA of operators (with the above names). Passing to the TFT which gives Langlands kills half of them - a different half on the two sides of Langlands duality, hence the asymmetry.. but in the classical version all the operators survive, and the SL2Z of electric-magnetic/Montonen-Olive S-duality is exactly the Cherednik SL2Z you mention..
Finally (since this is getting awfully long), the no-loop picture: this is the one you referred to in 2. via Dunkl type operators. Namely DAHA appears as difference operators on H/W (and its various degenerations, the Cherednik algebras, appear by replacing H by h and difference by differential). In this guise (and I'm not giving a million refs to papers of Etingof and many others since you know them better) DAHA is the symmetries of quantum many-body systems (Calogero-Moser and Ruijsenaars-Schneider systems to be exact), and this is where Macdonald polynomials naturally appear as the quantum integrals of motion. The only thing I'll say here is point to some awesome recent work of Schiffmann and Vasserot arXiv:0905.2555, where this picture too is tied to geometric Langlands.. very very roughly the idea is that H/W is itself (a degenerate version of an open piece of) a moduli of G-bundles, in the case of an elliptic curve. Thus studying DAHA is essentially studying D-modules or difference modules on Bun_G in genus one (see Nevins' paper arXiv:0804.4170 where such ideas are developed further). Schiffmann-Vasserot show how to interpret Macdonald polynomials in terms of geometric Eisenstein series in genus one.. enough for now.
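To see the "replace difference by differential" degeneration in the smallest possible case, here is a rank-one sketch. The notation and normalization are my own assumptions and are not taken from any of the papers cited: $\mathfrak{h}=\mathbf{C}$ with coordinate $x$, $W=\{1,s\}$ with $(sf)(x)=f(-x)$, and a parameter $c$. The rational Dunkl operator is

$$D \;=\; \frac{d}{dx}+\frac{c}{x}\,(1-s),\qquad (Df)(x)=f'(x)+c\,\frac{f(x)-f(-x)}{x}.$$

If $f$ is even then $Df=f'$, which is odd, so applying $D$ again gives

$$D^2f \;=\; f''+\frac{2c}{x}\,f' \qquad (f\ \text{even}),$$

and, formally conjugating by $x^{c}$,

$$x^{c}\left(\frac{d^2}{dx^2}+\frac{2c}{x}\frac{d}{dx}\right)x^{-c} \;=\; \frac{d^2}{dx^2}-\frac{c(c-1)}{x^2}.$$

So on $W$-invariant functions $D^2$ is, up to this conjugation, the rank-one rational Calogero-Moser Hamiltonian, which is the simplest instance of the many-body systems mentioned above.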