There are basically 3 senses of "Hecke algebra", and they are related to each other. The modular-form sense is a special case of all three.
The oldest version is that motivated by modular forms, if we think of modular forms as functions on (homothety classes of) lattices: the operator $T_p$ takes the average of a $\mathbb C$-valued function over lattices of index $p$ inside a given lattice. Viewing a point $z$ in the upper half-plane as giving the lattice $\mathbb Z z + \mathbb Z$ makes the connection to modular forms of a complex variable.
One important generalization of this idea is through representation theory, realizing that when modular forms are recast as functions on adele groups, the $p$-adic group $GL_2(\mathbb Q_p)$ acts on modular forms $f$. To say that $p$ does not divide the level becomes the assertion that $f$ is invariant under the (maximal) compact subgroup $GL_2(\mathbb Z_p)$ of $GL_2(\mathbb Q_p)$. Some "conversion" computations show that $T_p$ and its powers become integral operators (often mis-named "convolution operators"... despite several technical reasons not to call them this) of the form $f(g) \rightarrow \int_{GL_2(\mathbb Q_p)} \eta(h)\,f(gh)\,dh$, where $\eta$ is a left-and-right $GL_2(\mathbb Z_p)$-invariant compactly-supported function on $GL_2(\mathbb Q_p)$. The convolution algebra (yes!) of such functions $\eta$ is the (spherical) Hecke algebra on $GL_2(\mathbb Q_p)$.
A slightly larger, non-commutative convolution algebra of functions on $GL_2(\mathbb Q_p)$ consists of those left-and-right invariant by the Iwahori subgroup of matrices $\pmatrix{a & b \cr pc & d}$ in $GL_2(\mathbb Z_p)$, that is, where the lower left entry is divisible by $p$. This algebra of operators still has clear structure, with structure constants depending on the residue field cardinality, here just $p$. (The Iwahori subgroup corresponds to "level" divisible by $p$, but not by $p^2$.) This is the Hecke algebra attached to the affine Coxeter group $\hat{A}_1$.
Replacing $p$ by $q$, and letting it be a "variable" or "indeterminate" gives an example of another generalization of "Hecke algebra".
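To make the "indeterminate $q$" version concrete, here is a sketch of the usual presentation. The normalization is an assumption on my part (conventions vary), not something fixed by the discussion above. For a Coxeter system $(W,S)$ with length function $\ell$, the generic Hecke algebra has a basis $\{T_w : w \in W\}$ over $\mathbb Z[q]$, with multiplication determined by

$$T_s T_w = T_{sw} \quad \text{if } \ell(sw) > \ell(w), \qquad (T_s - q)(T_s + 1) = 0 \quad \text{for } s \in S.$$

For $\hat{A}_1$ (the infinite dihedral group) there are two generators $s_0, s_1$ and no braid relation between them, so the two quadratic relations are the only ones. Specializing $q$ to the residue field cardinality $p$ should recover the Iwahori-level algebra above, up to the usual adjustment for the center of $GL_2$ (the extended affine Weyl group rather than the Coxeter group itself).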
The latter situation also connects to "quantum" stuff, but I'm not competent to discuss that.
Edit: by now, there are several references for the relation between "classical Hecke operators" (on modular forms) and the group-theoretic, or representation-theoretic, version. Gelbart's 1974 book may have been the first generally-accessible source, though Gelfand-Piatetski-Shapiro's 1964 book on automorphic forms certainly contains some form of this. Since that time, Dan Bump's book on automorphic forms certainly contains a discussion of the two notions, and of the transition between the two. My old book on Hilbert modular forms contains such a comparison, also, but the book is out of print and was created in a time prior to reasonable electronic files, unfortunately.
This is a layman's answer. Hecke operators commute and are self-adjoint, hence the modular forms which are eigenvectors with respect to all Hecke operators form a basis of the space of all modular forms (and the same for cusp forms). If $f$ is such an eigenvector then the L-series corresponding to $f$ has multiplicative coefficients, i.e. it can be represented by an Euler product (over all primes). So, as far as I know, Hecke operators are important for connections of modular forms with number theory.
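The multiplicativity can be seen numerically in the simplest case. The sketch below is my own illustration, not part of the answer above: it takes the weight 12 cusp form $\Delta = q\prod_{n\ge 1}(1-q^n)^{24}$ of level $1$, which is a Hecke eigenform because the space of such cusp forms is one-dimensional, and checks that its coefficients $\tau(n)$ are multiplicative. The function name `delta_coefficients` is hypothetical.

```python
from math import gcd


def delta_coefficients(n_max):
    """Return [tau(0), ..., tau(n_max)] for Delta = q * prod_{n>=1} (1 - q^n)^24."""
    # series[k] is the coefficient of q^k in prod (1 - q^n)^24, truncated below degree n_max.
    series = [0] * n_max
    series[0] = 1
    for n in range(1, n_max):
        for _ in range(24):
            # Multiply in place by (1 - q^n); descending k so old values are used.
            for k in range(n_max - 1, n - 1, -1):
                series[k] -= series[k - n]
    # The leading factor q shifts every index up by one.
    return [0] + series


tau = delta_coefficients(30)

# tau(mn) = tau(m) * tau(n) whenever gcd(m, n) = 1.
for m in range(1, 31):
    for n in range(1, 31):
        if m * n <= 30 and gcd(m, n) == 1:
            assert tau[m * n] == tau[m] * tau[n]

# At prime powers the Euler factor gives tau(p^2) = tau(p)^2 - p^11.
for p in (2, 3, 5):
    assert tau[p * p] == tau[p] ** 2 - p ** 11

print(tau[1:8])
```

The second check is the degree-two shape of the Euler factor $(1 - \tau(p)p^{-s} + p^{11-2s})^{-1}$ at each prime.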
Best Answer
It might help to go back to the definition of Hecke operators in level $1$ in Serre's Course in arithmetic. For a prime $p$ and a lattice $\Lambda$, the $p$th Hecke correspondence (I forget if Serre uses exactly this terminology) takes $\Lambda$ to $\sum \Lambda'$, where $\Lambda'$ runs over all index $p$ sublattices of $\Lambda$.
This is a multi-valued function from lattices to lattices (it is $1$-to-$(p+1)$-valued).
Now lattices (mod scaling) are just elliptic curves: $\Lambda \mapsto \mathbb C/\Lambda$. And so we can also think of this as a multi-valued map from the moduli space of elliptic curves (i.e. the $j$-line, or $Y_0(1)$ if you like) to itself.
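The count $p+1$ can be checked by brute force. The sketch below is my own illustration (the helper names are hypothetical): it takes $\Lambda = \mathbb Z^2$ and represents each sublattice by the Hermite normal form of a basis matrix whose rows generate it. It also applies the correspondence twice and tallies multiplicities, which matches the familiar relation $T(p)^2 = T(p^2) + pR_p$ in Serre's notation: every index $p^2$ sublattice appears once, except $p\Lambda$, which appears $p+1$ times.

```python
from collections import Counter


def hnf(m):
    """Row Hermite normal form ((a, b), (0, d)) of a 2x2 integer matrix with nonzero determinant."""
    (a, b), (c, d) = m
    # Euclid's algorithm on the first column, carried out by unimodular row operations.
    while c != 0:
        q = a // c
        a, b, c, d = c, d, a - q * c, b - q * d
    if a < 0:
        a, b = -a, -b
    if d < 0:
        d = -d
    return ((a, b % d), (0, d))


def index_p_sublattices(p):
    """All index-p sublattices of Z^2, each given by its Hermite normal form."""
    return [((p, 0), (0, 1))] + [((1, b), (0, p)) for b in range(p)]


def matmul(n, m):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(n[i][k] * m[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply_tp_twice(p):
    """Multiset of sublattices obtained by taking index-p sublattices twice."""
    subs = index_p_sublattices(p)
    # If the rows of m span a sublattice L', the rows of n*m span an index-p sublattice of L'.
    return Counter(hnf(matmul(n, m)) for m in subs for n in subs)


for p in (2, 3, 5, 7):
    assert len(index_p_sublattices(p)) == p + 1
    counts = apply_tp_twice(p)
    assert len(counts) == p * p + p + 1          # all sublattices of index p^2
    assert counts[((p, 0), (0, p))] == p + 1     # p * Z^2 is reached p + 1 times
    assert all(v == 1 for k, v in counts.items() if k != ((p, 0), (0, p)))
```

In terms of $\tau$ in the upper half-plane, the $p+1$ sublattices correspond, up to scaling, to the points $p\tau$ and $(\tau+b)/p$ for $0 \le b < p$.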
How to describe a multi-valued map more geometrically? Think about its graph inside $Y_0(1) \times Y_0(1)$. The graph of a function has the property that its projection onto the first factor is an isomorphism. The graph of a $(p+1)$-valued function has the property that its projection onto the first factor is of degree $p+1$.
This graph has an explicit description: it is just $Y_0(p)$ (the modular curve of level $\Gamma_0(p)$). Remember that $Y_0(p)$ parameterizes pairs $(E,E')$ of $p$-isogenous curves. We embed it into $Y_0(1) \times Y_0(1)$ in the obvious way, by mapping the pair $(E,E')$ (thought of as an element of $Y_0(p)$) to $(E,E')$ (thought of as an element of the product).
In terms of the upper half-plane variable $\tau$, one can think of this map as sending $\tau \bmod \Gamma_0(p)$ to $\bigl(\tau \bmod SL_2(\mathbb Z), p\tau \bmod SL_2(\mathbb Z) \bigr).$
So we have recast Serre's description of the $p$th Hecke operator in terms of a correspondence on lattices in the geometric language of correspondences on curves: i.e. the $p$th Hecke operator is given by a multi-valued morphism from $Y_0(1)$ to itself, rigorously encoded by its graph thought of as a curve in the product surface $Y_0(1) \times Y_0(1)$, which is in fact isomorphic to $Y_0(p)$.
We can easily compactify the situation, to get $X_0(p)$ embedding as the graph of a correspondence on $X_0(1) \times X_0(1)$.
[Caveat: Actually the map $Y_0(p) \to Y_0(1) \times Y_0(1)$ need not be an embedding; it is a birational map onto its image, but the image can be singular (and the same applies with $X$'s instead of $Y$'s). This is because the point on $Y_0(p)$ is not just the pair $(E,E')$, but the additional data of the $p$-isogeny $E\to E'$, which is not uniquely determined up to isomorphism in some exceptional cases. But this is a technical point which is not worth fussing about at the beginning.]
The advantage of having a geometric correspondence in sight is that whenever we apply any kind of linearization functor to our curve, the correspondence will turn into a genuine single valued operator.
The point is that if we have a multi-valued function from one abelian group to another, we can just add up the values to get a single-valued function.
So the correspondence $T_p$ induces genuine maps from the Jacobian of $X_0(1)$ to itself, or from the cohomology of $X_0(1)$ to itself, or from the space of holomorphic differentials on $X_0(1)$ to itself.
Now actually in the case of $X_0(1)$, which has genus zero, the Jacobian and the space of holomorphic differentials are trivial. But we can do everything with $X_0(N)$ or $X_1(N)$ in place of $X_0(1)$ for any $N$, and all the same remarks apply.
Remembering that the holomorphic differentials on $X_0(N)$ are the weight two cuspforms of level $N$, one can compute that the $p$th Hecke correspondence gives rise to the usual $p$th Hecke operator on cuspforms in this way.
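Here is a small numerical sketch of the "usual $p$th Hecke operator" in the first interesting case, $N = 11$. The assumptions are mine, not taken from the text above: that the weight two cuspforms of level $11$ are spanned by the eta product $f = q\prod_{n\ge1}(1-q^n)^2(1-q^{11n})^2$, and that for $p \nmid N$ the operator acts on $q$-expansions by $a_n(T_p f) = a_{pn} + p\,a_{n/p}$ (with $a_{n/p} = 0$ if $p \nmid n$). Since the space is one-dimensional, $f$ must be an eigenform with eigenvalue $a_p$, and the code checks this on the first few coefficients. The function names are hypothetical.

```python
def eta_product_level_11(n_max):
    """Coefficients a[0..n_max] of f = q * prod_{n>=1} (1 - q^n)^2 * (1 - q^(11n))^2."""
    series = [0] * n_max
    series[0] = 1
    for n in range(1, n_max):
        for step in (n, 11 * n):
            if step >= n_max:
                continue  # this factor does not affect the truncated series
            for _ in range(2):
                # Multiply in place by (1 - q^step), descending so old values are used.
                for k in range(n_max - 1, step - 1, -1):
                    series[k] -= series[k - step]
    return [0] + series  # the leading factor q shifts indices by one


def hecke_weight_2(a, p, n_out):
    """Coefficients 0..n_out of T_p f in weight 2, for p not dividing the level."""
    return [
        a[p * n] + (p * a[n // p] if n % p == 0 else 0)
        for n in range(n_out + 1)
    ]


a = eta_product_level_11(80)
print(a[1:8])

# T_p f should equal a_p * f, coefficient by coefficient.
for p in (2, 3, 5, 7):
    assert hecke_weight_2(a, p, 11) == [a[p] * a[n] for n in range(12)]
```

The truncation at $80$ is chosen only so that $a_{pn}$ is available for $p \le 7$ and $n \le 11$.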
What's the point of considering the correspondence? There are many; here's one:
if we reduce everything mod $p$ (with $p \nmid N$), we get a mod $p$ correspondence on the mod $p$ reduction of $X_0(N)$, whose graph is the mod $p$ reduction of $X_0(Np)$. But this latter reduction is well-known to be singular, and in fact reducible; it is the union of two copies of $X_0(N)$. Thus the $p$th Hecke correspondence mod $p$ decomposes as the sum of two simpler correspondences, which one checks to be the Frobenius morphism from $X_0(N)$ mod $p$ to itself, and its dual.
This is the Eichler--Shimura congruence relation (in some form it actually goes back to Kronecker), and it underlies the relationship between $T_p$-eigenvalues and the trace of Frobenius in the $2$-dimensional Galois representations attached to Hecke eigenforms.
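One concrete consequence can be tested by counting points. The sketch below rests on two standard facts that are not stated above, so treat them as assumptions: $X_0(11)$ has genus one with Weierstrass model $y^2 + y = x^3 - x^2 - 10x - 20$, and its weight two cuspform is $f = q\prod_{n\ge1}(1-q^n)^2(1-q^{11n})^2$. The relation between $T_p$-eigenvalues and Frobenius then predicts $a_p = p + 1 - \#X_0(11)(\mathbb F_p)$ for $p \ne 11$. The function names are hypothetical.

```python
def eta_product_level_11(n_max):
    """Coefficients a[0..n_max] of f = q * prod_{n>=1} (1 - q^n)^2 * (1 - q^(11n))^2."""
    series = [0] * n_max
    series[0] = 1
    for n in range(1, n_max):
        for step in (n, 11 * n):
            if step >= n_max:
                continue
            for _ in range(2):
                # Multiply in place by (1 - q^step), descending so old values are used.
                for k in range(n_max - 1, step - 1, -1):
                    series[k] -= series[k - step]
    return [0] + series


def count_points_x0_11(p):
    """Points on y^2 + y = x^3 - x^2 - 10x - 20 over F_p, including the point at infinity."""
    count = 1  # the point at infinity
    for x in range(p):
        rhs = (x ** 3 - x ** 2 - 10 * x - 20) % p
        for y in range(p):
            if (y * y + y - rhs) % p == 0:
                count += 1
    return count


a = eta_product_level_11(40)

# Trace of Frobenius from point counts versus the Hecke eigenvalue a_p.
for p in (2, 3, 5, 7, 13, 17, 19, 23, 29, 31, 37):
    assert p + 1 - count_points_x0_11(p) == a[p]
```

The brute-force count is quadratic in $p$, which is fine for small primes and is meant only as a sanity check.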
Some MO posts which are vaguely relevant:
The map on differentials induced by a correspondence
The Eichler--Shimura relation