I think it's possible that non-geometric extensions are indeed not as directly visualizable as geometric ones.
Some terminology: let $k$ be a field, and either assume $k$ has characteristic $0$ or beware that some separability issues are being omitted in what follows. A (one variable) function field over $k$ is a finitely generated field extension $K/k$ of transcendence degree one. This already allows for the possibility of a nontrivial constant extension, which is often excluded in geometric endeavors: for instance, according to this definition, $\mathbb{C}(t)$ is a function field over $\mathbb{R}$, but a sort of weird[1] one: e.g. it has no $\mathbb{R}$-points.
One says a function field $K/k$ is regular if $k$ is algebraically closed in $K$; i.e., any element of $K$ which is algebraic over $k$ already lies in $k$ [plus separability stuff in positive characteristic]. Any function field can be made regular just by enlarging the constant field to be the algebraic closure of $k$ in $K$; e.g., the previous example is a regular function field over $\mathbb{C}$.
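To make the two notions concrete, here is a small worked contrast (the second field is my own illustrative choice, not something taken from the discussion above):

$$
K_1 = \mathbb{C}(t) = \mathbb{R}(t)(i),\ \ i^2 = -1, \qquad\qquad K_2 = \operatorname{Frac}\bigl(\mathbb{R}[x,y]/(x^2+y^2+1)\bigr).
$$

In $K_1$ the element $i$ is algebraic over $\mathbb{R}$ but does not lie in $\mathbb{R}$, so $K_1/\mathbb{R}$ is not regular; its field of constants is $\mathbb{C}$. In $K_2$ the polynomial $x^2+y^2+1$ stays irreducible over $\mathbb{C}$, so $\mathbb{R}$ is algebraically closed in $K_2$ and $K_2/\mathbb{R}$ is regular, even though the corresponding conic has no $\mathbb{R}$-points either.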
Regularity is what one needs to think about function fields as geometric objects: namely, there is a bijective correspondence between regular function fields $K/k$ and complete, nonsingular algebraic curves $X_{/k}$.
Now, on to covers. Let $L/K$ be a finite degree extension of function fields over $k$. One says (often; this is slightly less standard terminology) that the extension $L/K$ is geometric over $k$ if both $L$ and $K$ are regular function fields. And again, there is a bijective correspondence between geometric extensions of function fields and finite $k$-rational morphisms of algebraic curves $Y \rightarrow X$.
Assuming that the bottom function field $K$ is regular, every extension $L/K$ may be decomposed into a tower of a constant extension $lK/K$ followed by a geometric extension $L/lK$. Constant extensions have a role to play in the theory (see for instance the chapter on constant extensions in Rosen's Number Theory in Function Fields), but I think it is fair to describe their role as algebraic rather than geometric: at least that's the standard view.
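As a sanity check on this decomposition, here is a small example of my own choosing: take $K=\mathbb{Q}(t)$ and let $L$ be obtained by adjoining both $\sqrt{2}$ and a square root $u$ of $t$.

$$
K = \mathbb{Q}(t) \;\subset\; lK = \mathbb{Q}(\sqrt{2})(t) \;\subset\; L = \mathbb{Q}(\sqrt{2})(u), \qquad u^2 = t, \qquad [L:K] = [lK:K]\cdot[L:lK] = 2\cdot 2 .
$$

Here $l=\mathbb{Q}(\sqrt{2})$ is the algebraic closure of $\mathbb{Q}$ in $L$. The bottom step only enlarges the constants, while the top step is geometric over $l$: it corresponds to the squaring map $\mathbb{P}^1 \rightarrow \mathbb{P}^1$, $u \mapsto u^2$.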
In fact, the issue that not all extensions of regular function fields are geometric is an important technical one in the subject, because sometimes natural algebraic constructions do not preserve the class of geometric extensions.
Here is an example very close to my own heart: let $p$ be an odd prime. The elliptic modular curves $X(1)$ and $X_0(p)$ have canonical models over $\mathbb{Q}$ and there is a natural "forgetful modular" covering $X_0(p) \rightarrow X(1)$. This corresponds to a geometric extension of function fields $\mathbb{Q}(X_0(p)) / \mathbb{Q}(X(1))$. This is not a Galois extension: what is the Galois closure and what is its Galois group? If our constant field were $\mathbb{C}$, as was classically the case, then the Galois closure is the function field of the modular curve $X(p)$ and the Galois group of the covering $X(p)/X(1)$ is
$\operatorname{PSL}_2(\mathbb{Z}/p\mathbb{Z})$. However, over $\mathbb{Q}$ the Galois closure also contains the quadratic field $\mathbb{Q}\left(\sqrt{(-1)^{\frac{p-1}{2}} p}\right)$, so its Galois group is an extension of a cyclic group of order $2$ by $\operatorname{PSL}_2(\mathbb{Z}/p\mathbb{Z})$ (in fact it is $\operatorname{PGL}_2(\mathbb{Z}/p\mathbb{Z})$). Thus the Galois closure is not a geometric extension of $\mathbb{Q}(X(1))$. This is unfortunate, because Hilbert's Irreducibility Theorem says that if one has a geometric Galois extension $L/k(t)$ with $k$ a number field, then one can realize $\operatorname{Aut}(L/k(t))$ as a Galois group over $k$. So in this case, one obtains $\operatorname{PSL}_2(\mathbb{Z}/p\mathbb{Z})$ as a Galois group not over $\mathbb{Q}$ but over the quadratic field given above, which varies with $p$. K.-y. Shih found a brilliant way to "tweak" this construction to realize $\operatorname{PSL}_2(\mathbb{Z}/p\mathbb{Z})$ over $\mathbb{Q}$ in certain (infinitely many) cases, and other mathematicians -- e.g. Serre, myself, my graduate student Jim Stankewicz -- have put a lot of thought into extending Shih's work, but with only very limited success.
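For orientation, here are the sizes involved, together with the quadratic field for the first few primes. This is a routine computation; $p^* = (-1)^{\frac{p-1}{2}}p$ is shorthand introduced only for this display.

$$
\begin{aligned}
[\mathbb{Q}(X_0(p)) : \mathbb{Q}(X(1))] &= p+1, \\
\#\operatorname{PSL}_2(\mathbb{Z}/p\mathbb{Z}) &= \tfrac{1}{2}\,p(p^2-1), \qquad \#\operatorname{PGL}_2(\mathbb{Z}/p\mathbb{Z}) = p(p^2-1), \\
p = 3,\ 5,\ 7,\ 11,\ 13 \;&\Longrightarrow\; p^* = -3,\ 5,\ -7,\ -11,\ 13 .
\end{aligned}
$$

So for $p=7$, say, the group over $\mathbb{C}$ has order $168$, the group over $\mathbb{Q}$ has order $336$, and the extra constants are $\mathbb{Q}(\sqrt{-7})$.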
Added: Brian's example in the comments is very nice. Maybe another remark to make is that in the arithmetic theory of coverings of curves (an active branch of arithmetic geometry) the distinction between a Galois extension and a geometrically Galois extension of fields (i.e., one which becomes Galois after base change to $\overline{k}$) is a key one: it's certainly something that many arithmetic geometers think a lot about. It just doesn't come with an obvious "visualization", at least not to me. Not everything in algebraic or arithmetic geometry can be visualized, or at least not visualized in a way common to different workers in the field. For instance, an inseparable field extension $l/k$ is by definition ramified, but I have never seen anyone describe this visually. (There are things you can say to justify that this is not a "covering map", e.g. by pointing to the nonreducedness of $l \otimes_k l$, but I don't think this is direct visualization either. Maybe some would disagree?) What you do is think of the case of a ramified cover of Riemann surfaces, and take away the (key) piece of intuition that an inseparable field extension -- which is, visually speaking, just one closed point mapping to another -- behaves like a ramified cover of Riemann surfaces in many ways. So, as Brian says, in this subject a lot of geometric reasoning proceeds by analogy. Unlike in, say, certain branches of low-dimensional topology, one does not prove a theorem by referring to (allegedly) visually apparent features of one's constructions.
[1]: Those who know me well know that I certainly don't think that a curve is weird just because it has no degree one closed points. More accurate is to say that this curve doesn't have any degree one closed points for a "weird reason".
Caveat: in order to give you an overview, I've been vague/sloppy in several places.
Well, the basic link to representation theory is that modular forms (and automorphic forms) can be viewed as functions in representation spaces of reductive groups. What I mean is the following: take for example a modular form, i.e. a function $f$ on the upper-half plane satisfying certain conditions. Since the upper-half plane is a quotient of $G=\mathrm{GL}(2,\mathbf{R})$, you can pull $f$ back to a function on $G$ (technically you massage it a bit, but this is the main idea) which will be invariant under a discrete subgroup $\Gamma$. Functions that look like this are called automorphic forms on $G$. The space of all automorphic forms on $G$ is a representation of $G$ (via the right regular representation, i.e. $(gf)(x)=f(xg)$). Basically, any irreducible subrepresentation of the space of automorphic forms is what is called an automorphic representation of $G$. So, modular forms can be viewed as certain vectors in certain (generally infinite-dimensional) representations of $G$. In this context, one can define the Hecke algebra of $G$ as the complex-valued $C^\infty$ functions on $G$ with compact support viewed as a ring under convolution. This is a substitute for the group ring that occurs in the representation theory of finite groups, i.e. the (possibly infinite-dimensional) group representations of $G$ should correspond to the (possibly infinite-dimensional) algebra representations of its Hecke algebra. This type of stuff is the basic connection of modular forms to representation theory and it goes back at least to Gelfand–Graev–Piatetski-Shapiro's Representation Theory and Automorphic Functions. You can replace $G$ with a general reductive group.
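The "massaging" can be made explicit. One common normalization is the following; conventions vary between references, so treat it as a sketch. The weight $k$, the factor $j$ and the restriction to the positive-determinant subgroup $\mathrm{GL}(2,\mathbf{R})^+$ are my notation, not fixed above, and I take $\Gamma \subset \mathrm{SL}(2,\mathbf{Z})$.

$$
\phi_f(g) = (\det g)^{k/2}\, j(g,i)^{-k}\, f(g\cdot i), \qquad j\!\left(\begin{pmatrix} a&b\\ c&d\end{pmatrix}, z\right) = cz+d, \qquad g \in \mathrm{GL}(2,\mathbf{R})^+ .
$$

If $f(\gamma z) = j(\gamma,z)^k f(z)$ for $\gamma \in \Gamma$, then the cocycle relation $j(\gamma g, i) = j(\gamma, g\cdot i)\,j(g,i)$ gives $\phi_f(\gamma g) = \phi_f(g)$, so the weight-$k$ transformation law has become plain left-invariance under $\Gamma$. The weight reappears on the right: $\phi_f(g\,r_\theta) = e^{ik\theta}\phi_f(g)$ for the rotation $r_\theta = \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}$.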
To get to more advanced stuff, you need to start viewing modular forms not just as functions on $\mathrm{GL}(2,\mathbf{R})$ but rather on $\mathrm{GL}(2,\mathbf{A})$, where $\mathbf{A}$ are the adeles of $\mathbf{Q}$. This is a "restricted direct product" of $\mathrm{GL}(2,\mathbf{R})$ and $\mathrm{GL}(2,\mathbf{Q}_p)$ for all primes $p$. Again you can define a Hecke algebra. It will break up into a "restricted tensor product" of the local Hecke algebras as $H=\otimes_v^\prime H_v$ where $v$ runs over all primes $p$ and $\infty$ ($\infty$ is the infinite prime and corresponds to $\mathbf{R}$). For a prime $p$, $H_p$ is the space of locally constant compact support complex-valued functions on the double-coset space $K\backslash\mathrm{GL}(2,\mathbf{Q}_p)/K$ where $K$ is the maximal compact subgroup $\mathrm{GL}(2,\mathbf{Z}_p)$. If you take something like the characteristic function of the double coset $KA_pK$ where $A_p$ is the matrix with $p$ and $1$ down the diagonal, and look at how it acts on a modular form, you'll see that this is the Hecke operator $T_p$.
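To see where $T_p$ comes from, one can use the standard decomposition of this double coset into $p+1$ single cosets. It is stated here for $K=\mathrm{GL}(2,\mathbf{Z}_p)$; exactly how the cosets match the classical formula depends on normalization conventions, which I am glossing over.

$$
K \begin{pmatrix} p & 0\\ 0 & 1\end{pmatrix} K \;=\; \bigsqcup_{b=0}^{p-1} \begin{pmatrix} p & b\\ 0 & 1\end{pmatrix} K \;\sqcup\; \begin{pmatrix} 1 & 0\\ 0 & p\end{pmatrix} K .
$$

Compare this with the classical operator on a weight-$k$ form of level one, $T_p f(z) = p^{k-1} f(pz) + \frac{1}{p}\sum_{b=0}^{p-1} f\!\left(\frac{z+b}{p}\right)$, which also has $p+1$ terms.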
Then there's the connection with number theory. This is mostly encompassed under the phrase "Langlands program" and is a significantly more complicated beast than the above stuff. At least part of this started with Langlands' classification of the admissible representations of real reductive groups. He noticed that he could phrase the parametrization of the admissible representations, say of $\mathrm{GL}(n,\mathbf{R})$, in a way that made sense for $\mathrm{GL}(n,\mathbf{Q}_p)$. This sets up a (conjectural, though known now for $\mathrm{GL}(n)$) correspondence between admissible representations of $\mathrm{GL}(n,\mathbf{Q}_p)$ and certain $n$-dimensional representations of a group that's related to the absolute Galois group of $\mathbf{Q}_p$ (the Weil–Deligne group). This is called the Local Langlands Correspondence. The Global Langlands Correspondence is that a similar kind of relation holds between automorphic representations of $\mathrm{GL}(n,\mathbf{A})$ and $n$-dimensional representations of some group related to the Galois group (the conjectural Langlands group). These correspondences should be nice in that things that happen on one side should correspond to things happening on the other. This fits into another part of the Langlands program which is the functoriality conjectures (really the correspondences are special cases). Basically, if you have two reductive groups $G$ and $H$ and a certain type of map from one to the other, then you should be able to transfer automorphic representations from one to the other. From this viewpoint, the algebraic geometry side of the picture enters simply as the source for proving instances of the Langlands conjectures. Pretty much the only way to take an automorphic representation and prove that it has an associated Galois representation is to construct a geometric object whose cohomology has both an action of the Hecke algebra and the Galois group and decompose it into pieces and pick out the one you want.
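The prototype of "things on one side corresponding to things on the other" is the classical case of a Hecke eigenform. As a sketch, assume $f=\sum a_n q^n$ is a normalized eigenform of weight $k$ and level one, and let $\rho_{f,\ell}$ be its associated two-dimensional $\ell$-adic Galois representation. This notation is introduced here, and I am not fixing the choice between arithmetic and geometric Frobenius. Then for every prime $p \neq \ell$ one has

$$
\det\bigl(1 - \rho_{f,\ell}(\mathrm{Frob}_p)\,X\bigr) = 1 - a_p X + p^{k-1}X^2 .
$$

So the Hecke eigenvalue $a_p$, which is automorphic data at $p$, is the trace of Frobenius at $p$ on the Galois side. For $f=\Delta$ of weight $12$ the polynomial reads $1-\tau(p)X+p^{11}X^2$.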
As for suggestions on what to read, I found Gelbart's book Automorphic Forms on Adele Groups pretty readable. This will get you through some of what I've written in the first two paragraphs for the group $\mathrm{GL}(2)$. The most comprehensive reference is the Corvallis proceedings, available freely at ams.org. To get into the Langlands program, there's the book An Introduction to the Langlands Program (google books) you could look at. It's really a vast subject and I didn't learn it from any one or few sources. But hopefully what I've written has helped you out a bit. I think I need to go to bed now. G'night.
Consider what happens if you take a $D$-module on an algebraic curve (with field of fractions $K$) and remove all the information on the singularities. You can achieve this by tensoring over the structure sheaf with $K$, obtaining a module for the ring of differential operators on $K$. This ring is generated over $K$ by differentiation along a single meromorphic vector field (since it's generated by differentiation along all vector fields). So a module over it is just a $K$-vector space with a semilinear action of this differentiation. This will usually be finite-dimensional (I think always for holonomic $D$-modules).
To pass to the Galois theory, we pick a specific vector field and view it as a derivation $D$ on $K$, so we have a finite-dimensional vector space with an action of $D$.
From a finite-dimensional vector space with an action of $D$ one can make a differential field extension using Picard-Vessiot theory. Take a ring generated by independent transcendentals corresponding to a basis of this vector space, with the $D$ action given by the $D$ action on the vector space, mod out by a maximal differential ideal, and take the field of fractions.
Any field extension generated by solutions of ODEs arises this way, because we can construct from an order $n$ ODE the vector space generated by a formal solution and its first $n-1$ derivatives and take the corresponding ring, which maps to the field, and the kernel is a differential ideal.
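Concretely, for a linear ODE of order $n$ with coefficients $a_i \in K$ (my notation), the vector space has a basis $e_0,\dots,e_{n-1}$ standing for a formal solution $y$ and its derivatives $y', \dots, y^{(n-1)}$:

$$
y^{(n)} + a_{n-1}y^{(n-1)} + \dots + a_1 y' + a_0 y = 0 \quad\leadsto\quad D e_i = e_{i+1}\ \ (0 \le i \le n-2), \qquad D e_{n-1} = -\sum_{i=0}^{n-1} a_i e_i ,
$$

with $D$ extended to all of $V=\bigoplus_i K e_i$ by the Leibniz rule $D(ae) = D(a)\,e + a\,De$ for $a \in K$.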
I think this object, a vector space with an action of $D$, is one of the simplest objects one could study in the theory of ODEs, other than, I guess, an ODE itself. To some extent, differential Galois theory and $D$-module theory take these objects and study them in different ways. In $D$-modules, one passes from vector spaces to the richer $\mathcal O_X$-modules, which we can study using commutative algebra, and also allows more than one differential operator to act at the same time, creating more interesting algebra. In differential Galois theory, we pass to studying the differential field extensions and their automorphism groups, often including, as Avi notes, higher degree differential polynomials.
However there is a specific area where they remain close together. When we study $D$-modules whose underlying $\mathcal O_X$-module is locally free, the space of analytic solutions is a representation of $\pi_1(X)$. On the other hand the space of solutions in a differential field large enough to contain all the solutions is a representation of the differential Galois group. These representations can be identified, with the image of $\pi_1$ inside $GL_n$ a subgroup of the differential Galois group - this is simply because analytic continuation around a loop in $X$ always acts as an automorphism of the field of analytic solutions. In good cases (e.g. regular singularities, by the Riemann-Hilbert correspondence), the Zariski closure of $\pi_1$ (the "monodromy group") is precisely equal to the differential Galois group, but not always - as in the case of $e^x$, which has no monodromy but a nontrivial differential Galois group.
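The two behaviours can be seen in rank one over $K=\mathbb{C}(x)$ with $D = d/dx$. This is a standard pair of examples; the parameter $\alpha$ is my own notation.

$$
\begin{aligned}
y' = y: &\quad y = e^x, \quad \text{monodromy} = \{1\}, \quad \text{differential Galois group} = \mathbb{G}_m \ \ (e^x \mapsto c\,e^x,\ c \in \mathbb{C}^\times); \\
y' = \tfrac{\alpha}{x}\,y,\ \alpha \notin \mathbb{Q}: &\quad y = x^\alpha, \quad \text{monodromy} = \langle e^{2\pi i \alpha}\rangle \subset \mathbb{C}^\times, \quad \text{differential Galois group} = \mathbb{G}_m .
\end{aligned}
$$

In the second case the singularities at $0$ and $\infty$ are regular, and the infinite cyclic monodromy group is Zariski dense in $\mathbb{G}_m$. In the first case the singularity at $\infty$ is irregular, and the trivial monodromy group misses the Galois group entirely.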
So some aspects of the theory of $D$-modules, specifically their comparison to local systems / sheaves and the Riemann-Hilbert correspondence, are closely related to the representation theory of the relevant differential Galois group.