I found Teruyoshi Yoshida's exposition of the subject very helpful:
http://www.dpmms.cam.ac.uk/~ty245/Yoshida_2003_introDL.pdf
As JT commented, the curve you wrote down is really the Deligne-Lusztig variety for $\text{SL}_2$, not $\text{GL}_2$. Ben is also right that the curve is $\mathbf{P}^1 - \mathbf{P}^1(\mathbf{F}_q)$; I presume he is just using a different definition of DL variety from yours. In Ben's version the DL variety is a subvariety of $G/B$, whereas the curve you want is a subvariety of $G/U$, where $U$ is the unipotent radical of $B$. The $G/U$ version is a cover of the $G/B$ version, with Galois group equal to the group of rational points of a twist of the torus $T$ (for $\text{SL}_2$ this group is $\mu_{q+1}$). We'll take the $G/U$ point of view here.
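So let's start with $G=\text{SL}_2$ over the field $\mathbf{F}_q$. We'll let $B$ be the usual Borel subgroup (the upper triangular matrices) and $U$ its unipotent radical, and write $(a,b,c,d)$ for the matrix $\begin{pmatrix}a&b\\c&d\end{pmatrix}$. We can then identify $G/B$ with $\mathbf{P}^1$ and $G/U$ with $\mathbf{A}^2-\{0\}$. The latter identification sends $(a,b,c,d)$ to its first column $(a,c)$, which is nonzero because $ad-bc=1$.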
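Let $w=(0,-1,1,0)$ be (a representative in $\text{SL}_2$ of) the nontrivial Weyl element; note that $w\cdot\infty=0$. We let $X_w$ be the subvariety of $G/B$ consisting of elements $x$ for which $x$ and $F(x)$ are in relative position $w$, where $F$ is the Frobenius map. Since two points of $\mathbf{P}^1$ are in relative position $w$ exactly when they are distinct, this is $\mathbf{P}^1 - \mathbf{P}^1(\mathbf{F}_q)$, as Ben says.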
For cosets $x,y\in G/B$ in relative position $w$, and a coset $gU\in G/U$ for which $gB=x$, we are going to define a new coset $w_{x,y}(gU)\in G/U$ as follows. First find a $g'\in G$ for which $g'B=x$ and $g'wB = y$. We may further take $g'$ so that $g'U=gU$. (This can be done because of the Bruhat decomposition of $G/B \times G/B$; we will see in a moment how this plays out for $\text{SL}_2$.) Then define $w_{x,y}(gU) = g'wU$. (Pardon the abuse of the symbol $w$.) The Deligne-Lusztig variety $Y_w$ is defined as the set of $gU\in G/U$ for which $gB$ and $F(gB)$ are in relative position $w$ and $F(gU)=w_{gB,F(gB)}(gU)$.
When does a point $(x,y)\in G/U$ lie in $Y_w$? We need to calculate $w_{gB,F(gB)}(gU)$, where $g=(x,*,y,*)\in G$. We have $gB=g\cdot\infty=x/y$ and $F(gB)=(x/y)^q$, and these are in relative position $w$ exactly when $x/y\notin\mathbf{P}^1(\mathbf{F}_q)$, i.e. when $xy^q-x^qy\neq 0$. So we must now find $g'\in G$ with $g'U=gU$ and $g'wB=F(gB)$. The first condition means that $g'=(x,*,y,*)$ and the second means that $g'\cdot 0=(x/y)^q$. Thus $g'=(x,ux^q,y,uy^q)$, where $u$ must satisfy $u(xy^q-x^qy)=1$ so that $\det g'=1$. We find that $w_{gB,F(gB)}(gU)=g'wU=(ux^q,uy^q)$. The condition that $(x,y)\in Y_w$ is exactly that $(x^q,y^q)=(ux^q,uy^q)$; since $(x,y)\neq(0,0)$, this forces $u=1$, i.e. $u^{-1}=xy^q-x^qy=1$. So that's the equation for the Deligne-Lusztig variety: $xy^q-x^qy=1$.
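To double-check the computation, here is a brute-force sketch in Python. It is only an illustration: `GF`, `matmul` and `check_sl2_derivation` are hypothetical helper names, the field $\mathbf{F}_{q^2}$ is implemented by hand, and $q=p$ is assumed prime to keep the arithmetic simple. At every $\mathbf{F}_{q^2}$-point $(x,y)\neq(0,0)$ with $xy^q-x^qy\neq 0$ it builds $g'=(x,ux^q,y,uy^q)$, checks $\det g'=1$, multiplies by the $\text{SL}_2$ representative $w=(0,-1,1,0)$ (the first column of $g'w$ is the same for $(0,1,1,0)$), and compares $F(gU)=(x^q,y^q)$ with that first column. Since $u(xy^q-x^qy)=1$, the points it accepts are those with $xy^q-x^qy=1$.

```python
from itertools import product


class GF:
    """Minimal arithmetic in F_{p^k}, p prime, k in {2, 3}.

    Elements are coefficient tuples (c_0, ..., c_{k-1}) of polynomials in t,
    reduced modulo a monic t^k + m_{k-1} t^{k-1} + ... + m_0 that has no
    root in F_p (for k <= 3 that makes it irreducible).
    """

    def __init__(self, p, k):
        assert k in (2, 3)
        self.p, self.k = p, k
        self.low = next(
            m for m in product(range(p), repeat=k)
            if all((z**k + sum(c * z**i for i, c in enumerate(m))) % p
                   for z in range(p)))
        self.elements = list(product(range(p), repeat=k))
        self.zero = (0,) * k
        self.one = (1,) + (0,) * (k - 1)

    def add(self, a, b):
        return tuple((x + y) % self.p for x, y in zip(a, b))

    def sub(self, a, b):
        return tuple((x - y) % self.p for x, y in zip(a, b))

    def mul(self, a, b):
        k, p = self.k, self.p
        prod = [0] * (2 * k - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                prod[i + j] = (prod[i + j] + x * y) % p
        # Reduce with t^k = -(m_0 + m_1 t + ... + m_{k-1} t^{k-1}).
        for d in range(2 * k - 2, k - 1, -1):
            c, prod[d] = prod[d], 0
            for i, m in enumerate(self.low):
                prod[d - k + i] = (prod[d - k + i] - c * m) % p
        return tuple(prod[:k])

    def pow(self, a, e):
        result = self.one
        while e:  # square-and-multiply
            if e & 1:
                result = self.mul(result, a)
            a = self.mul(a, a)
            e >>= 1
        return result

    def inv(self, a):
        return self.pow(a, self.p**self.k - 2)  # valid for a != 0


def matmul(F, A, B):
    """Product of 2x2 matrices over F, stored as ((a, b), (c, d))."""
    return tuple(tuple(F.add(F.mul(A[i][0], B[0][j]), F.mul(A[i][1], B[1][j]))
                       for j in range(2)) for i in range(2))


def check_sl2_derivation(p):
    """Replay the derivation at every F_{p^2}-point of G/U, with q = p.

    Asserts each step and returns the number of points found on Y_w.
    """
    F, q = GF(p, 2), p
    w = ((F.zero, F.sub(F.zero, F.one)), (F.one, F.zero))  # w = (0, -1, 1, 0)
    on_Y = 0
    for x, y in product(F.elements, repeat=2):
        if x == F.zero and y == F.zero:
            continue  # first columns of SL_2 matrices are nonzero
        xq, yq = F.pow(x, q), F.pow(y, q)
        D = F.sub(F.mul(x, yq), F.mul(xq, y))  # x y^q - x^q y
        if D == F.zero:
            continue  # x/y is F_q-rational: gB, F(gB) not in position w
        u = F.inv(D)
        g1 = ((x, F.mul(u, xq)), (y, F.mul(u, yq)))  # g' = (x, u x^q, y, u y^q)
        assert F.sub(F.mul(x, g1[1][1]), F.mul(g1[0][1], y)) == F.one  # det g' = 1
        g1w = matmul(F, g1, w)
        col = (g1w[0][0], g1w[1][0])  # first column of g'w, i.e. the coset g'wU
        assert col == (F.mul(u, xq), F.mul(u, yq))
        if (xq, yq) == col:  # F(gU) = g'wU: the point lies on Y_w
            assert D == F.one
            on_Y += 1
        else:
            assert D != F.one
    return on_Y
```

For $q=2$ it finds the 6 points of the curve over $\mathbf{F}_4$; for odd $q$ it finds none, because $(xy^q-x^qy)^q=-(xy^q-x^qy)$ whenever $x,y\in\mathbf{F}_{q^2}$.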
The equation for the DL variety attached to the Coxeter element (the cyclic permutation $(1\,2\,\cdots\,n)$) in the Weyl group of $\text{SL}_n$ is $\det(x_i^{q^j})=1$, where $0\leq i,j\leq n-1$.
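For small $n$ the equation $\det(x_i^{q^j})=1$ can be probed by brute force. The sketch below is a hypothetical illustration (the helper names are mine): it hand-rolls $\mathbf{F}_{q^n}$ with $q=p$ prime and $n\in\{2,3\}$, counts solutions over $\mathbf{F}_{q^n}$, and compares with $|\text{SL}_n(\mathbf{F}_q)|$. The expected pattern comes from a quick argument rather than from the literature: over $\mathbf{F}_{q^n}$ the determinant $D$ satisfies $D^q=(-1)^{n-1}D$ (raising to the $q$-th power cycles the columns); it is nonzero exactly when $x_0,\dots,x_{n-1}$ form an $\mathbf{F}_q$-basis of $\mathbf{F}_{q^n}$ (the Moore determinant criterion); and an $\mathbf{F}_q$-linear change of basis multiplies $D$ by its determinant. So for $n$ odd or $q$ even the count should be $|\text{SL}_n(\mathbf{F}_q)|$, and for $n$ even and $q$ odd it should be $0$.

```python
from itertools import permutations, product


class GF:
    """Minimal arithmetic in F_{p^k}, p prime, k in {2, 3}; elements are
    coefficient tuples of polynomials in t modulo a monic polynomial with no
    root in F_p (irreducible, since k <= 3)."""

    def __init__(self, p, k):
        assert k in (2, 3)
        self.p, self.k = p, k
        self.low = next(
            m for m in product(range(p), repeat=k)
            if all((z**k + sum(c * z**i for i, c in enumerate(m))) % p
                   for z in range(p)))
        self.elements = list(product(range(p), repeat=k))
        self.zero = (0,) * k
        self.one = (1,) + (0,) * (k - 1)

    def add(self, a, b):
        return tuple((x + y) % self.p for x, y in zip(a, b))

    def sub(self, a, b):
        return tuple((x - y) % self.p for x, y in zip(a, b))

    def mul(self, a, b):
        k, p = self.k, self.p
        prod = [0] * (2 * k - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                prod[i + j] = (prod[i + j] + x * y) % p
        # Reduce with t^k = -(m_0 + m_1 t + ... + m_{k-1} t^{k-1}).
        for d in range(2 * k - 2, k - 1, -1):
            c, prod[d] = prod[d], 0
            for i, m in enumerate(self.low):
                prod[d - k + i] = (prod[d - k + i] - c * m) % p
        return tuple(prod[:k])

    def pow(self, a, e):
        result = self.one
        while e:  # square-and-multiply
            if e & 1:
                result = self.mul(result, a)
            a = self.mul(a, a)
            e >>= 1
        return result


def det(F, M):
    """Leibniz-formula determinant of a small square matrix over F."""
    n = len(M)
    total = F.zero
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = F.one
        for i in range(n):
            term = F.mul(term, M[i][perm[i]])
        total = F.sub(total, term) if inversions % 2 else F.add(total, term)
    return total


def count_coxeter_points(p, n):
    """Count (x_0, ..., x_{n-1}) in F_{p^n}^n with det(x_i^(p^j)) = 1."""
    F = GF(p, n)
    count = 0
    for xs in product(F.elements, repeat=n):
        rows = []
        for x in xs:
            row = [x]
            for _ in range(n - 1):
                row.append(F.pow(row[-1], p))  # next Frobenius power x^(p^j)
            rows.append(row)
        if det(F, rows) == F.one:
            count += 1
    return count


def order_sl(p, n):
    """|SL_n(F_p)| = prod_{i<n} (p^n - p^i) / (p - 1)."""
    total = 1
    for i in range(n):
        total *= p**n - p**i
    return total // (p - 1)
```

With these helpers, `count_coxeter_points(2, 3)` and `order_sl(2, 3)` should both give $168=|\text{SL}_3(\mathbf{F}_2)|$, while `count_coxeter_points(3, 2)` gives $0$.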
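I believe Lusztig calculated the zeta functions of his varieties in a very general setting, but I was never able to trudge through it all. There must be a simple answer for the behavior of the zeta functions for the $\text{GL}_n$ varieties; if you ever write it up, I'd certainly love to read it! I can start you off: for $\text{SL}_2$ over $\mathbf{F}_q$, the DL curve has a compactly supported $H^1$ of dimension $q^2$, made up of a $q$-dimensional piece coming from the $q+1$ points at infinity, on which Frobenius acts trivially, and a $q(q-1)$-dimensional piece coming from the $H^1$ of the smooth compactification. When $q$ is even, the $q^2$-power Frobenius acts on the latter as the constant $-q$; when $q$ is odd it does not act as a constant, since then the curve has no $\mathbf{F}_{q^2}$-points at all. (The behavior of the $q$-power Frobenius might be subtler still; I suspect it has to do with Gauss sums.)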
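The dimension count for the $\text{SL}_2$ curve $Y$ can be reconstructed as follows. This is a worked sketch, assuming $Y$ is $xy^q-x^qy=1$ (the opposite sign convention gives an isomorphic curve). Its closure $\overline{Y}$ in $\mathbf{P}^2$ is $xy^q-x^qy=z^{q+1}$, a smooth plane curve of degree $q+1$ (the partial derivatives $y^q$, $-x^q$, $-z^q$ vanish simultaneously only at the origin), so it has genus $q(q-1)/2$. It meets $z=0$ in the $q+1$ points with $x/y\in\mathbf{P}^1(\mathbf{F}_q)$, all of them $\mathbf{F}_q$-rational. Since $H^0_c(Y)=0$, the cohomology sequence of $Y\subset\overline{Y}$ gives

$$0\to H^0(\overline{Y})\to H^0(\overline{Y}\smallsetminus Y)\to H^1_c(Y)\to H^1(\overline{Y})\to 0,$$

$$\dim H^1_c(Y)=\bigl((q+1)-1\bigr)+2\cdot\frac{q(q-1)}{2}=q+q(q-1)=q^2.$$

So the $q(q-1)$ figure is the dimension of the weight-one part $H^1(\overline{Y})$, and Frobenius acts trivially on the remaining $q$-dimensional piece because the points at infinity are rational. With $H^2_c(Y)$ one-dimensional and $F^2$ acting on it by $q^2$, the Lefschetz trace formula reads

$$\#Y(\mathbf{F}_{q^2})=q^2-\operatorname{Tr}\bigl(F^2\mid H^1_c(Y)\bigr)=q^2-q-\operatorname{Tr}\bigl(F^2\mid H^1(\overline{Y})\bigr).$$

For $q$ even a direct count gives $q^3-q$ points, so the trace on $H^1(\overline{Y})$ is $-q\cdot q(q-1)$; since every eigenvalue there has absolute value $q$, each one equals $-q$. For $q$ odd there are no points, because $(xy^q-x^qy)^q=-(xy^q-x^qy)$ on $\mathbf{F}_{q^2}$-points; the trace is then $q^2-q$, so $F^2$ is not a scalar on $H^1(\overline{Y})$.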
Good luck!
Regarding Shimura varieties:
One has to first consider the case of modular curves, which has served throughout as an impetus and inspiration for the general theory.
The study of modular curves (in various guises) goes back to the 19th century, with the work of Jacobi and others on modular equations (which from a modern viewpoint are explicit equations for the modular curves $X_0(N)$). The fact that these curves are defined over $\mathbb Q$ (or even $\mathbb Z$) also goes back (in some form) to the 19th century, in so far as it was noticed that modular equations have rational or integral coefficients. There is also the (strongly related) fact that interesting modular functions/forms have rational or integral $q$-expansion coefficients. Finally, there are the facts related to Kronecker's Jugendtraum, that modular functions/forms with integral Fourier coefficients, when evaluated at quadratic imaginary points in the upper half-plane, give algebraic numbers lying in abelian extensions of quadratic imaginary fields. These all go back to the 19th century in various forms, although complete theories/interpretations/explanations weren't known until well into the 20th century.
The idea that the cohomology of modular curves would be Galois-theoretically interesting is more recent. I think that it goes back to Eichler, with Igusa, Ihara, Shimura, Serre, and then Deligne all playing important roles. The history seems non-trivial to trace, in part because the intuitive idea seems to predate the formal introduction of etale cohomology (which is necessary to make the idea completely precise and general). Thus Ihara's work considers zeta-functions of modular curves (or of the Kuga--Sato varieties over them) rather than cohomology. (The zeta-function is a way of incarnating the information carried in cohomology without talking directly about cohomology.) Shimura worked just with weight two modular forms (related to cohomology with constant coefficients), and instead of talking directly about etale cohomology worked with the Jacobians of the modular curves. (He explained how the Hecke operators break up the Jacobian, up to isogeny, into a product of abelian varieties attached to Hecke eigenforms.) [Added: Shimura also had an argument, via congruences, which reduced the study of cohomology attached to higher weight forms to the case of weight two forms; this was elaborated on by Ohta. These kinds of arguments were then rediscovered and further developed by Hida, and have since been used by many people to relate modular forms of different weights to one another.]
The basic idea, which must have been understood in some form by all these people, is that a given Hecke eigenform $f$ of weight two contributes two dimensions to cohomology, represented by the two differential forms $f\,d\tau$ and $\overline{f}\,d\overline{\tau}$. Thus Hecke eigenspaces in the cohomology of modular curves are two-dimensional. Since the Hecke operators are defined over $\mathbb Q$, these eigenspaces are preserved by the Galois action on etale cohomology, and so we get two-dimensional Galois representations attached to modular forms.
As far as I understand, Shimura's introduction of general Shimura varieties grew out of thinking about the theory of modular curves, and in particular the way in which that theory interacted with the theory of elliptic curves with complex multiplication. Together with Taniyama, he developed the general theory of CM abelian varieties, and it was natural to try to embed that more general theory into a theory of moduli spaces generalizing the modular curves. A particular challenge was to give a sense to the idea that the resulting varieties (i.e. Shimura varieties in modern terminology) had canonical models over number fields. This could no longer be done by studying rationality of $q$-expansions, since the varieties could be compact, say, and hence have no cusps around which to form Fourier expansions. Shimura introduced the Shimura reciprocity law, i.e. the description of the Galois action on the special points (the points corresponding to CM abelian varieties), as the basic tool for characterizing and studying rationality questions for Shimura varieties.
In particular, Shimura varieties were introduced prior to the development of the Langlands programme, and for reasons other than the construction of Galois representations. However, once one had these varieties, naturally defined over number fields and having their origins in the theory of algebraic groups and automorphic forms, it was natural to try to calculate their zeta-functions, or more generally the Galois action on their cohomology, and Langlands turned to this problem in the early 1970s. (Incidentally, my understanding is that it was he who introduced the terminology Shimura varieties.) The first question he tried to answer was: how many dimensions does a given Hecke eigenspace contribute to the cohomology? He realized that the answer, at least typically, was given by Harish-Chandra's theory of (what are now called) discrete series $L$-packets, as is explained in his letters to Lang. The relationship of the resulting Galois representations to the Langlands programme is not obvious (in particular, it is not obvious how the dual group intervenes), and this intervention of the dual group is the main topic of the letters to Lang. These letters are just the beginning of the story, of course. (For example, the typical situation does not always occur; there is the phenomenon of endoscopy. And then there is the problem of actually proving that the Galois action on cohomology gives what one expects it to!)
Regarding Drinfeld and Deligne--Lusztig varieties:
I've studied these cases in much less detail, but I think that Drinfeld was inspired by the case of Shimura varieties, and (as Jim Humphreys has noted) Deligne--Lusztig drew inspiration from Drinfeld.
What can one conclude:
These theoretically intricate objects grew out of a long and involved history, with multiple motivations driving their creation and the investigations of their properties.
If you want to find a unifying (not necessarily historical) theme, one could also note that Deligne--Lusztig varieties are built out of flag varieties in a certain sense, in fact as locally closed regions of flag varieties, and that Shimura varieties are also built out of (in the sense that they are quotients of) symmetric spaces, which are again open regions in (partial) flag varieties.
This suggests a well-known conclusion, namely that the geometry of reductive groups and the various spaces associated to them seems to be very rich.
It's not easy to explain the motivation without being one of the authors, but in fact Lusztig has provided some helpful perspective on the writing of his joint paper with Deligne (1976) and of his earlier related paper (1974) in Ann. of Math. Studies 81. On his homepage at MIT you can find an intimidating list of all his papers, along with detailed comments on some of them. See in particular numbers 17 and 22. Even though his comments are fairly short, they do bring out the transition from the earlier ideas of Macdonald and Springer to the specific construction of Deligne-Lusztig varieties. Some of the personal contacts and influences are impossible to trace, but a basic motivation was the construction of explicit representations of the finite groups of Lie type which would realize the elusive "cuspidal" or "discrete series" characters. In his 1955 paper on finite general linear groups, Green was able to deal with the characters inductively in a combinatorial spirit, but for other Lie types the story gets more complicated and requires a more sophisticated approach.
There were of course some reviews of the two papers I've mentioned, along with a nice technical survey by Serre in the 1975-76 Bourbaki seminar. But it's hard to extract from the literature as much insight as you can get from Lusztig's own comments. In particular, I think he makes it clear that there was no single moment of illumination based on the rank 1 case, but rather a coming together of a number of lines of thought that had already become influential in algebraic geometry and representation theory (illustrated by Springer's work on representations of Weyl groups in the early 1970s). Lusztig himself started out in algebraic topology, but his collaboration with Roger Carter at Warwick got him involved in some of the problems of representation theory for algebraic groups and finite groups of Lie type. Having said all this, it must be added that it takes some rather brilliant people to come up with the right approach to such a stubborn problem in finite group theory.