The Busemann-Petty problem (posed in 1956) has an interesting history. It asks the following question: if $K$ and $L$ are two origin-symmetric convex bodies in $\mathbb{R}^n$ such that the volume of each central hyperplane section of $K$ is at most the volume of the corresponding section of $L$:
$$\operatorname{Vol}_{n-1}(K\cap \xi^\perp)\le \operatorname{Vol}_{n-1}(L\cap \xi^\perp)\qquad\text{for all } \xi\in S^{n-1},$$
does it follow that the volume of $K$ is at most the volume of $L$: $\operatorname{Vol}_n(K)\le \operatorname{Vol}_n(L)?$
Many mathematicians' gut reaction to the question is that the answer must be yes, and Minkowski's uniqueness theorem provides some mathematical justification for such a belief: it implies that an origin-symmetric star body in $\mathbb{R}^n$ is completely determined by the volumes of its central hyperplane sections, so these volumes contain a vast amount of information about the body. It was widely believed that the Busemann-Petty problem must have an affirmative answer, even though this was still an unproven conjecture.
Nevertheless, in 1975 everyone was caught off-guard when Larman and Rogers produced a counter-example showing that the assertion is false in $n \ge 12$ dimensions. Their counter-example was quite complicated, but in 1986 Keith Ball proved that the maximal $(n-1)$-dimensional volume of a central hyperplane section of the unit cube is $\sqrt{2}$, regardless of the dimension. A consequence of this is that the centered unit cube and a centered ball of suitable radius provide a counter-example when $n \ge 10$. Some time later Giannopoulos and Bourgain (independently) gave counter-examples for $n\ge 7$, and then Papadimitrakis and Gardner (independently) gave counter-examples for $n=5,6$.
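To see numerically where the threshold $n \ge 10$ in Ball's counter-example comes from, here is a minimal Python sketch (the function names are my own, hypothetical ones). It assumes the unit cube is $[-1/2,1/2]^n$, which has volume $1$, and compares Ball's bound $\sqrt{2}$ with the central section of the Euclidean ball of volume $1$, using $\operatorname{Vol}_n(B^n_2)=\pi^{n/2}/\Gamma(n/2+1)$. Once the ball's sections exceed $\sqrt{2}$, shrinking the ball very slightly keeps every one of its central sections larger than every central section of the cube, while making its volume smaller than the cube's.

```python
import math

def unit_ball_volume(n: int) -> float:
    """Volume of the Euclidean unit ball in R^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def section_of_unit_volume_ball(n: int) -> float:
    """(n-1)-volume of a central hyperplane section of the ball of volume 1 in R^n."""
    r = unit_ball_volume(n) ** (-1.0 / n)   # radius making Vol_n equal to 1
    return unit_ball_volume(n - 1) * r ** (n - 1)

# Ball (1986): every central section of [-1/2, 1/2]^n has (n-1)-volume at most sqrt(2).
CUBE_MAX_SECTION = math.sqrt(2)

def first_counterexample_dimension(max_n: int = 50) -> int:
    """Smallest n for which the volume-1 ball has central sections larger than sqrt(2)."""
    for n in range(2, max_n + 1):
        if section_of_unit_volume_ball(n) > CUBE_MAX_SECTION:
            return n
    raise ValueError("no such dimension up to max_n")

for n in (9, 10, 11):
    print(n, round(section_of_unit_volume_ball(n), 5))
print(first_counterexample_dimension())  # 10
```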
By 1992 only the three and four dimensional cases of the Busemann-Petty problem remained unsolved, since the problem is trivially true in two dimensions and by that point counter-examples had been found for all $n\ge 5$.
Around this time a theory had been developed connecting the problem with the notion of an "intersection body". Lutwak proved that if the body with smaller sections is an intersection body, then the conclusion of the Busemann-Petty problem follows. Later work by Grinberg, Rivin, Gardner, and Zhang strengthened the connection and established that the Busemann-Petty problem has an affirmative answer in $\mathbb{R}^n$ iff every origin-symmetric convex body in $\mathbb{R}^n$ is an intersection body. The question of whether a body is an intersection body is, in turn, closely related to the positivity of the inverse spherical Radon transform. In 1994, Richard Gardner used geometric methods to invert the spherical Radon transform in three dimensions in such a way as to prove that the problem has an affirmative answer in three dimensions (which was surprising, since all of the results up to that point had been negative). In the same year, Gaoyong Zhang published a paper (in the Annals of Mathematics) which claimed to prove that the unit cube in $\mathbb{R}^4$ is not an intersection body, and hence that the problem has a negative answer for $n=4$.
For three years everyone believed the problem had been solved, but in 1997 Alexander Koldobsky (who had been working on completely different problems) provided a new Fourier analytic approach to convex bodies and, in particular, established a very convenient Fourier analytic characterization of intersection bodies. Using this characterization he showed that the unit cube in $\mathbb{R}^4$ is an intersection body, contradicting Zhang's earlier claim. It turned out that Zhang's paper was incorrect, and this re-opened the four-dimensional case of the Busemann-Petty problem.
After learning that Koldobsky's results contradicted his claims, Zhang quickly proved that in fact every origin-symmetric convex body in $\mathbb{R}^4$ is an intersection body, and hence that the Busemann-Petty problem has an affirmative answer in $\mathbb{R}^4$: the opposite of what he had previously claimed. This later paper was also published in the Annals, so Zhang may be the only person to have published in such a prestigious journal both that $P$ and that $\neg P$!
$\newcommand\Q{\mathbf{Q}}$
$\newcommand\Qbar{\overline{\Q}}$
$\newcommand\Gal{\mathrm{Gal}}$
$\newcommand\C{\mathbf{C}}$
$\newcommand\Sym{\mathrm{Sym}}$
$\newcommand\E{\mathcal{E}}$
$\newcommand\Betti{\mathrm{Betti}}$
$\newcommand\Z{\mathbf{Z}}$
$\newcommand\Hom{\mathrm{Hom}}$
$\newcommand\T{\mathbf{T}}$
To answer this question, it might be best to start with
the following:
Q. What do the Galois representations attached to a variety know about the variety?
In order to make this more precise, let us introduce some notation. Fix a prime $p$ and a non-negative
integer $n$.
Let $X$ be a proper smooth scheme over $\Q$, and
let $V = H^n_{et}(X/\Qbar,\Q_p)$ denote the $n$th etale cohomology group
of $X$. The basic and fundamental properties of etale cohomology tell us that:
$V$ is a $\Q_p$-vector space whose dimension is the Betti number $\dim_{\Q} H^n_{\Betti}(X(\C),\Q)$, where $H_{\Betti}$
denotes Betti (or singular) cohomology, and $X(\C)$ denotes the complex points of $X$ thought
of as a topological manifold.
$V$ (with the $p$-adic topology) has a continuous action of $G_{\Q}:=\Gal(\Qbar/\Q)$.
Grothendieck and Serre further conjecture that the $G_{\Q}$-representation $V$
is semi-simple.
The strongest possible conjecture one might make is to ask whether the functor from smooth projective varieties over $\Q$ to
semi-simple $G_{\Q}$-representations (or the collection of all such representations for $n \le 2 \cdot \mathrm{dim}(X)$)
is fully faithful. However, this is too much to ask, for the following reasons.
(i). The target category is semi-simple, but the category of varieties is far from
semi-simple. (In particular, the existence of a map
$X \rightarrow Y$ does not imply the existence of a non-trivial map $Y \rightarrow X$.)
(ii). Varieties built in a combinatorial way from projective spaces (think toric
varieties) tend to have etale cohomology groups indistinguishable from those of
products of projective spaces. This is because their cohomology groups are generated by classes of algebraic cycles, on which Galois
acts in a well understood way (essentially by some power of the cyclotomic character).
These are, in some sense, manifestations of the same phenomenon: a correspondence
in $X \times Y$ gives rise to a cohomology class in $H^*(X \times Y)$; then by
the Künneth formula, this leads to a relation between the cohomology
of $X$ and $Y$ even when there is not necessarily any non-trivial
map from $X$ to $Y$ (or vice versa).
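As a small illustration of the Künneth formula at the level of dimensions only, here is a hypothetical Python helper (the name is my own) that computes the Betti numbers of a product from those of the factors, with field coefficients. For two elliptic curves it gives $b_2(E\times E')=6$; the $4$-dimensional summand $H^1(E)\otimes H^1(E')$ of $H^2(E\times E')$ is the part that relates $H^1(E)$ and $H^1(E')$, and it is where the class of the graph of an isogeny $E\to E'$ has a non-zero component.

```python
def kunneth_betti(bx: list[int], by: list[int]) -> list[int]:
    """Betti numbers of X x Y: b_n(X x Y) = sum over i + j = n of b_i(X) * b_j(Y)."""
    out = [0] * (len(bx) + len(by) - 1)
    for i, a in enumerate(bx):
        for j, b in enumerate(by):
            out[i + j] += a * b
    return out

elliptic_curve = [1, 2, 1]    # b_0, b_1, b_2 of an elliptic curve
projective_line = [1, 0, 1]   # b_0, b_1, b_2 of P^1

print(kunneth_betti(elliptic_curve, elliptic_curve))    # [1, 4, 6, 4, 1]
print(kunneth_betti(projective_line, projective_line))  # [1, 0, 2, 0, 1]
```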
In order to account for this, one can try to take the quotient category of the
category of algebraic varieties in which one is allowed to "break up" smooth
proper varieties into pieces cut out by certain (idempotent) correspondences on $X$. There are various ways in which one might do this. Conjecturally, these constructions are all essentially the same, and the corresponding category is the category of pure motives. The Tate conjecture now says that etale cohomology is a fully faithful functor from pure motives to semi-simple $G_{\Q}$-representations.
Example. If $E$ is an elliptic curve over $\Q$, and $n = 1$, then
the etale cohomology group $V$ is the dual of the usual representation
attached to the $p$-adic Tate module of $E$. Suppose that $E'$ is another
elliptic curve over $\Q$ with first etale cohomology group $V'$.
For curves, the theory of "motives" is essentially the theory of abelian varieties. (More generally, the theory of $H^1$ is essentially the theory of abelian
varieties, since, for any smooth projective variety $X$, there is an isomorphism $H^1(X) \simeq H^1(A(X))$, where $A(X)$ is the Albanese variety of $X$.)
Tate's conjecture in this case says that the natural map
$$\Hom(E,E') \otimes \Q_p \rightarrow \Hom_{G_{\Q}}(V',V)$$
(an isogeny $E \rightarrow E'$ acts on cohomology by pullback, from $V'$ to $V$) is an isomorphism. This is how you will see the Tate conjecture stated for elliptic
curves (in terms of Tate modules) in, for example, Silverman's The Arithmetic of Elliptic Curves (AOEC). The Tate conjecture for abelian varieties over number fields is a theorem of Faltings.
(Suggestion: to understand what the Tate conjecture is really about, and why it is hard, you should think about the special case of elliptic curves.)
If we now return to our question, we can (tautologically) say the following: assuming the Tate conjecture, the etale cohomology knows about the motive corresponding to the original variety. What does that really mean? One way of thinking about motives is as a "universal cohomology theory". In particular, we can recover from the motive not only the etale cohomology groups, but also the algebraic
de Rham cohomology groups. Recall that de Rham cohomology is another cohomology theory
that gives vector spaces of the "correct" dimension for a smooth proper variety $X/\Q$. The de Rham cohomology groups do not have associated Galois representations, but
they do have a Hodge filtration. Over $\C$, if one takes the associated graded of the Hodge filtration, one recovers the Hodge decomposition:
$$H^n_{dR}(X,\C) \cong \bigoplus_{p+q=n} H^{q}(X,\Omega^p_X).$$
The dimensions $h^{p,q} := \dim H^q(X,\Omega^p_X)$ of these spaces are called the Hodge numbers.
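With the convention $h^{p,q}=\dim H^q(X,\Omega^p_X)$, the bookkeeping of Hodge numbers is easy to mechanize. The following Python sketch (all helper names are hypothetical) encodes the Hodge numbers of a curve of genus $g$, the Künneth formula for Hodge numbers of a product, and the collapse $b_n=\sum_{p+q=n}h^{p,q}$ back to Betti numbers. For $E\times E$ it gives $h^{2,0}=h^{0,2}=1$ and $h^{1,1}=4$, so $b_2=6$.

```python
from collections import defaultdict

# Hodge numbers stored as {(p, q): h^{p,q}}, with h^{p,q} = dim H^q(X, Omega^p).
HodgeNumbers = dict[tuple[int, int], int]

def curve(g: int) -> HodgeNumbers:
    """Hodge numbers of a smooth projective curve of genus g."""
    return {(0, 0): 1, (1, 0): g, (0, 1): g, (1, 1): 1}

def product(hx: HodgeNumbers, hy: HodgeNumbers) -> HodgeNumbers:
    """Kunneth for Hodge numbers: h^{p,q}(X x Y) = sum of h^{a,b}(X) h^{c,d}(Y) over a+c=p, b+d=q."""
    out = defaultdict(int)
    for (a, b), m in hx.items():
        for (c, d), n in hy.items():
            out[(a + c, b + d)] += m * n
    return dict(out)

def betti(h: HodgeNumbers) -> dict[int, int]:
    """Collapse to Betti numbers, b_n = sum over p + q = n of h^{p,q}, sorted by degree."""
    out = defaultdict(int)
    for (p, q), m in h.items():
        out[p + q] += m
    return dict(sorted(out.items()))

E = curve(1)
ExE = product(E, E)
print(ExE[(2, 0)], ExE[(1, 1)], ExE[(0, 2)])  # 1 4 1
print(betti(ExE))                              # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
```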
So, assuming the Tate conjecture, from $V$ we can recover the underlying
motive, from which we may reconstruct the de Rham cohomology, and then the Hodge numbers. The Tate conjecture seems to be very hard.
However, Grothendieck asked the following: given
$V$, can we directly recover the (algebraic $p$-adic) de Rham cohomology along with its filtration without
first constructing the motive? This was a great question, and the answer (yes!) constitutes one of the major achievements of $p$-adic Hodge Theory. I can do no more than give a cartoon description here. In order to do so, first recall the much more classical story connecting de Rham cohomology to Betti (singular) cohomology. These groups can both naturally be defined as vector spaces over $\Q$ (one has to define de Rham cohomology in the correct way), but the isomorphism relating these spaces comes from integrating forms over cycles. Yet these integrals are typically
transcendental numbers, so to pass from Betti to de Rham cohomology one first has to tensor with a field
bigger than $\Q$ which contains all these periods (usually, one simply tensors with $\C$).
In order to pass from etale cohomology to algebraic de Rham cohomology, one might ask for a period ring in which we can compare both groups. In this refined setting, the period ring should have both a Galois action and a filtration. The most basic version of a period ring is $B_{HT}$; specifically,
$$B_{HT} := \bigoplus_{n \in \Z} \C_p(n),$$
where $\C_p$ is the completion of $\Qbar_p$, and $\C_p(n)$ is $\C_p$ twisted
(as a local Galois module) by the $n$th power of the cyclotomic character.
The ring
$B_{HT}$ has a natural filtration (indeed, it is even graded). Now we can consider
$$D_{HT}(V) = (V \otimes B_{HT})^{\Gal(\Qbar_p/\Q_p)}.$$
The Galois group acts on both $V$ and $B_{HT}$, hence diagonally on their tensor product, and $D_{HT}(V)$ is the subspace of invariants. The result is a graded
(and so filtered) $\Q_p$-vector space. On the other hand, one can also consider the module
$B_{HT} \otimes H^n_{dR}(X/\Q_p)$, where there is a natural way to make sense of the
corresponding filtration. An important theorem of Faltings then says that
$$H^{n}_{et}(X/\Qbar_p,\Q_p) \otimes B_{HT} \cong \operatorname{gr}^{\bullet} H^n_{dR}(X/\Q_p) \otimes B_{HT},$$
and $D_{HT}(V) \cong \operatorname{gr}^{\bullet} H^n_{dR}(X/\Q_p)$, the associated graded of the Hodge filtration. In particular, from a geometric Galois
representation, we can recover the graded pieces of the Hodge filtration, and hence the Hodge numbers.
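Here is a toy Python model of $D_{HT}$ (the function name is hypothetical), valid only under the strong assumption that $V$ is a direct sum of integral powers of the cyclotomic character, $V=\bigoplus_i \Q_p(n_i)$. It relies on Tate's theorem that $\C_p(j)^{\Gal(\Qbar_p/\Q_p)}$ is $\Q_p$ for $j=0$ and $0$ otherwise, so the degree-$m$ piece $(V\otimes\C_p(m))^{\Gal(\Qbar_p/\Q_p)}$ picks up one dimension for each $n_i=-m$. For example, $H^2_{et}(\mathbf{P}^1,\Q_p)=\Q_p(-1)$ lands in degree $1$, matching $H^1(\mathbf{P}^1,\Omega^1)$. A genuine geometric representation such as $H^1$ of an elliptic curve is not of this shape, so this only illustrates the bookkeeping.

```python
from collections import Counter

def d_ht_graded_dims(twists: list[int]) -> dict[int, int]:
    """Toy model: graded dimensions of D_HT(V) for V = Q_p(n_1) + ... + Q_p(n_r).

    Tate's theorem gives C_p(j)^Gal = Q_p if j == 0 and 0 otherwise, so the
    degree-m piece (Q_p(n) tensor C_p(m))^Gal = C_p(n + m)^Gal is
    one-dimensional exactly when m == -n.
    """
    return dict(sorted(Counter(-n for n in twists).items()))

print(d_ht_graded_dims([0]))    # H^0(P^1) = Q_p       -> {0: 1}
print(d_ht_graded_dims([-1]))   # H^2(P^1) = Q_p(-1)   -> {1: 1}

# A hypothetical sum of characters: D_HT has full dimension, i.e. V is Hodge-Tate.
V = [0, -1, -1, -2]
print(d_ht_graded_dims(V), sum(d_ht_graded_dims(V).values()) == len(V))
```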
Modular Forms. The Eichler-Shimura isomorphism relates cusp forms of weight $k \ge 2$
to $H^1(X_0(N),\Sym^{k-2}\Q)$. If $k = 2$, this is just $H^1(X_0(N),\Q)$. The Hecke algebra
$\T$ acts on $H^1_{\Betti}(X_0(N),\Q)$, and (since it is constructed functorially) also on the etale cohomology
$H^1(X_0(N),\Q_p)$. Now the Hodge decomposition of $H^1$ is $H^1 = H^{0,1} \oplus H^{1,0}$, where
$h^{0,1} = h^{1,0}$ is the genus of $X_0(N)$.
The Hecke algebra breaks up the cohomology into two dimensional pieces corresponding to the Galois representations
associated to eigenforms; it turns out that each two dimensional piece contains one dimension from
$H^{0,1}$ and one dimension from $H^{1,0}$. The result of Faltings above tells us that we can read off
that $h^{0,1} = h^{1,0} = 1$ directly from the Galois representation.
For $k > 2$, recall that (technical issues aside) there is a universal elliptic curve $\E \rightarrow X_0(N)$.
The Kuga-Sato variety is (again, roughly) the $(k-1)$-dimensional variety $K = \E \times_X \E \times_X \cdots \times_X \E$ (with $k-2$ factors), where
$X = X_0(N)$. There is a natural map $\pi: K \rightarrow X$.
Writing $f: \E \rightarrow X$, the local system $\Sym^{k-2}(R^1 f_*\Q_p)$ (whose fibres are $\Sym^{k-2}(\Q_p^2)$) occurs as a direct summand of $R^{k-2}\pi_*\Q_p$, and so, using the proper base change theorem,
Deligne shows that $H^1(X_0(N),\Sym^{k-2}\Q_p)$ is a sub-quotient of the cohomology group $H^{k-1}(K,\Q_p)$.
(Warning: this requires more than simply a formal cohomological argument; it also requires some trickiness with weights to show
that terms on different diagonals of the Leray spectral sequence don't "mix", and hence that the sequence degenerates.) The Galois representation associated to a modular form
is now a two-dimensional piece of $H^{k-1}(K,\Q_p)$. Faltings proves that the corresponding "piece" of de Rham cohomology
seen by this representation is $H^{0,k-1} \oplus H^{k-1,0}$. In particular, the representation has Hodge numbers
$h^{0,k-1} = 1$ and $h^{k-1,0} = 1$.
Given a Galois representation $V$, one can twist $V$ by the cyclotomic character. How does this affect the Hodge decomposition?
One can compute this on the Hodge side by seeing what happens to the cohomology of $X \times \mathbf{G}_m$ and comparing with
the Künneth formula (recall that $H^1(\mathbf{G}_m) \cong \Q_p(-1)$, of Hodge type $(1,1)$). It turns out that $h^{p,q}(V(n)) = h^{p+n,q+n}(V)$. Thus, even if we only know $V$ up to twist, we still recover some information
about the Hodge numbers.
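The twisting rule can be checked against the Künneth formula in a few lines of Python. This is a sketch with hypothetical helper names; it uses $H^2(\mathbf{P}^1)\cong\Q_p(-1)$, of Hodge type $(1,1)$, in place of $\mathbf{G}_m$ (both give the same Tate twist), and the convention that twisting by $n$ moves a Hodge type $(p,q)$ to $(p-n,q-n)$, which matches the shape $h^{-d,k-d-1}$ used for twists of modular forms.

```python
# Hodge numbers as {(p, q): h^{p,q}}; helper names are hypothetical.
HodgeNumbers = dict[tuple[int, int], int]

def tate_twist(h: HodgeNumbers, n: int) -> HodgeNumbers:
    """Hodge numbers of V(n): a type (p, q) of V moves to (p - n, q - n)."""
    return {(p - n, q - n): m for (p, q), m in h.items()}

def kunneth(hx: HodgeNumbers, hy: HodgeNumbers) -> HodgeNumbers:
    """Hodge numbers of a tensor product of Hodge structures."""
    out: HodgeNumbers = {}
    for (a, b), m in hx.items():
        for (c, d), k in hy.items():
            out[(a + c, b + d)] = out.get((a + c, b + d), 0) + m * k
    return out

H1_E = {(0, 1): 1, (1, 0): 1}   # H^1 of an elliptic curve
H2_P1 = {(1, 1): 1}             # H^2(P^1) = Q_p(-1), of Hodge type (1, 1)

# Tensoring with H^2(P^1) (a Kunneth piece of H^*(X x P^1)) is the twist by -1.
print(kunneth(H1_E, H2_P1))                            # {(1, 2): 1, (2, 1): 1}
print(kunneth(H1_E, H2_P1) == tate_twist(H1_E, -1))    # True
```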
Returning to modular forms: the coefficients $a_p$ determine the Galois representation, by Cebotarev.
A modular form of weight $k$ has Hodge numbers $h^{0,k-1} = h^{k-1,0} = 1$. The determinant of the representation is the $(k-1)$th power
of the cyclotomic character (up to a finite character), which can be read off from the "degree". By twisting by the $d$th power of the cyclotomic character, we can easily change the determinant,
and change the Hodge numbers to $h^{-d,k-d-1} = h^{k-d-1,-d} = 1$. Yet,
since twisting never changes the difference $|p-q| = k-1$, it is clear that we cannot twist so that $h^{1,0} = h^{0,1} = 1$ unless $k = 2$. Thus a modular form of weight $k > 2$ cannot be associated to
an elliptic curve, even after twisting. This is Kevin's answer.
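The twisting argument above can be phrased as a tiny search, sketched here in Python (names are hypothetical). The finite search range is only an illustration; the actual proof is the invariance of $|p-q|$ noted in the code comment.

```python
def hodge_types_weight_k(k: int) -> frozenset:
    """Hodge types of the motive of a weight-k eigenform: h^{0,k-1} = h^{k-1,0} = 1."""
    return frozenset({(0, k - 1), (k - 1, 0)})

ELLIPTIC_CURVE = frozenset({(0, 1), (1, 0)})   # h^{0,1} = h^{1,0} = 1

def matching_twists(k: int, search: range = range(-100, 101)) -> list[int]:
    """Twists d (by the d-th power of the cyclotomic character) whose Hodge types
    match H^1 of an elliptic curve. A twist moves (p, q) to (p - d, q - d), so it
    never changes |p - q| = k - 1; hence only k = 2 (with d = 0) can match."""
    return [d for d in search
            if frozenset((p - d, q - d) for (p, q) in hodge_types_weight_k(k)) == ELLIPTIC_CURVE]

for k in (2, 4, 12):
    print(k, matching_twists(k))   # 2 [0], then 4 [], then 12 []
```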
Secondly, any motive has (conjecturally) an $L$-function. The recipe of building this $L$-function breaks up into two parts. The first involves the factors
at finite primes, which give rise to the Euler product. The second involves the infinite primes, which give rise to Gamma factors. The information
at $\infty$, however, (by Tate's conjecture) can be read off from the Galois representation, and the recipe of Deligne shows that it depends exactly
on the Hodge numbers of the motive, and vice versa. Moreover, twisting by $\epsilon^k$, a power of the cyclotomic character, has the effect of replacing $L(s)$ by $L(s+k)$ (and shifting the central point by $k$). In particular, given an elliptic curve, one knows the Gamma factors (because one knows the Hodge decomposition of $E$), and one sees that even after twisting one cannot get Gamma factors that
"look like" the Gamma factors associated to a modular form of weight different from $2$. This is GH's answer.
More generally, arithmetic conjectures of Langlands type imply that all motives should be "automorphic", and that the Hodge structure of the motive determines the infinity type of the automorphic form,
which in turn determines the Gamma factors. So, at least morally, given a pure irreducible motive $V$, we know that if it is automorphic, it must be automorphic of a particular weight determined by the underlying geometry of $V$.
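A small Python sketch of the archimedean bookkeeping may help (names are hypothetical). It assumes Deligne's recipe in its simplest form: each Hodge type $(p,q)$ with $p<q$ contributes a factor $\Gamma_\C(s-p)$, where $\Gamma_\C(s)=2(2\pi)^{-s}\Gamma(s)$, and the functional equation of a pure motive of weight $w$ is centred at $s=(w+1)/2$; factors coming from types with $p=q$ are ignored. In the arithmetic normalization, an elliptic curve and a weight-$4$ form both have $\Gamma_\C(s)$, but with centres $1$ and $2$. Twisting by $n$ moves the shift and the centre together, so "centre minus shift" ($=k/2$) never changes. Equivalently, after normalizing the centre to $s=1/2$, the Gamma factor of a weight-$k$ form becomes $\Gamma_\C(s+(k-1)/2)$.

```python
from fractions import Fraction

Types = list[tuple[int, int]]

def gamma_shifts(types: Types) -> list[int]:
    """Deligne's recipe, p < q part only: each type (p, q) with p < q gives Gamma_C(s - p)."""
    return sorted(p for (p, q) in types if p < q)

def archimedean_data(types: Types, weight: int) -> tuple[list[int], Fraction]:
    """Gamma-factor shifts together with the centre (w + 1)/2 of the functional equation."""
    return gamma_shifts(types), Fraction(weight + 1, 2)

def tate_twist(types: Types, weight: int, n: int) -> tuple[Types, int]:
    """M(n): types move by (-n, -n) and the weight drops by 2n, consistent with L(M(n), s) = L(M, s + n)."""
    return [(p - n, q - n) for (p, q) in types], weight - 2 * n

ELLIPTIC = archimedean_data([(0, 1), (1, 0)], 1)   # Gamma_C(s), centre s = 1
WEIGHT_FOUR = ([(0, 3), (3, 0)], 3)                 # Gamma_C(s), centre s = 2

matches = [n for n in range(-50, 51)
           if archimedean_data(*tate_twist(*WEIGHT_FOUR, n)) == ELLIPTIC]
print(matches)   # []
```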
Of course, even before the result of Faltings, one had enough faith in how these things were connected to be very confident that elliptic curves over $\Q$ should
correspond exactly to weight two forms - GH's remark that "It was an experimental fact that the gamma factors are always the same, hence the precise form of the modularity conjecture was formulated, which then turned out to be right, namely it was proved by great efforts of great mathematicians" seems spot on.
Related problems.
Given a modular eigenform $f = \sum a_n q^n$ of weight four (in the arithmetic normalization), one can ask: is it possible to show that there does not exist a weight
two modular form
$g = \sum b_n q^n$ where $a_p = p b_p$ for all primes $p$ without using $p$-adic Hodge theory? I think this is not so easy. For example:
(i) The arithmetic approach: the weight $4$ form $f$ would have the property that it is ordinary at no prime, since clearly
$a_p = p b_p \equiv 0 \mod p$. One conjectures that the set of ordinary primes has density one (or $1/2$ if $f$ has CM). Yet it is still unknown whether
an arbitrary form of weight $\ge 4$ must have even a single ordinary prime.
(ii) The analytic approach: What do the distributions of coefficients of weight $2$ and $4$ forms look like? Sato-Tate says that the normalized coefficients satisfy a precise
distribution (now a theorem!). Yet the "normalized" coefficients of $f$ and $g$ are by construction exactly the same, so Sato-Tate says nothing. In particular,
it is hard to see any analytic estimates of functions involving the $a_p$ being able to distinguish two classes of numbers with the same
underlying distribution. A related observation: the Hasse
bound in weight four, $|a_p| \le 2p^{3/2}$, is satisfied by $p b_p$ if and only if the weight two Hasse bound, $|b_p| \le 2\sqrt{p}$, is satisfied by $b_p$.
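The observations in (i) and (ii) can be checked numerically with a short Python sketch. By modularity, the traces of Frobenius $b_p$ of an elliptic curve over $\Q$ are the coefficients of a weight two newform, so I use a hypothetical example curve $y^2=x^3-x+1$ (discriminant $-16\cdot 23$, not taken from the text) and count points naively. Setting the hypothetical weight-four coefficients to $a_p=p\,b_p$, the sketch checks that $a_p$ is never a $p$-adic unit, that the normalized coefficients coincide, and that the two bounds are equivalent.

```python
import math

def trace_of_frobenius(p: int, A: int, B: int) -> int:
    """b_p = p + 1 - #E(F_p) for E: y^2 = x^3 + A*x + B, p an odd prime of good reduction.
    Naive count: each x contributes 1 + (Legendre symbol of x^3 + A x + B) points."""
    points = 1  # the point at infinity
    for x in range(p):
        rhs = (x ** 3 + A * x + B) % p
        if rhs == 0:
            points += 1
        elif pow(rhs, (p - 1) // 2, p) == 1:   # Euler's criterion: rhs is a nonzero square
            points += 2
    return p + 1 - points

def primes_up_to(n: int) -> list[int]:
    return [m for m in range(2, n + 1) if all(m % d for d in range(2, math.isqrt(m) + 1))]

# Hypothetical example curve y^2 = x^3 - x + 1; its bad primes are 2 and 23.
A, B, BAD = -1, 1, {2, 23}

rows = []
for p in primes_up_to(100):
    if p in BAD:
        continue
    b = trace_of_frobenius(p, A, B)   # weight-2 coefficient b_p
    a = p * b                         # hypothetical weight-4 coefficient a_p = p * b_p
    rows.append((p, a, b))
    assert a % p == 0                                            # never ordinary
    assert math.isclose(a / (2 * p ** 1.5), b / (2 * p ** 0.5))  # same normalized coefficient
    assert (a * a <= 4 * p ** 3) == (b * b <= 4 * p)             # weight-4 bound iff weight-2 bound
    assert b * b <= 4 * p                                        # Hasse's bound for b_p

print(rows[:5])
```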
Summary. Conjectures arise organically from heuristics and computation. It was "known" that elliptic curves should be associated to weight two forms long before
one could actually formally prove that they weren't associated to twists of weight $4$ forms. To prove the latter fact, one has to use $p$-adic Hodge theory.
(* I am not sure about a lot here, so I'm putting this community wiki. Also, there is no mention of weight 1 or half integers. Change whatever needs changing, or in the extreme cases, peacefully leave a comment to delete...)
Best Answer
No, there are no mistakes of any interest in these papers. In the 1990s there were a bazillion study groups and seminars across the world devoted to these papers; I personally read all three of the papers you cite, back in the days when I was young and an expert in this area, and they all looked fine to me. They also looked fine to all the people who were at the IAS with me in 1995 reading them, including a whole bunch of people who were a whole lot smarter than me.
As has been pointed out, the proof of FLT relies on a whole lot more stuff than just those papers, for example Langlands--Tunnell (which I have not read, and suspect I will never read, but which has been generalised out of the park by other authors) and Mazur (which I read through once but which others have read through many many times; it's the kind of paper that some people get addicted to and spend many years devoted to). The full Wiles paper uses Deligne's construction of Galois representations associated to higher weight modular forms, because it proves more than FLT (e.g. it proves R=T for some Hida families), and I've also not read Deligne's construction, but I know people who I trust and who have (e.g. Brian Conrad).
Some comments which might be of interest to you:
I was a post-doc in Berkeley in the mid-90s and during that time I read Ribet's paper; occasionally I would find stuff which I couldn't quite follow, so I would knock on Ken's door and ask him about it, and together we would figure out what he meant. A mathematician would not call these "mistakes"; one could argue that sometimes there were explanations omitted which you have to be an expert to reconstruct, but I think that this is true of many, many papers.
Wiles uses Gross' results on companion forms, which at the time were not unconditionally proved; Gross' arguments assumed that two "canonically"-defined Hecke operators acting on "canonically" isomorphic cohomology groups coincided; at the time nobody had any doubt that this result was correct, but there was no published proof in the literature, and indeed it looked at the time that it might be hard to check. By 1995 Taylor had discovered a workaround which avoided Gross' work so the experts knew that there was not a problem here. The fact that the Hecke operators did actually coincide was ultimately verified by a student of Conrad in the early 2000s.
Wiles uses etale cohomology in some places, and at the time there was a lot of noise generated by logicians about whether this meant that he had assumed Grothendieck's universe axiom, which was known to be something which one could not prove within ZFC (indeed, it was known to the logicians that in theory ZFC could be consistent but ZFC+Grothendieck's universe axiom could be inconsistent). However, Deligne's SGA4.5 had, years before the FLT proof, shown that the theory of etale cohomology could be developed within ZFC.