It has been my general desire for a few years to acquire the basics of other European languages for the purpose of reading some of the classics in their original language, in a similar vein to this topic. I never pursued it very far, and so my knowledge of exactly what those classics might be for a particular language never developed much. In anticipation of a trip to Lyon this summer I have begun to learn a little French, and would be very interested in reading some of the more palatable (in the sense of a reader who is fairly naive to the language) French texts. My first instincts would be Cauchy and Lebesgue, seeing as I am more analytically inclined, but I have no idea where to start or which of their works are readily available.
[Math] Approachable French Masters
big-list, ho.history-overview, reading-list, textbook-recommendation
Related Solutions
I agree 100% with Igor and Andrew L. on the benefit of reading the creator's version of material also available from later expositors. I have gained mathematical insights from reading Euclid, Archimedes, Riemann, Gauss, Hurwitz, and Wirtinger, as well as moderns like Zariski... on topics I already thought I understood.
Just Euclid's use of the word "measures" for "divides" finally made clear to me the elementary argument that the largest number dividing two integers is also the smallest positive length one can measure using both of them. This is clear when thinking of (commensurable) measuring sticks, since by translating them it is obvious that the lengths one can so measure are equally spaced, hence the smallest one measures them all.
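In modern notation (my paraphrase, not Euclid's wording), the measuring-stick argument is very short. The lengths measurable with two sticks of integer lengths $a$ and $b$ form the set
$$M = \{\, ma + nb : m, n \in \mathbb{Z} \,\},$$
which is closed under differences, hence consists exactly of the multiples of its smallest positive element $d$. Since $a, b \in M$, $d$ divides both; conversely any common divisor of $a$ and $b$ divides every element of $M$, in particular $d$. So the smallest positive measurable length is precisely the greatest common divisor.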
I was also unaware that Euclid's characterization of a tangent line to a circle was not just that it is perpendicular to the radius, but that it is the only line meeting the circle locally once such that changing its angle ever so little produces a second intersection, i.e. Newton's definition of a tangent line. It is said Newton read Euclid just before giving his own definition.
I did not realize until reading Archimedes that the "Cavalieri principle" follows just from the definition of the Riemann integral, without needing the fundamental theorem of calculus. I.e. it follows just from the definition of a volume as a limit of approximating slices, and was known to Archimedes. Hence one can conclude all the usual volume formulas for pyramids, cones, spheres, even the bicylinder, just by starting from the decomposition of a cube into three right pyramids, applying Cavalieri to vary the angle of the pyramid, then approximating and using Cavalieri again. It is an embarrassment to me that I had thought the volume of a bicylinder a more difficult calculus problem than that for a sphere, when it follows immediately from comparing horizontal slices of a double square-based pyramid inscribed in a cube. I.e. by Cavalieri and the Pythagorean theorem, the volume of a sphere is the difference between the volumes of a cylinder and an inscribed double cone. The same argument shows the volume of a bicylinder is the difference between the volumes of a cube and an inscribed double square-based pyramid. This led to an intuitive understanding of the simple relation between the volumes of certain inscribed figures that I then noticed had been recently studied by Tom Apostol.
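For the record, here is the slice comparison in coordinates (my notation, not Archimedes'). At height $h$ above the center, with $0 \le h \le r$, the slice areas satisfy
$$\underbrace{\pi(r^2 - h^2)}_{\text{sphere}} = \underbrace{\pi r^2}_{\text{cylinder}} - \underbrace{\pi h^2}_{\text{cone}}, \qquad \underbrace{4(r^2 - h^2)}_{\text{bicylinder}} = \underbrace{4r^2}_{\text{cube}} - \underbrace{4h^2}_{\text{pyramid}},$$
the bicylinder slice being a square of side $2\sqrt{r^2 - h^2}$ and the pyramid slice a square of side $2h$. Cavalieri then gives
$$V_{\text{sphere}} = 2\pi r^3 - \tfrac{2}{3}\pi r^3 = \tfrac{4}{3}\pi r^3, \qquad V_{\text{bicylinder}} = 8r^3 - \tfrac{8}{3}r^3 = \tfrac{16}{3}r^3.$$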
I realized this summer that this allows a computation of the volume of the 4-dimensional ball. I.e. this ball results from revolving half a 3-ball, hence its volume can be calculated by revolving a cylinder and subtracting the volume obtained by revolving a cone. Since Archimedes knew the centers of gravity of both those solids, he knew this.
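A modern Pappus-style check of that computation (my notation, not Archimedes' own argument): the half 3-ball of radius $r$ has volume $\tfrac{2}{3}\pi r^3$ and its centroid lies at distance $\tfrac{3r}{8}$ from the flat face, so revolving it about the hyperplane of that face gives
$$V_{4\text{-ball}} = 2\pi \cdot \frac{3r}{8} \cdot \frac{2}{3}\pi r^3 = \frac{\pi^2 r^4}{2},$$
which is indeed the standard volume of the 4-ball of radius $r$.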
Having read everywhere that Hurwitz's theorem was that the maximum number of automorphisms of a Riemann surface of genus $g$ is $84(g-1),$ I had a difficult proof that the maximum number in genus $5$ is $192,$ using Jacobians, Prym varieties, and classifications of representations of planar groups, until Macbeath referred me to Hurwitz's original paper, where a complete list of the possible orders was easily given: $84(g-1), 48(g-1),\ldots$ I subsequently explained this easy argument to some famous mathematical figures. Sometime later a more complicated such example, for which Macbeath himself was usually credited, was found also to occur in the 19th-century literature.
Having studied Riemann surfaces all my life, but unable to read German well, I thought I had acquired some grasp of the Riemann Roch theorem; in particular I thought Riemann had given only an inequality $\ell(D) \geq 1-g + \deg(D).$ When the translation from Kendrick Press became available, I learned he had written down a linear map whose kernel computed $\ell(D),$ and the estimate derived from the fundamental theorem of linear algebra. The full equality also follows, but only if one can compute the cokernel as well. That cokernel of course was already shown by him to be what we now call $H^1(D).$ Hence Riemann's original theorem was the so-called "index" version of RR. Since he expressed his map in terms of path integrals, it was natural to evaluate those integrals by residue calculus, as Roch did. This is explained in my answer to "why is Riemann Roch [not precisely] an index problem?" Although there are many fine modern expositions of Riemann Roch, the most insightful perhaps being that in the chapter on Riemann surfaces in Griffiths and Harris, I had not seen how simple it was until reading Riemann.
Perhaps this is only historical knowledge, but reading Riemann one sees that he also knew completely how to prove (index) Riemann Roch for algebraic plane curves, without appealing to the questionable Dirichlet principle, hence the usual impression that a rigorous proof had to await later arguments of Clebsch, Hilbert, or Brill and Noether, is incorrect.
Reading Wirtinger's 19th-century paper on theta functions, even though unfortunately for me it is only available in the original German, I learned that when a smooth Riemann surface acquires a singularity, the elementary holomorphic differential with a nonzero period around the vanishing cycle becomes meromorphic, and that period becomes the residue at the singular point. At last this explains clearly why one defines "dualizing differentials" as one does in algebraic geometry.
Once as a grad student in Auslander's algebraic geometry class, I vowed to try out Abel's advice and read the master Zariski's paper on the concept of a simple point. I was very discouraged when several hours passed and I had managed only a few pages. Upon returning to class, Auslander began to pepper us with questions about regular local rings. I found out how much I had learned when I answered them all easily, until he literally told me to be quiet, since I obviously knew the subject cold. (To be honest, I did not know the very next question he posed, but I was off the hook.)
In my answer to a question about where to learn sheaf cohomology I have given an example of insight only contained in Serre's original paper.
The sense of wonder and awe one gets upon reading people like Riemann or Euler is also quite valuable. Any student who has struggled to compute the sums of the even powers of the reciprocals of natural numbers, $\sum 1/n^{2k},$ will be amazed at Euler's facile accomplishment of this for many values of $k.$ Calculus students estimating $\pi$ by the usual series to 3 or 4 places will also be impressed by his scores of correct digits. On the other hand, anyone using a modern computer can detect an actual error in his expansion of $\pi,$ I forget where, in the 214th place? but an error which was already noticed long ago.
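For a reader who wants to see Euler's closed forms confirmed numerically, here is a quick sketch (the function name and number of terms are my own choices):

```python
import math

# Numerically check Euler's closed forms for sums of reciprocal even powers:
#   sum 1/n^2 = pi^2/6   and   sum 1/n^4 = pi^4/90.
def partial_zeta(s, terms=100_000):
    """Partial sum of the series sum_{n>=1} 1/n^s."""
    return sum(1.0 / n**s for n in range(1, terms + 1))

# The tail of 1/n^2 after N terms is about 1/N, so the s=2 error is
# roughly 1e-5 here; the s=4 series converges far faster.
print(partial_zeta(2), math.pi**2 / 6)
print(partial_zeta(4), math.pi**4 / 90)
```

The slow convergence for $s=2$ makes Euler's scores of exact digits, obtained by hand with convergence-acceleration tricks, all the more impressive.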
As you can see these are elementary examples, hence from a fairly naive and uneducated person, myself, who has not at all plumbed the depths of many original papers. But these few forays have definitely convinced me there is a benefit that cannot be gained elsewhere, as these exposures can bring the understanding of ordinary mortals closer to that of more knowledgeable persons, at least in a narrow vein. So while it might be thought that only the strongest mathematicians can attempt these papers, my advice would be that reading such masters may be even more helpful to us average students.
As a remark on criterion 2 of the original question, I find it is not at all necessary to read all of a paper by a master to get some insight. One word in Euclid enlightened me, and before the translation came out, I had already gained most of my understanding of Riemann's argument for RR just from reading the headings of the paragraphs. I learned a proof of RR for plane curves from reading only the introduction to a paper of Fulton. A single sentence of Archimedes, that a sphere is a cone with vertex at the center and base equal to the surface, makes it clear the volume is $1/3$ the radius times the surface area. Moreover this shows the same ratio holds for a bicylinder, whereas the area of a bicylinder is considered so difficult we do not even ask it of calculus students. So one should not be discouraged by the difficulty of reading all of a master's paper, although of course it wouldn't hurt.
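In symbols (my notation), Archimedes' sentence says $V = \tfrac{1}{3} r S$, since each thin cone from the center has height equal to the distance $r$ from the center to the tangent plane of its surface element, and this distance is $r$ for the bicylinder as well as for the sphere. Hence
$$S_{\text{sphere}} = \frac{3V}{r} = \frac{3}{r}\cdot\frac{4}{3}\pi r^3 = 4\pi r^2, \qquad S_{\text{bicylinder}} = \frac{3}{r}\cdot\frac{16}{3} r^3 = 16 r^2,$$
so the "difficult" bicylinder area needs no calculus at all once the volume is known.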
A remark on the definition of master, versus creator. There are cases where a later master re-examines an earlier work and adds to it, and in these cases it seems valuable to read both versions. In addition to the examples given above of Newton generalizing Euclid and Mumford using Hilbert, perhaps Mumford's demonstration of the power of Grothendieck's Riemann Roch theorem in calculating invariants of the moduli space of curves is relevant.
A related question occurs in many cases since the classical arguments of the "ancients" are preserved only in classical texts such as van der Waerden in algebra, and newer books have found slicker methods to avoid them. E.g. the method of Lagrange resolvents is useful in Galois theory for proving that an extension of prime degree in characteristic zero is radical. There are faster, less precise methods of showing this, such as Artin/Dedekind's method of independence of characters, but the older method is useful when trying to use Galois theory to actually write down solution formulas for cubics and quartics. Thus today we often have an intermediate choice of reading modern expositions which reproduce the methods of the creators, or ones that avoid them, sometimes losing information. (This is discussed in the math 844-2 algebra notes on my web page, where, being a novice, I give all competing methods of proof.)
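To illustrate the kind of information the resolvent method preserves (a standard sketch, not quoted from any particular text): for a cubic with roots $x_1, x_2, x_3$ and $\omega$ a primitive cube root of unity, set
$$u = x_1 + \omega x_2 + \omega^2 x_3, \qquad v = x_1 + \omega^2 x_2 + \omega x_3.$$
A cyclic permutation of the roots multiplies $u$ by $\omega^2$ and $v$ by $\omega$, so $u^3$ and $v^3$ are fixed by $A_3$ and hence lie in the quadratic extension generated by $\sqrt{\Delta}$. Expressing $u^3$ and $v^3$ in the coefficients and extracting cube roots recovers Cardano's formula, whereas the independence-of-characters argument only proves that such radical expressions exist.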
There's a fundamental difficulty with your claim that
a mathematician can't use a term before giving its accurate definition.
Mathematical definitions are always in terms of things that are already understood. One could eliminate the use of the word "set" in developing axiomatic set theory, but you would still need to define (for example) terms such as "axiom." How would you do this? You could define the word "axiom" in terms of arithmetical concepts, but then what is the definition of an integer? Or you could define it in terms of syntactic concepts such as "symbol" and "string," but then what is the definition of a "symbol" or a "string" or a "sequence"?
If you want to do anything at all, then you have to start somewhere and take something for granted, and therefore you cannot take your principle that "a mathematician can't use a term before giving its accurate definition" literally.
Developing axiomatic set theory by using set-theoretic language might, depending on the student, be a pedagogical mistake, but it is not a logical mistake. The word "set" as it is used in the development of the theory is meant to refer to a concept that you already have a clear grasp of. The "sets" that are later introduced axiomatically are distinct from that. This is the distinction between theory and meta-theory.
There is actually an advantage to developing mathematical logic in set-theoretic terms, because it then lets you see that mathematical logic, like all other branches of mathematics, can be formalized using axiomatic set theory.
However, I agree with you that this can be confusing pedagogically. It seems that a lot of people nowadays are comfortable with taking syntactic concepts such as "symbol" and "string" and "sequence" and "rule" for granted, without demanding that these concepts be defined before they are used. Therefore one could ask for a treatment that does not refer to sets at all but that refers purely to syntax. This can still get tricky because at some point you are going to need to use some nontrivial reasoning about what happens when you manipulate strings according to syntactic rules; this will require identifying strings with integers and applying basic number-theoretic results. You might then get demands to define what integers are and questions about how you know what axioms apply to integers before you have fully developed a theory of axiomatic systems. There's no canonical way to address these demands since, as I said, something has to be taken for granted, and so I don't know that anyone has written a textbook quite like what you have in mind.
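The identification of strings with integers mentioned above can be made completely concrete; here is a minimal sketch (the alphabet and encoding scheme are my own illustrative choices, not from any particular textbook):

```python
# An illustrative string<->integer identification (one of many possible schemes):
# read each character's 1-based index in a fixed alphabet as a digit in base
# len(ALPHABET) + 1.  Using 1-based digits avoids leading-zero ambiguity,
# so the encoding is injective and invertible.

ALPHABET = "()~&|>Ax="  # a tiny formal-symbol alphabet, chosen arbitrarily here
BASE = len(ALPHABET) + 1

def encode(s):
    """Map a string over ALPHABET to a unique positive integer."""
    n = 0
    for ch in s:
        n = n * BASE + (ALPHABET.index(ch) + 1)
    return n

def decode(n):
    """Invert encode, recovering the original string."""
    chars = []
    while n > 0:
        n, d = divmod(n, BASE)
        chars.append(ALPHABET[d - 1])
    return "".join(reversed(chars))

formula = "(Ax=x)"
assert decode(encode(formula)) == formula
```

Once strings are integers, statements about syntactic manipulation become number-theoretic statements, which is exactly where the demand to axiomatize the integers resurfaces.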
Having said that, I suggest that you try taking a look at Quine's Mathematical Logic, and in particular his section on "protosyntax." This is an attempt to develop the subject syntactically, which might be what you're looking for.
Note, by the way, that if you take this route, then some of the motivation for set theory is removed, because it is no longer apparent that set theory is really a foundation for everything. Instead, syntax becomes the foundation for everything. One can then ask if our theory of syntax is sound, and the same confusion will arise all over again, but now with "syntax" being the apparently circular concept rather than "set".
Best Answer
Since you mention Lebesgue, I would recommend the following two classics, which build on his lectures at the Collège de France:
Leçons sur l'intégration
Leçons sur les séries trigonométriques
Two more suggestions: Topologie générale by Bourbaki, and Théorie des distributions by Laurent Schwartz.