Textbook: Real and Complex Analysis by Walter Rudin
Explanation: Chapters 1, 2, 3, 6, 7 and 8 constitute an excellent general treatment of measure theory. Let me elaborate:
Chapter 1: The notions of an abstract measure space and an abstract topological space are introduced and studied concurrently. This treatment allows the reader to see the close connections between the two subjects that appear both in practice and in theory. Elementary examples and properties of measurable functions and measures are discussed. Furthermore, Lebesgue's monotone convergence theorem, Fatou's lemma, and Lebesgue's dominated convergence theorem are proven in this chapter. Finally, the chapter discusses consequences of these results. The elegance of the treatment allows the reader to quickly become accustomed to the basic theory of measure.
Chapter 2: This chapter delves further into the intimate connection between topological and measure theoretic notions. More specifically, the chapter begins with a treatment of some important results in general topology such as Urysohn's lemma and the construction of partitions of unity. Afterwards, these results are applied to establish the Riesz representation theorem for positive linear functionals. The proof of this result is long but is nonetheless carefully broken into small steps and the reader should find little or no difficulty in understanding each of these steps. The Riesz representation theorem is applied in a particularly elegant manner to the theory of positive Borel measures. Finally, the existence and basic properties of the Lebesgue measure are shown to be a virtually trivial consequence of the Riesz representation theorem. The chapter ends with a nice set of exercises that discusses, in particular, some interesting counterexamples in measure theory.
Chapter 3: The basic theory of $L^p$ spaces ($1\leq p\leq \infty$) is introduced. The chapter begins with an elementary treatment of convex functions. Rudin explains that many elementary inequalities in analysis may be established as easy consequences of the theory of convex functions and evidence is provided for this claim. In particular, Hölder's and Minkowski's inequalities are proven. These results culminate in the proof that the $L^p$ spaces are indeed complex vector spaces. The completeness of the $L^p$ spaces and various important density results are also discussed.
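For reference, here are the two inequalities in standard notation (the symbols are my own choice, not a quotation from Rudin). Assume $(X,\mu)$ is a measure space, $f$ and $g$ are measurable functions on $X$, and $1<p,q<\infty$ are conjugate exponents:

$$\frac{1}{p}+\frac{1}{q}=1 \quad\Longrightarrow\quad \int_X |fg|\,d\mu \;\leq\; \|f\|_p\,\|g\|_q \qquad \text{(Hölder)},$$

$$\|f+g\|_p \;\leq\; \|f\|_p+\|g\|_p \qquad \text{(Minkowski)}.$$

Minkowski's inequality is what shows that $L^p$ is closed under addition and that $\|\cdot\|_p$ satisfies the triangle inequality.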
Chapter 6: This chapter discusses the theory of complex measures, and in particular, the Radon-Nikodym theorem. Von Neumann's proof of the Radon-Nikodym theorem is presented and various consequences are discussed ranging from the characterization of the dual of the $L^p$ spaces ($1\leq p<\infty$) to the Hahn decomposition theorem. These results culminate in the proof of the Riesz representation theorem for bounded linear functionals. A knowledge of chapters $4$ and $5$ is necessary in this chapter although they do not strictly cover measure theory. Uniform integrability and the Vitali convergence theorem are treated in the exercises at the end of the chapter.
Chapter 7: The main topic of this chapter is Fubini's theorem. A wealth of nice counterexamples is discussed and an important application is presented: the result that the convolution of two functions in $L^1$ is again in $L^1$. A wonderful feature of this treatment is the generality; the result is established in one of the most general forms possible.
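To illustrate how Fubini's theorem gives the convolution result, here is a sketch for $f,g\in L^1(\mathbb{R})$ with Lebesgue measure (my own notation, and not a reproduction of Rudin's proof; the measurability of $(x,y)\mapsto f(x-y)g(y)$ is assumed here and must be checked separately). With $(f*g)(x)=\int_{\mathbb{R}} f(x-y)\,g(y)\,dy$, interchanging the order of integration for the nonnegative integrand and using translation invariance gives

$$\int_{\mathbb{R}} |(f*g)(x)|\,dx \;\leq\; \int_{\mathbb{R}}\int_{\mathbb{R}} |f(x-y)|\,|g(y)|\,dy\,dx \;=\; \int_{\mathbb{R}} |g(y)| \left(\int_{\mathbb{R}} |f(x-y)|\,dx\right) dy \;=\; \|f\|_1\,\|g\|_1 .$$

In particular the integral defining $(f*g)(x)$ converges absolutely for almost every $x$, and $f*g\in L^1$.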
Chapter 8: This chapter treats differentiation of measures and the Hardy-Littlewood maximal function which is an important tool in analysis. A number of applications are presented ranging from a proof of the change of variables theorem in Euclidean $n$-space (in a very general form) to a treatment of functions of bounded variation and absolute continuity. Several results from this chapter are also used later in this book; most notable is the use of the differentiation theorem of measures in the study of harmonic functions in chapter 11.
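For readers who have not met it, the maximal function can be written as follows (a standard formulation in my own notation, assuming $f\in L^1(\mathbb{R}^n)$, Lebesgue measure $m$, and open balls $B(x,r)$):

$$(Mf)(x) \;=\; \sup_{r>0}\, \frac{1}{m(B(x,r))}\int_{B(x,r)} |f|\,dm .$$

The typical payoff is Lebesgue's differentiation theorem, which for such $f$ says that

$$\lim_{r\to 0}\, \frac{1}{m(B(x,r))}\int_{B(x,r)} f\,dm \;=\; f(x) \qquad \text{for almost every } x\in\mathbb{R}^n .$$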
Let me summarize with some general comments regarding the book:
Prerequisites: A good knowledge of set-theoretic notions, continuity and compactness suffices for the chapters that I have described. An at least rudimentary knowledge of differentiation and uniform convergence is very helpful at times. One need not be acquainted with the theory of the Riemann integral beforehand although one should at least be acquainted with its computation. In short, a knowledge of chapters 1, 2, 3, 4 and 7 of Rudin's earlier book Principles of Mathematical Analysis is advisable before one reads this textbook.
Exercises: The exercises in this textbook are wonderful. Many of the exercises build an intuition of the theory and applications treated in the text and therefore it is advisable to do as many exercises as possible. However, you should expect to work to solve a few of the exercises. A number of important concepts such as convergence in measure, uniform integrability, points of density, Minkowski's inequality for convolution, inclusions between $L^p$ spaces, Hardy's inequality etc. are treated in the exercises. However, if you are truly stuck you will find that many of these results are either theorems or exercises with detailed hints in other textbooks. (E.g., Folland's Real Analysis.)
Content: I have already described the content in some detail but let me say that the content is almost exactly what one needs to study branches of mathematics where measure theory is applied. Of course, this is with the assumption that one at least attempts as many exercises as possible since a number of important results (from probability theory, for example) are treated in the exercises.
Style: The proofs in Rudin are (with possibly minor exceptions) complete. Unlike a number of other mathematics textbooks, Rudin prefers not to leave any parts of proofs to the reader and instead focusses on giving the reader non-trivial exercises as practice at the end of each chapter. The book reads magnificently and the flow of results is excellent; almost all results are stated in context. It is fair to say that the main text of the book lacks examples, which is perhaps one of the only complaints students have, but the exercises do contain examples. Finally, the book is rigorous and is completely free of mathematical errors.
I hope this review of Rudin's Real and Complex Analysis is helpful! I have read virtually the entire book (over $4$ months) and I found it to be one of the most enjoyable experiences of my life. It really motivated me to delve deeper into analysis. Perhaps the same will be true for you. I certainly recommend this book with my deepest enthusiasm.
First, there are things that are much easier given the abstract formulation of measure theory. For example, let $X,Y$ be independent random variables and let $f:\mathbb{R}\to\mathbb{R}$ be a continuous function. Are $f\circ X$ and $f\circ Y$ independent random variables? The answer is utterly trivial in the measure theoretic formulation of probability, but very hard to express in terms of cumulative distribution functions. Similarly, convergence in distribution is really hard to work with in terms of cumulative distribution functions but easily expressed with measure theory.
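To spell out why the independence question is trivial, here is a sketch (the probability measure $P$ and the Borel sets $A,B\subseteq\mathbb{R}$ are my own labels). Since $f$ is continuous, it is Borel measurable, so $f^{-1}(A)$ and $f^{-1}(B)$ are again Borel sets, and $\{f\circ X\in A\}=\{X\in f^{-1}(A)\}$. Independence of $X$ and $Y$ then gives

$$P\big(f(X)\in A,\ f(Y)\in B\big) \;=\; P\big(X\in f^{-1}(A),\ Y\in f^{-1}(B)\big) \;=\; P\big(X\in f^{-1}(A)\big)\,P\big(Y\in f^{-1}(B)\big) \;=\; P\big(f(X)\in A\big)\,P\big(f(Y)\in B\big).$$

The same argument works for any Borel measurable $f$, not just continuous ones.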
Then there are things that one can consume without much understanding, but that require measure theory to actually understand and be comfortable with. It may be easy to get a good intuition for sequences of coin flips, but what about continuous time stochastic processes? How irregular can sample paths be?
Then there are powerful methods that actually require measure theory. One can get a lot from a little measure theory. The Borel-Cantelli lemmas or the Kolmogorov 0-1 law are not hard to prove but hard to even state without measure theory. Yet, they are immensely useful. Some results in probability theory require very deep measure theory. The two-volume book Probability with a View Toward Statistics by Hoffmann-Jørgensen contains a lot of very advanced measure theory.
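As an illustration of how short these statements are in measure-theoretic language, here are the Borel-Cantelli lemmas (standard formulation, my own notation). Let $A_1,A_2,\dots$ be events in a probability space $(\Omega,\mathcal{F},P)$ and write

$$\limsup_{n\to\infty} A_n \;=\; \bigcap_{n=1}^{\infty}\,\bigcup_{k=n}^{\infty} A_k$$

for the event that infinitely many of the $A_n$ occur. Then

$$\sum_{n=1}^{\infty} P(A_n)<\infty \;\Longrightarrow\; P\Big(\limsup_{n\to\infty} A_n\Big)=0,$$

and, if the $A_n$ are in addition independent,

$$\sum_{n=1}^{\infty} P(A_n)=\infty \;\Longrightarrow\; P\Big(\limsup_{n\to\infty} A_n\Big)=1 .$$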
All that being said, there are a lot of statisticians who live happily avoiding any measure theory. There are however no real analysts who can really do without measure theory.
Best Answer
I think the situation is similar to that in algebra. In elementary school, you learned that $1+1=2$. It was kinda obvious, right? In rigorous advanced algebra, however, you first have to define “$1$”, “$2$”, “$+$” and then you must prove that $1+1=2$.
Similarly, probability theory at an undergraduate level uses some informal but intuitively sound notions when introducing the basics, and how those foundations are built is largely left unsaid, presumably because the focus at this level ought to be on more interesting topics relying on these basics, such as combinatorics, distribution theory, statistics, practical applications, and so forth.
Only at a more advanced level do you realize that the foundations of probability theory are basically the same as those of measure theory under the special assumption that the measure of the whole space is normalized to one. The constructions and results from measure theory help you build a rigorous and consistent theory about what events and probabilities really are. The point is that at this higher level, there are no loose ends left and informal concepts that you were accustomed to during your undergraduate training (and accepted without many reservations, since they felt intuitively right) are placed on rock-solid theoretical grounds.
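Concretely, and in notation that is my own rather than tied to any particular textbook, the dictionary looks roughly like this. A probability space is a measure space $(\Omega,\mathcal{F},P)$ with

$$P(\Omega)=1,$$

events are the sets in the $\sigma$-algebra $\mathcal{F}$, a random variable is a measurable function $X:\Omega\to\mathbb{R}$, and the expectation of an integrable $X$ is its Lebesgue integral,

$$E[X] \;=\; \int_{\Omega} X\,dP .$$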