I would suggest William Feller's An Introduction to Probability Theory and Its Applications. Volume 1 is "basic probability", but it covers all the discrete distributions that you are likely to use, and also treats the Normal distribution with care. Topics like combinatorial probability are done very well. It also has chapters on Markov processes.
Volume 2 is a bit more advanced and, if memory serves me right, is concerned more with continuous probability.
I have spent more time with his volume 1, and everything appearing there is very rigorous. The style of writing and the choice of exercises are an art. As far as I remember, there is no (or perhaps very little?) measure theory used in volume 2 either. In any case, Feller's contents, examples and exercises are so good that I had to recommend this book.
I like the book Probability, Statistics, and Random Processes for Electrical Engineering by Leon-Garcia (I used it to teach probability last semester). Of course, it is focused on electrical engineering. It suppresses the esoteric measure theory material.
The title of your book is funny because my first reaction is "Who but a mathematician would ever need measure theory anyway?" Of course I am being a bit snarky with that comment. I do think it is important for advanced students to at least know what measure theory is about. However, I feel it is more important for students to know the difference between a countably infinite set and an uncountably infinite set. Knowing the difference is essential for measure theory, but unfortunately some courses skip this, assume you already know it, and cover the less essential topics of sigma algebras.
In my humble opinion, you don't really need measure theory for probability or stochastic processes, although measure theory is certainly the foundation for those things. Similarly, Russell's Principia Mathematica is a foundation for basic arithmetic, but most people who use arithmetic (including mathematicians) have never read it (and certainly arithmetic existed before it). So, it is possible for you to learn and use something without going into detail on foundations. It is also good to know those foundations exist, especially if you eventually want answers to lingering questions.
The term "almost surely" is the most important one that you listed. It is synonymous with "with probability 1." For example, if you have a random variable $X$ that is uniformly distributed over the interval $[0,1]$, then $Pr[X \neq 1/2]=1$ and so almost surely $X$ is not $1/2$. Similarly, since the rational numbers in $[0,1]$ can be listed as $\{q_1, q_2, q_3, \ldots\}$ we have:
$$ Pr[\mbox{$X$ is rational}] = \sum_{i=1}^{\infty}Pr[X=q_i]=0 $$
and so $Pr[\mbox{$X$ is irrational}]=1$, which means $X$ is almost surely irrational.
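The step $Pr[X=q_i]=0$ used above can be checked directly. As a short sketch (the symbol $\epsilon$ is introduced here only for illustration), for any fixed $q$ in $[0,1]$ and any $\epsilon>0$:
$$ Pr[X=q] \leq Pr[q-\epsilon \leq X \leq q+\epsilon] \leq 2\epsilon $$
Since $\epsilon$ is arbitrary, $Pr[X=q]=0$. Splitting the probability into a sum over the $q_i$ relies on countable additivity, which is why the same argument cannot be run over the uncountable set of irrational numbers.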
Some sets are so complicated that they cannot have probabilities assigned to them. The term measurable describes a set that has a valid probability. A theorem that says "assuming the set is measurable" is just being precise, and you can ignore that phrase without worry. It just means they are restricting to the case where probabilities are defined (you cannot prove theorems otherwise). All the crazy theorems about measurability are designed to basically show that all practical sets of interest are measurable. In that sense, measure theory is self-defeating, since its most important results ensure you can safely ignore them.
Perhaps the most practical topics in measure theory are the convergence theorems, such as the Lebesgue dominated convergence theorem. These theorems tell you when you are allowed to pass a limit through an integral or through an expectation.
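As a small illustration (the functions $f_n$ and $g_n$ are examples chosen here, not taken from any particular text), let $f_n(x)=x^n$ on $[0,1]$. Then $|f_n(x)| \leq 1$, the constant function $1$ is integrable on $[0,1]$, and $f_n(x) \to 0$ for every $x$ in $[0,1)$, so dominated convergence allows the exchange:
$$ \lim_{n\to\infty}\int_0^1 x^n\,dx = \int_0^1 \lim_{n\to\infty} x^n\,dx = 0 $$
This agrees with the direct computation $\int_0^1 x^n\,dx = 1/(n+1)$. By contrast, if $g_n(x)=n$ for $x$ in $(0,1/n)$ and $g_n(x)=0$ otherwise, then $g_n(x) \to 0$ for every $x$ while $\int_0^1 g_n(x)\,dx = 1$ for all $n$. No single integrable function dominates every $g_n$, and here the limit cannot be passed through the integral.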
A "sigma algebra" is a class of sets, closed under complements and countable unions, defined so that all sets in the class can be measured. The "standard Borel sigma algebra" is a very large class of sets of real numbers: the smallest sigma algebra that contains all intervals. It is always possible to define the probability that a random variable falls in one of those sets. Since those sets are so extensive, every practical set you will ever work with will indeed be measurable (unless you end up working on foundational mathematical subjects or axiom-of-choice related set theory subjects).
Best Answer
Textbook: Real and Complex Analysis by Walter Rudin
Explanation: Chapters 1, 2, 3, 6, 7 and 8 constitute an excellent general treatment of measure theory. Let me elaborate:
Chapter 1: The notions of an abstract measure space and an abstract topological space are introduced and studied side by side. This treatment allows the reader to see the close connections between the two subjects that appear both in practice and in theory. Elementary examples and properties of measurable functions and measures are discussed. Furthermore, Lebesgue's monotone convergence theorem, Fatou's lemma, and Lebesgue's dominated convergence theorem are proven in this chapter. Finally, the chapter discusses consequences of these results. The elegance of the treatment allows the reader to quickly become accustomed to the basic theory of measure.
Chapter 2: This chapter delves further into the intimate connection between topological and measure theoretic notions. More specifically, the chapter begins with a treatment of some important results in general topology such as Urysohn's lemma and the construction of partitions of unity. Afterwards, these results are applied to establish the Riesz representation theorem for positive linear functionals. The proof of this result is long but is nonetheless carefully broken into small steps and the reader should find little or no difficulty in understanding each of these steps. The Riesz representation theorem is applied in a particularly elegant manner to the theory of positive Borel measures. Finally, the existence and basic properties of the Lebesgue measure are shown to be a virtually trivial consequence of the Riesz representation theorem. The chapter ends with a nice set of exercises that discusses, in particular, some interesting counterexamples in measure theory.
Chapter 3: The basic theory of $L^p$ spaces ($1\leq p\leq \infty$) is introduced. The chapter begins with an elementary treatment of convex functions. Rudin explains that many elementary inequalities in analysis may be established as easy consequences of the theory of convex functions, and evidence is provided for this claim. In particular, Hölder's and Minkowski's inequalities are proven. These results culminate in the proof that the $L^p$ spaces are indeed complex vector spaces. The completeness of the $L^p$ spaces and various important density results are also discussed.
Chapter 6: This chapter discusses the theory of complex measures, and in particular, the Radon-Nikodym theorem. Von Neumann's proof of the Radon-Nikodym theorem is presented and various consequences are discussed, ranging from the characterization of the dual of the $L^p$ spaces ($1\leq p< \infty$) to the Hahn decomposition theorem. These results culminate in the proof of the Riesz representation theorem for bounded linear functionals. A knowledge of chapters $4$ and $5$ is necessary in this chapter although they do not strictly cover measure theory. Uniform integrability and the Vitali convergence theorem are treated in the exercises at the end of the chapter.
Chapter 7: The main topic of this chapter is Fubini's theorem. A wealth of nice counterexamples is discussed and an important application is presented: the result that the convolution of two functions in $L^1$ is again in $L^1$. A wonderful feature of this treatment is the generality; the result is established in one of the most general forms possible.
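The convolution result mentioned above can be sketched in one line. This assumes Lebesgue measure on $\mathbb{R}$ and the usual definition $(f*g)(x)=\int f(x-y)g(y)\,dy$; the notation is chosen here for illustration and is not quoted from the book. For $f,g \in L^1$, Fubini's theorem for nonnegative functions lets the order of integration be swapped:
$$ \int |(f*g)(x)|\,dx \leq \int\!\!\int |f(x-y)|\,|g(y)|\,dy\,dx = \int |g(y)| \left( \int |f(x-y)|\,dx \right) dy = \|f\|_1 \|g\|_1 $$
The inner integral equals $\|f\|_1$ by translation invariance, so $f*g \in L^1$. A careful proof must also check that $(x,y) \mapsto f(x-y)g(y)$ is measurable, which the swap requires.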
Chapter 8: This chapter treats differentiation of measures and the Hardy-Littlewood maximal function, which is an important tool in analysis. A number of applications are presented, ranging from a proof of the change of variables theorem in Euclidean $n$-space (in a very general form) to a treatment of functions of bounded variation and absolute continuity. Several results from this chapter are also used later in this book; most notable is the use of the differentiation theorem of measures in the study of harmonic functions in chapter 11.
Let me summarize with some general comments regarding the book:
Prerequisites: A good knowledge of set-theoretic notions, continuity and compactness suffices for the chapters that I have described. At least a rudimentary knowledge of differentiation and uniform convergence is very helpful at times. One need not be acquainted with the theory of the Riemann integral beforehand, although one should at least be acquainted with its computation. In short, a knowledge of chapters 1, 2, 3, 4 and 7 of Rudin's earlier book Principles of Mathematical Analysis is advisable before one reads this textbook.
Exercises: The exercises in this textbook are wonderful. Many of the exercises build an intuition of the theory and applications treated in the text and therefore it is advisable to do as many exercises as possible. However, you should expect to work to solve a few of the exercises. A number of important concepts such as convergence in measure, uniform integrability, points of density, Minkowski's inequality for convolution, inclusions between $L^p$ spaces, Hardy's inequality etc. are treated in the exercises. However, if you are truly stuck you will find that many of these results are either theorems or exercises with detailed hints in other textbooks. (E.g., Folland's Real Analysis.)
Content: I have already described the content in some detail, but let me say that the content is almost exactly what one needs to study branches of mathematics where measure theory is applied. Of course, this is with the assumption that one at least attempts as many exercises as possible, since a number of important results (from probability theory, for example) are treated in the exercises.
Style: The proofs in Rudin are (with possibly minor exceptions) complete. Unlike a number of other mathematics textbooks, Rudin prefers not to leave any parts of proofs to the reader and instead focuses on giving the reader non-trivial exercises as practice at the end of each chapter. The book reads magnificently and the flow of results is excellent; almost all results are stated in context. It is fair to say that the main text of the book lacks examples, which is perhaps one of the only points of complaint among students, but the exercises do contain examples. Finally, the book is rigorous and is completely free of mathematical errors.
I hope this review of Rudin's Real and Complex Analysis is helpful! I have read virtually the entire book (over $4$ months) and I found it to be one of the most enjoyable experiences of my life. It really motivated me to delve deeper into analysis. Perhaps the same will be true for you. I certainly recommend this book with my deepest enthusiasm.