Understanding measurable functions and their definition based on pre-images

lebesgue-measure, measure-theory

I have recently started learning a bit more about measure theory, but I've been stuck on the definition of measurable functions. I'm comfortable with the formal definition that says a function $f:X\to Y$ is measurable if the pre-image of any measurable set is measurable. What I don't understand is why this definition has been chosen, i.e. the "intuition" on the meaning of being measurable.

I haven't learned about $\sigma$-algebras yet, owing to the book I am using, but I'm aware that measurable functions preserve the structure of measure spaces. In that case, I would like to know why pre-images do the trick rather than images. If I wanted to know whether $f$ preserved the structure, my first idea would be to check that measurable sets are mapped to measurable sets, not to look at pre-images.

Continuity has almost the same definition. However, that definition comes from generalizing the $\epsilon$-$\delta$ definition of continuity from analysis/metric spaces, so I don't think the same rationale can be used to explain why we use pre-images to define measurable functions.

I have read through a fair amount of StackExchange answers on the topic, and some responses clarified why this definition is useful. For one, if $Y$ does not have a measure and $X$ carries $\mu$, then we can push $\mu$ forward to obtain the measure $\mu\circ f^{-1}$ on $Y$. However, this issue doesn't arise when both spaces are already measure spaces. The second thread that helped explained that measurability is necessary for the Lebesgue integral to be defined.
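To make the pushforward $\mu\circ f^{-1}$ concrete, here is a minimal sketch on a finite measure space (all names and the specific spaces are illustrative, not from the thread). The point is that the pushforward measure of a set $B \subseteq Y$ is defined as $\mu(f^{-1}(B))$, so it only makes sense when every pre-image $f^{-1}(B)$ is measurable in $X$:

```python
from fractions import Fraction

# X = faces of a fair die, mu = uniform measure on X.
X = {1, 2, 3, 4, 5, 6}
mu = {x: Fraction(1, 6) for x in X}

def f(x):
    """A function from X into Y = {"even", "odd"}."""
    return "even" if x % 2 == 0 else "odd"

def preimage(B):
    """f^{-1}(B): all points of X that f sends into B."""
    return {x for x in X if f(x) in B}

def pushforward(B):
    """(mu o f^{-1})(B) = mu(f^{-1}(B)): measure B in Y by
    measuring its pre-image in X."""
    return sum(mu[x] for x in preimage(B))

print(preimage({"even"}))      # {2, 4, 6}
print(pushforward({"even"}))   # 1/2
```

Notice that images would not work here: $f$ maps the measurable set $\{1,2\}$ onto all of $\{\text{even}, \text{odd}\}$, so images of disjoint sets can overlap and do not define a measure, whereas pre-images of disjoint sets stay disjoint.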

Taken together, is that all there is to it? Is this definition chosen so that we can pull real-valued functions back and properly define Lebesgue integration? Any insights or alternative perspectives would be welcome.

Best Answer

The best intuition might come from the applications of measure theory to probability. In probability theory, you take a measure space $(\Omega, \mathcal{A}, P)$ such that $P(\Omega) = 1$. You can think of $\Omega$ as the set of all possible worlds. $P$ is a probability measure that specifies the probability of any measurable subset of possible worlds.

A random variable is then defined as a measurable function $X : \Omega \rightarrow \mathbb{R}$. That is: as an argument, it takes whatever possible world is the case, and tells us one number about the world.

For simplicity, think of it as a coin-flip. So, there's some set of possible worlds $A \in \mathcal{A}$ such that $X(\omega) = 1$ for all $\omega \in A$; this is all the possible worlds where the coin lands heads. Then $A^c$ is the set of all possible worlds where the coin lands tails.

Now, we want to talk about the probability that this coin lands heads. However, in our construction, we only have a probability measure on $\Omega$. How do we state the probability that the coin landed heads? We look at $P(X^{-1}(\{1\}))$, which is exactly $P(A)$.
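The coin-flip computation above can be sketched in a few lines; the four "worlds" and the uniform measure are my own illustrative choices, not part of the answer:

```python
from fractions import Fraction

# Omega = a finite set of possible worlds, P = uniform probability measure.
Omega = {"w1", "w2", "w3", "w4"}
P = {w: Fraction(1, 4) for w in Omega}

def X(w):
    """Random variable: 1 if the coin lands heads in world w, else 0.
    (Heads happens to occur in worlds w1 and w2.)"""
    return 1 if w in {"w1", "w2"} else 0

def prob(event):
    """P(event) for a measurable subset of Omega."""
    return sum(P[w] for w in event)

# P(coin lands heads) = P(X^{-1}({1})): take the pre-image, then measure it.
heads = {w for w in Omega if X(w) == 1}
print(prob(heads))  # 1/2
```

The pre-image step is the whole trick: $X$ runs from $\Omega$ to $\mathbb{R}$, but the measure lives on $\Omega$, so any question about $X$'s values must be translated back into a subset of $\Omega$ before $P$ can be applied.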

This is why you'd want the inverse images to be measurable: you want to define probability distributions of random variables, and you do so based on the probability measure on this underlying probability space $\Omega$.

Hopefully that provides some intuition!
