Probability Measure – Simple Explanation and Definition

information-theory, measure-theory, probability

Can someone explain probability measure in simple words? This term has been haunting me all my life.

Today I came across the Kullback-Leibler divergence. The KL divergence between probability measures $P$ and $Q$ is defined by

$$KL(P,Q)= \begin{cases}
\int \log\left(\frac{dP} {dQ}\right)dP & \text{if}\ P\ll Q, \\
\infty & \text{otherwise}.
\end{cases}$$

I have no idea what I just read. I looked up probability measure; it refers to a probability space. I looked that up; it refers to a $\sigma$-algebra. I told myself I had to stop.

So, is a probability measure just a broader, fancier term for a probability density? Am I overlooking a simple concept, or is this topic just that hard?

Best Answer

A probability space consists of:

  1. A sample space $X$, which is the set of all possible outcomes of an experiment
  2. A collection of events $\Sigma$, which are subsets of $X$
  3. A function $\mu$, called a probability measure, that assigns to each event in $\Sigma$ a nonnegative real number

Let's consider the simple example of flipping a coin. In that case, we have $X=\{H,T\}$ for heads and tails respectively, $\Sigma=\{\varnothing,\{H\},\{T\},X\}$, and $\mu(\varnothing)=0$, $\mu(\{H\})=\mu(\{T\})=\frac{1}{2},$ and $\mu(X)=1$. All of this is a fancy way of saying that when I flip a coin, I have a $0$ percent chance of flipping nothing, a $50$ percent chance of flipping heads, a $50$ percent chance of flipping tails, and a $100$ percent chance of flipping something, heads or tails. This is all very intuitive.
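If it helps to see this concretely, here is a minimal sketch in plain Python of the coin-flip space above, with the $\sigma$-algebra written out as frozensets and the measure as a dictionary. The names (`sample_space`, `sigma_algebra`, `mu`) are illustrative, not standard.

```python
# A toy model of the coin-flip probability space (X, Σ, μ) described above.

sample_space = frozenset({"H", "T"})        # X = {H, T}

sigma_algebra = {                           # Σ: all four events
    frozenset(),                            # ∅
    frozenset({"H"}),
    frozenset({"T"}),
    sample_space,                           # X itself
}

# The probability measure μ assigns a number in [0, 1] to each event in Σ.
mu = {
    frozenset():       0.0,
    frozenset({"H"}):  0.5,
    frozenset({"T"}):  0.5,
    sample_space:      1.0,
}

print(mu[frozenset({"H"})])   # 0.5 — probability of flipping heads
print(mu[sample_space])       # 1.0 — probability of flipping something
```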

Now, getting back to the abstract definition, there are certain natural requirements that $\Sigma$ and $\mu$ must satisfy. For example, it is natural to require that $\varnothing$ and $X$ are elements of $\Sigma$, and that $\mu(\varnothing)=0$ and $\mu(X)=1$. This is just saying that when performing an experiment, the probability that no outcome occurs is $0$, while the probability that some outcome occurs is $1$.

Similarly, it is natural to require that $\Sigma$ is closed under complements, and if $E\in\Sigma$ is an event, then $\mu(E^c)+\mu(E)=1$. This is just saying that when performing an experiment, the probability that event $E$ occurs or doesn't occur must be $1$.
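Continuing the toy coin-flip sketch from above (the definitions are repeated here so the snippet runs on its own), these requirements can be checked mechanically:

```python
# Verifying the requirements above for the coin-flip space (illustrative sketch).
sample_space = frozenset({"H", "T"})
sigma_algebra = {frozenset(), frozenset({"H"}), frozenset({"T"}), sample_space}
mu = {frozenset(): 0.0, frozenset({"H"}): 0.5,
      frozenset({"T"}): 0.5, sample_space: 1.0}

# ∅ and X are events, with μ(∅) = 0 and μ(X) = 1.
assert frozenset() in sigma_algebra and sample_space in sigma_algebra
assert mu[frozenset()] == 0.0 and mu[sample_space] == 1.0

# Σ is closed under complements, and μ(E) + μ(E^c) = 1 for every event E.
for event in sigma_algebra:
    complement = sample_space - event
    assert complement in sigma_algebra
    assert mu[event] + mu[complement] == 1.0

print("All requirements hold for the coin-flip space.")
```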

There are other requirements of $\Sigma$ which make it a $\sigma$-algebra, and other requirements of $\mu$ which make it a (finite) measure, and to rigorously study probability, one must eventually become familiar with these notions.
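To tie this back to the KL formula in the question: when the sample space is finite, $P\ll Q$ just means that $Q(\{x\})=0$ implies $P(\{x\})=0$, and the Radon–Nikodym derivative $\frac{dP}{dQ}$ reduces to the ratio of the two probability mass functions. Here is a sketch of that special case (the function and variable names are illustrative, not from any particular library):

```python
import math

def kl_divergence(p, q):
    """KL(P, Q) for discrete distributions given as dicts mapping outcomes
    to probabilities. In the discrete case dP/dQ is just p[x] / q[x]."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue                 # outcomes with P(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf          # P is not absolutely continuous w.r.t. Q
        total += px * math.log(px / qx)
    return total

# Example: a fair coin P vs. a biased coin Q.
p = {"H": 0.5, "T": 0.5}
q = {"H": 0.9, "T": 0.1}
print(kl_divergence(p, q))   # ≈ 0.5108
print(kl_divergence(q, p))   # ≈ 0.3681 (note: KL is not symmetric)
```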
