The significance of a Borel $\sigma$-algebra

borel-sets measure-theory probability-theory

I am currently trying to get a firm understanding of probability theory. I understand the definition of a $\sigma$-algebra, and I understand that a $\sigma$-algebra is a crucial part of a probability space and is necessary to uphold the foundations of probability theory. However, early in my course I am met with the definition of a Borel $\sigma$-algebra:

The Borel $\sigma$-algebra on $\mathbb{R}^d$, denoted by $\mathcal{B}(\mathbb{R}^d)$, is the $\sigma$-algebra generated by the collection of open sets of $\mathbb{R}^d$.

What is the significance of this Borel $\sigma$-algebra in the grand scheme of probability theory? I do not have a background in topology, so I struggle with some of the definitions online. Can anyone offer simple, intuitive reasoning?

Best Answer

The Borel-$\sigma$-algebra is important, because some very natural measures simply cannot be defined on all the subsets of $\mathbb{R}^d$. Let me show you the standard example: take $d=1$. We would like to define a "length"-function, i.e. a function on all the subsets of $\mathbb{R}$ (denoted $\wp(\mathbb{R})$) which to each subset $A\subseteq\mathbb{R}$ assigns the length of $A$ (whatever that means). Call this function $\lambda$. For us to think of $\lambda(A)$ as the length of $A$, $\lambda$ must satisfy the following properties:

  1. $\lambda:\wp(\mathbb{R})\to[0,\infty)\cup\{+\infty\}$ (lengths should be non-negative, but may be infinite),
  2. $\lambda((a,b))=b-a$ for all $a<b$ (we know what the length of an interval should be),
  3. $\lambda(A+x)=\lambda(A)$ for all $A\in\wp(\mathbb{R})$ and $x\in\mathbb{R}$, where $A+x=\{a+x\mid a\in A\}$ (lengths should be translation invariant),
  4. $\lambda(\bigcup_{n\in\mathbb{N}}A_n)=\sum_{n\in\mathbb{N}}\lambda(A_n)$ for any sequence of pairwise disjoint sets $A_1,A_2,A_3,\ldots\in\wp(\mathbb{R})$ (the length of a union of disjoint sets should be the sum of the lengths of those sets).
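To get a feel for how these properties interact, here is a small worked consequence (a sketch; the symbols $\varepsilon$ and $q_n$ are introduced only for this illustration). Properties 1, 2 and 4 force $\lambda(\emptyset)=0$ and make $\lambda$ monotone: if $A\subseteq B$, then $\lambda(B)=\lambda(A)+\lambda(B\setminus A)\geq\lambda(A)$. Hence, for any point $x\in\mathbb{R}$ and any $\varepsilon>0$,

$$0\leq\lambda(\{x\})\leq\lambda((x-\varepsilon,x+\varepsilon))=2\varepsilon,$$

so $\lambda(\{x\})=0$. Writing $\mathbb{Q}=\{q_1,q_2,q_3,\ldots\}$, property 4 then gives

$$\lambda(\mathbb{Q})=\sum_{n\in\mathbb{N}}\lambda(\{q_n\})=0,$$

i.e. a single point, and even the whole set of rational numbers, has length zero.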

Those all seem very reasonable, but as it turns out, assuming the existence of a function $\lambda$ satisfying properties 1.-4. leads to a contradiction (using the axiom of choice). How strange! What can be done? We certainly must insist on properties 2.-4., since otherwise we could not think of $\lambda$ as any sort of length function. We are then forced to somehow weaken property 1. What we will do is let go of the idea that every subset of $\mathbb{R}$ should have a "length", and instead try to find a large class $\mathcal{E}$ of subsets of $\mathbb{R}$ on which $\lambda:\mathcal{E}\to[0,\infty]$ may be constructed with properties 2.-4. (now required only for sets in $\mathcal{E}$).

This is where the Borel-$\sigma$-algebra $\mathcal{B}(\mathbb{R})$ comes into play. There turns out to be one and only one function $\lambda:\mathcal{B}(\mathbb{R})\to[0,\infty]$ with properties 2.-4. We call this function "the Lebesgue measure" (on $\mathbb{R}$), and this construction may be extended to $\mathbb{R}^d$ for any $d\in\mathbb{N}$. It is possible to extend $\lambda$ to a bigger class of subsets of $\mathbb{R}$ than $\mathcal{B}(\mathbb{R})$, but for most purposes the Borel-$\sigma$-algebra suffices (it most likely contains any subset you could think of).

The above-mentioned contradiction is based on the construction of a so-called "Vitali set", which can be seen here:
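In outline, the standard argument runs as follows (a sketch in notation of my own choosing; the names $V$ and $q_n$ do not appear in the text above). Call $x,y\in[0,1]$ equivalent if $x-y\in\mathbb{Q}$, and use the axiom of choice to pick a set $V\subseteq[0,1]$ containing exactly one point from each equivalence class. Let $q_1,q_2,q_3,\ldots$ be an enumeration of $\mathbb{Q}\cap[-1,1]$. The translates $V+q_n$ are pairwise disjoint, and

$$(0,1)\subseteq[0,1]\subseteq\bigcup_{n\in\mathbb{N}}(V+q_n)\subseteq[-1,2]\subseteq(-2,3).$$

Properties 1, 2 and 4 imply that $\lambda$ is monotone (if $A\subseteq B$, then $\lambda(A)\leq\lambda(B)$), so together with translation invariance (property 3) we would get

$$1=\lambda((0,1))\leq\sum_{n\in\mathbb{N}}\lambda(V+q_n)=\sum_{n\in\mathbb{N}}\lambda(V)\leq\lambda((-2,3))=5.$$

But if $\lambda(V)=0$ the sum in the middle is $0$, and if $\lambda(V)>0$ the sum is $+\infty$. Neither value lies between $1$ and $5$, so no such $\lambda$ can exist on all of $\wp(\mathbb{R})$.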

I know that this answer is purely measure theory and not exactly probability theory, but it is very important if you want to study probability. You will quickly meet the term "density" with respect to the Lebesgue measure, and this is exactly what is meant by the probability density function of a continuous random variable $X$, with which you may already be familiar. For this to make sense, the distribution of $X$ must be a measure defined on the same space as $\lambda$, i.e. defined on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. I will not go into more detail here, unless you are interested, but the point is that the Lebesgue measure is very important in probability theory, and thus the Borel-$\sigma$-algebra is important.
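For orientation, the link to densities can be written in one line (standard notation; the symbols $f$ and $P_X$ are introduced here for illustration and are not part of the text above). Saying that $X$ has density $f$ with respect to $\lambda$ means that $f:\mathbb{R}\to[0,\infty)$ is Borel measurable with $\int_{\mathbb{R}}f\,d\lambda=1$ and

$$P_X(A)=P(X\in A)=\int_A f\,d\lambda\quad\text{for all }A\in\mathcal{B}(\mathbb{R}).$$

For example, if $X$ is uniformly distributed on $(0,1)$, then $f=1_{(0,1)}$ and the formula reduces to $P(X\in A)=\lambda(A\cap(0,1))$, so probabilities are literally lengths of Borel sets.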

Edit: to convince you further that $\mathcal{B}(\mathbb{R})$ is important, consider the fact that a random variable $X$ defined on some probability space $(\Omega,\mathcal{F},P)$ is exactly a function $X:\Omega\to\mathbb{R}$ which is $\mathcal{F}$-$\mathcal{B}(\mathbb{R})$-measurable.
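Spelled out (a brief unpacking of the standard definition, in case the notation is new), $\mathcal{F}$-$\mathcal{B}(\mathbb{R})$-measurability means that

$$X^{-1}(B)=\{\omega\in\Omega\mid X(\omega)\in B\}\in\mathcal{F}\quad\text{for every }B\in\mathcal{B}(\mathbb{R}).$$

This is exactly the condition needed for $P(X\in B)=P(X^{-1}(B))$ to be defined for every Borel set $B$. Since $\mathcal{B}(\mathbb{R})$ is generated by the intervals $(-\infty,a]$, $a\in\mathbb{R}$, it is enough to check that $\{\omega\in\Omega\mid X(\omega)\leq a\}\in\mathcal{F}$ for every $a\in\mathbb{R}$.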
