[Math] Density w.r.t. counting measure and probability mass function (discrete rv)

measure-theory, probability-theory

By way of disclosure, let me start by saying that I am not a mathematician. However, I came to peruse Kyle Siegrist's web-based text in probability theory (http://www.randomservices.org), and I am quite intrigued.

Unfortunately, I am currently struggling to reconcile a few of the statements that I came across. Since my struggles might be due to a more fundamental lack of understanding, I trust you can forgive me if I start my question by summarizing what I believe I have understood so far. Please feel free to skip to the demarcated passage and refer to the introduction only if you feel that I misunderstood some of the more fundamental aspects of probability theory.

I start with a probability space $(\Omega,\mathcal{F},\mathbb{P})$ where $\Omega$ is a finite, countably infinite, or uncountably infinite sample space and $\mathcal{F}$ is a $\sigma$-algebra furnishing a domain for the probability measure $$\mathbb{P}\,:\,\mathcal{F}\,\rightarrow\,[0,1].$$
Now, I consider a measurable space $(\mathbb{R},\mathcal{B})$ where $\mathbb{R}$ is the real line and $\mathcal{B}$ is the $\sigma$-algebra generated by the standard Euclidean topology, i.e. the extension of the collection of all open sets in $\mathbb{R}$ to a $\sigma$-algebra. This measurable space allows for the definition of a measurable function $$X\,:\,\Omega\,\rightarrow\,\mathbb{R}$$
such that the preimage $X^{-1}(A)\in\mathcal{F}$ for all $A\in\mathcal{B}$. I suppose that could also be written as $\sigma(X)\subseteq\mathcal{F}$. $X$ is then called a random variable. Utilizing a change of variables theorem, I proceed to define a measure $\mu$ on $(\mathbb{R},\mathcal{B})$ that corresponds to $\mathbb{P}$ such that $$\mu(A)=\mathbb{P}[X^{-1}(A)]\quad\forall\quad A\in\mathcal{B}.$$
Since $X^{-1}(A)\in\mathcal{F}$, $\mathbb{P}(X^{-1}(A))$ exists in $[0,1]$. Hence,
$$\mu\,:\,\mathcal{B}\,\rightarrow\,[0,1]$$ and is called the distribution of $X$. From there it is easy enough to show that $\mu$ is, in fact, a probability measure making $(\mathbb{R},\mathcal{B},\mu)$ a probability space. Now, I can make a statement about a random variable that corresponds to an event:
$$X(\omega)\in A\quad(A\in\mathcal{B})\quad\text{corresponds to}\quad X^{-1}(A)\in\mathcal{F}.$$Moreover, I can assign corresponding probabilities since $\mu(A)=\mathbb{P}[X^{-1}(A)].$
Finally, I recognize that intervals (on the real line) of the type $$(-\infty,a]$$ are Borel sets such that $(-\infty,a]\in\mathcal{B}$ for all $a\in\mathbb{R}$. Therefore, $\mu(-\infty,a]$ is defined. Equipped with $\mu$, I can then go on to define a distribution function
$$F\,:\,\mathbb{R}\,\rightarrow\,[0,1]$$
such that
$$F(x)=\mu(-\infty,x].$$
It is reasonably straightforward to show that $F$ is monotonically increasing, right-continuous, and normalized to take values between $0$ and $1$. Moreover,
$$\mu(a,b]=F(b)-F(a).$$
and
$$\mu\{a\}=F(a)-F(a^{-})\quad\text{where $F(a^{-})$ is the limit from the left}.$$
Hence $F$ is continuous (from the left and from the right) iff $\mu\{a\}=0$ for all $a\in\mathbb{R}$. On the other hand, if $\mu$ has an atom at point $x\in\mathbb{R}$ such that $\mu\{x\}>0$, then $F$ is a step function.
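The two identities above can be checked on a small example. The following is a minimal sketch, assuming a fair six-sided die as the distribution (an illustrative choice, not taken from Siegrist's text); `pmf`, `cdf` and `left_limit` are hypothetical helper names.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so mu{k} = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """F(x) = mu(-inf, x]: total mass of the atoms not exceeding x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def left_limit(x):
    """F(x^-) = mu(-inf, x): total mass of the atoms strictly below x."""
    return sum((p for k, p in pmf.items() if k < x), Fraction(0))

# mu(a, b] = F(b) - F(a): here the mass of the atoms 3, 4, 5
print(cdf(5) - cdf(2))
# mu{a} = F(a) - F(a^-): the jump of F at the atom 3
print(cdf(3) - left_limit(3))
# Away from the atoms there is no jump
print(cdf(2.5) - left_limit(2.5))
```

Exact fractions are used so that the identities hold without floating-point rounding.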

This insight naturally leads to my question.
All of the above did not deal with the question of whether $X$ is a continuous or a discrete random variable. So I like to believe that everything I wrote is valid in both cases.



But: I am not sure how to reconcile the fact that $\mu$ has a domain comprised of uncountable Borel sets with the derivation of the probability mass function (as the Radon-Nikodym derivative of $\mu$ w.r.t. the counting measure) for a discrete random variable.

In the continuous case I understood that I can argue as follows: $\Omega$ is necessarily uncountable and moreover the image of $\Omega$ under $X$ is also uncountable. However, it might be an uncountable subset of the reals such as $\mathbb{R}_+$. When I defined the distribution $\mu$ of $X$, though, I did so using as domain all Borel sets in $\mathbb{R}$ – even though X couldn't possibly take values in the negative reals. So what I did above can only be valid, if the Borel sets that are disjoint from the support of $X$ are simply assigned a measure of $0$. (Correct?)

I can also characterize the feature of being a continuous random variable in terms of absolute continuity w.r.t. Lebesgue measure $\lambda$. That is:
$$\lambda(\emptyset)=0\qquad\text{and}\qquad \mu(\emptyset)=0$$
$$\lambda\{x\}=0\qquad\text{and}\qquad \mu\{x\}=0$$
where $\mu\{x\}=0$ is only true for continuous random variables.

Question 1: I see how that relates to the distribution function being continuous from the right as well as from the left, but is there a straightforward way to link that to the image of $\Omega$ under $X$ being uncountable?

So, if $\mu\{x\}=0$, the distribution $\mu$ is absolutely continuous w.r.t $\lambda$.
By the Radon-Nikodym Theorem there exists a density (RN-Derivative) such that $$\mu(A)=\int_A\,f\,\text{d}\lambda\quad\forall\quad A\in\mathcal{B}$$

Since, in my example above, every non-singleton subset of the negative reals has zero measure w.r.t. $\mu$ but positive Lebesgue measure, it does follow that the RN-derivative $f(x)$ is simply set to zero, whenever $x\not\in X(\Omega).$ (Correct?)

My struggle is mostly with the discrete case. I would argue that $\Omega$ can be countable or uncountable, but for a random variable $X$ to be discrete $X(\Omega)$ must be countable – so the random variable has countable support.

Moreover, I understood that $X$ is discrete if the distribution $\mu$ is a discrete measure such that $\mu(\mathbb{R}\setminus X(\Omega))=0$ where $X(\Omega)$ is clearly countable. (Correct?)

It is also true that
$\#(\emptyset)=0\qquad\text{and}\quad\mu(\emptyset)=0.$
Therefore, the distribution ($\mu$) of $X$ is absolutely continuous w.r.t. counting measure ($\#$). And, thus, by the Radon-Nikodym theorem, there exists a density such that
$$\mu(A)=\int_A\,f\,\text{d}\#\quad\forall\quad A\in\mathcal{B}.$$

If $A$ was a subset of the support of $X$ ($A\subseteq X(\Omega)$), i.e. $A$ would only consist of values that $X$ actually might take, then $A$ would be countable, $f$ would be a simple function and
$$\mu(A)=\int_A\,f\,\text{d}\#$$
$$\mu(A)=\sum_{x\in A}f(x)\#(\{x\}).$$
Since the count of a singleton is $\#(\{x\})=1$, it would follow that
$$\mu(A)=\sum_{x\in A}f(x)$$
and as $\mu(A)=\mathbb{P}(X\in A)$
$$\mathbb{P}(X\in A)=\sum_{x\in A}f(x)$$
such that the RN-derivative of $\mu$ w.r.t $\#$ has the interpretation of the usual probability mass function for discrete random variables.
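For a countable $A$, the computation above can be sketched numerically. This is only an illustration under an assumed pmf, $f(k)=(1/2)^k$ for $k=1,2,\dots$; the helper name `integral_wrt_counting` is hypothetical.

```python
from fractions import Fraction

def f(x):
    """Assumed pmf: f(k) = (1/2)^k for k = 1, 2, 3, ... and 0 elsewhere."""
    if isinstance(x, int) and x >= 1:
        return Fraction(1, 2) ** x
    return Fraction(0)

def integral_wrt_counting(f, A):
    """Integral of f over a finite set A w.r.t. counting measure.

    Each singleton has count #({x}) = 1, so the integral is a plain sum.
    """
    return sum((f(x) * 1 for x in A), Fraction(0))

A = {1, 2, 3}
print(integral_wrt_counting(f, A))             # P(X in A)
print(integral_wrt_counting(f, range(1, 21)))  # partial sum approaching 1
```

The second call only shows that the partial sums of this assumed pmf approach the total mass $1$.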

Question 2: But $\mu$ is defined over the Borel sets. So $\mathbb{P}(X\in A)\,[\,=\mu(A)\,]$ might take the form $\mathbb{P}(a< X\leq b)\,[\,=\mu(a,b]\,]$, which is a valid statement even for a discrete random variable.
So, even if I presume that $f(x)=0$ for all $x\not\in X(\Omega)$, how do I come from
$$\mu(A)=\int_A\,f\,\text{d}\#\quad\forall\quad A\in\mathcal{B}.$$
to
$$\mu(A)=\sum_{x\in A}f(x)$$
since $A$ is not countable and I cannot use countable additivity?

Thank you so very much.

Best wishes,
Jon

Best Answer

> So, if $\mu\{x\}=0$, the distribution $\mu$ is absolutely continuous w.r.t $\lambda$.

That is not true. The condition is necessary and sufficient for $X$ to be continuous in the sense that the CDF $F$ is a continuous function, but the existence of a PDF requires more: $\mu\ll\lambda$, which means that $\mu(B)=0$ must hold for every $B\in\mathcal B$ with $\lambda(B)=0$. This condition is necessary and sufficient for the existence of a Radon-Nikodym derivative. If $f$ is such a derivative (there is more than one), then in your example the function given by $x\mapsto f(x)$ for $x\in\mathbb R_+$ and $x\mapsto 0$ otherwise is also such a derivative, and functions with that property are commonly chosen as the PDF. Beyond that, we try to pick a PDF with other nice properties, such as continuity.
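A standard illustration of this gap, not taken from the question or from Siegrist's text, is the Cantor distribution: its CDF is continuous, so $\mu\{x\}=0$ for every $x$, yet all of its mass sits on the Cantor set, which has Lebesgue measure $0$, so no PDF exists. A minimal sketch, with the hypothetical helper `cantor_cdf` evaluating the CDF from ternary digits:

```python
from fractions import Fraction

def cantor_cdf(x, depth=60):
    """Cantor function on [0, 1], computed from the base-3 digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)   # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:   # x is in (or at the edge of) a removed middle third
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

# F is flat on the removed middle third [1/3, 2/3]: no mass there.
print(cantor_cdf(Fraction(1, 3)), cantor_cdf(Fraction(2, 3)))

# Total length of the intervals removed in the first 30 steps: 1 - (2/3)^30.
removed = sum(Fraction(2 ** (n - 1), 3 ** n) for n in range(1, 31))
print(float(removed))
```

The removed intervals carry no mass but their total length tends to $1$, so the set that carries all the mass has Lebesgue measure $0$.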

> I would argue that $\Omega$ can be countable or uncountable, but for a random variable $X$ to be discrete $X(\Omega)$ must be countable - so the random variable has countable support.

For a discrete random variable it is not necessary that $X(\Omega)$ is countable. For example, on $(\Omega=\mathbb R,\mathcal B)$ we can define $\mathbb P$ by setting $\mathbb P(B)=1\iff0\in B$, and then define the random variable $X$ by $\omega\mapsto\omega$. That leads to a discrete random variable with $\mu(\{0\})=1$, but $X(\Omega)=\mathbb R$ is uncountable. In fact, if we are looking for a support then $X(\Omega)$ can be a candidate, but it is often not suitable. It might also be that $X(\Omega)\notin\mathcal B$, and a support (what exactly is it?...) must be something like a "smallest" set $S\in\mathcal B$ with $\mu(S)=1$. For discrete random variables such a set exists, and in the example above it is the set $\{0\}$. Actually, $X$ is by definition discrete if a countable set $S\subseteq\mathbb R$ exists with $\mathbb P(X\in S)=1$, or equivalently $\mu(S)=1$, and as shown we do not need $X(\Omega)\subseteq S$ for that. If there are elements $s\in S$ with $\mathbb P(X=s)=0$ then we can just "throw them out"; what is left is again a countable set $S$ with $\mu(S)=1$, and now also $\mu(\{s\})>0$ for every $s\in S$, so that a smaller set with $\mu(S)=1$ cannot exist. Since $S$ is countable we have $S\in\mathcal B$, so that $\mu(S)$ is automatically defined.
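The "throw them out" step can be sketched for the point-mass example above. This is only an illustration: the starting set `S` is an arbitrary assumed choice, and `mu_point` and `prune` are hypothetical helper names.

```python
from fractions import Fraction

def mu_point(x):
    """mu{x} for the example above: all of the mass sits at 0."""
    return Fraction(1) if x == 0 else Fraction(0)

def prune(S, point_mass):
    """Drop the points of S that carry zero mass."""
    return {s for s in S if point_mass(s) > 0}

S = {-1, 0, 1}                 # an assumed countable (here finite) set with mu(S) = 1
S_min = prune(S, mu_point)

print(S_min)                              # what remains after pruning
print(sum(mu_point(s) for s in S_min))    # total mass is still 1
```

Pruning leaves the total mass unchanged because only zero-mass points are removed.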

In the continuous case there is no point in searching for a smallest set $S$ with $\mu(S)=1$, simply because it does not exist: if $\mu(S)=1$ then also $\mu(S\setminus T)=1$ whenever $T$ is a countable set. We can look for a nice $S\in\mathcal B$ with $\mu(S)=1$. In the example you mentioned we can take $\mathbb R_+$, but again it is not necessary that $X(\Omega)\subseteq\mathbb R_+$.


Concerning question 2

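Suppose that $S\subseteq\mathbb R$ is countable and serves as the support of the discrete $X$. Define $\#A$ to be the cardinality of $A\cap S$. Then $\#$ is a measure on $\mathcal B$ (or even on $\wp(\mathbb R)$). By definition: $$\int_A f\,\text{d}\#=\sum_{a\in S\cap A}f(a)$$ The sum on the right is countable even when $A$ is an uncountable Borel set such as $(a,b]$, because only the points of $S\cap A$ contribute.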
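A minimal sketch of this restricted counting measure on an interval $A=(a,b]$, assuming a fair die with $S=\{1,\dots,6\}$ as the discrete distribution (an illustrative choice); `count` and `mu_interval` are hypothetical helper names.

```python
from fractions import Fraction

# Assumed example: the support S = {1, ..., 6} of a fair die.
S = range(1, 7)

def f(s):
    """Assumed pmf on S: each point carries mass 1/6."""
    return Fraction(1, 6)

def count(a, b):
    """#(a, b] = cardinality of (a, b] intersected with S."""
    return sum(1 for s in S if a < s <= b)

def mu_interval(a, b):
    """Integral of f over (a, b] w.r.t. #: a sum over the points of S in (a, b]."""
    return sum((f(s) for s in S if a < s <= b), Fraction(0))

print(count(1.5, 4.2), mu_interval(1.5, 4.2))   # the points 2, 3, 4 contribute
print(count(-10, 0), mu_interval(-10, 0))       # the interval misses S entirely
```

Although each interval is uncountable, the loops only ever visit the countably many points of $S$.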