Distributions – Constructing a Discrete Random Variable with Rational Support in $[0,1]$

distributions, mathematical-statistics

This is the constructivist sequel to this question.

If we can't have a discrete uniform random variable having as support all the rationals in the interval $[0,1]$, then the next best thing is:

Construct a random variable $Q$ that has this support, $Q\in \mathbb{Q}\cap[0,1]$, and that follows some distribution. The craftsman in me requires that this random variable be constructed from existing distributions, rather than created by abstractly defining what we desire to obtain.

So I came up with the following:

Let $X$ be a discrete random variable following the Geometric Distribution-Variant II with parameter $0<p<1$, namely

$$ X \in \{0,1,2,\ldots\},\;\;\;\; P(X=k) = (1-p)^kp,\;\;\; F_X(k) = 1-(1-p)^{k+1}$$

Let also $Y$ be a discrete random variable following the Geometric Distribution-Variant I with identical parameter $p$, namely

$$ Y \in \{1,2,\ldots\},\;\;\;\; P(Y=k) = (1-p)^{k-1}p,\;\;\; F_Y(k) = 1-(1-p)^k$$

$X$ and $Y$ are independent. Define now the random variable

$$Q = \frac {X}{Y}$$

and consider the conditional distribution

$$P(Q\leq q \mid \{X\leq Y\})$$

In loose words, "conditional $Q$ is the ratio of $X$ over $Y$, conditional on $X$ being smaller than or equal to $Y$."
The support of this conditional distribution is $\{0,1,1/2,1/3,\ldots,1/k,1/(k+1),\ldots,2/3,2/4,\ldots\} = \mathbb{Q}\cap[0,1]$.

The "question" is:
Can somebody please provide the associated conditional probability mass function?

A comment asked "should it be closed-form"? Since what constitutes a closed-form nowadays is not so clear cut, let me put it this way: we are searching for a functional form into which we can input a rational number from $[0,1]$, and obtain the probability (for some specified value of the parameter $p$ of course), leading to an indicative graph of the pmf. And then vary $p$ to see how the graph changes.
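To give a numerical feel for the object being asked about, here is a minimal Python sketch that approximates the conditional pmf by brute-force enumeration of the pairs $(x,y)$ with $x\le y$. The function name `conditional_pmf` and the truncation parameter `max_y` are my own illustrative choices, and the truncation itself is an assumption of the sketch: the joint mass it ignores is at most $(1-p)^{\texttt{max\_y}}$.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_pmf(p, max_y=200):
    """Approximate P(Q = q | X <= Y) for Q = X / Y.

    X has pmf (1-p)^x * p on {0, 1, 2, ...} and Y has pmf (1-p)^(y-1) * p
    on {1, 2, ...}; X and Y are independent. Only pairs with y <= max_y are
    enumerated (hypothetical truncation, not part of the construction).
    """
    mass = defaultdict(float)
    for y in range(1, max_y + 1):
        py = (1 - p) ** (y - 1) * p
        for x in range(0, y + 1):            # the conditioning event X <= Y
            px = (1 - p) ** x * p
            # Fraction reduces x/y to lowest terms, so 1/2, 2/4, ... merge.
            mass[Fraction(x, y)] += px * py
    total = sum(mass.values())               # approximates P(X <= Y)
    return {q: m / total for q, m in mass.items()}


pmf = conditional_pmf(0.3)
# Look up the probability of any rational in [0, 1] with denominator <= max_y.
print(pmf[Fraction(1, 2)], pmf[Fraction(2, 3)])
```

Plotting `pmf` against its keys for several values of `p` would give the kind of indicative graph described above, although it is still not a functional form.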

If it helps, we can make one or both bounds of the support open, although these variants will deprive us of the ability to graph the pmf at the upper and/or lower endpoint. Also, if we make the upper bound open, then we should use the conditioning event $\{X<Y\}$.

Alternatively, I also welcome other r.v.'s that have this support, as long as they come together with their pmf.

I used the Geometric distribution because it has two readily available variants, one of which does not include zero in its support (so that division by zero is avoided). Obviously, one can use other discrete r.v.'s, using some truncation.

I most certainly will put a bounty on this question, but the system does not immediately permit this.

Best Answer

Consider the discrete distribution $F$ with support on the set $\{(p,q)\,|\, q \ge p \ge 1\}\subset \mathbb{N}^2$ with probability masses

$$F(p,q) = \frac{3}{2^{1+p+q}}.$$

This is easily summed (all series involved are geometric) to demonstrate it really is a distribution (the total probability is unity).
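For completeness, the summation can be written out as follows (only the definition of $F$ above is used):

$$\sum_{p=1}^\infty\sum_{q=p}^\infty \frac{3}{2^{1+p+q}} = \frac{3}{2}\sum_{p=1}^\infty 2^{-p}\sum_{q=p}^\infty 2^{-q} = \frac{3}{2}\sum_{p=1}^\infty 2^{-p}\cdot 2^{1-p} = 3\sum_{p=1}^\infty 4^{-p} = 3\cdot\frac{1}{3} = 1.$$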

For any nonzero rational number $x$ let $a/b=x$ be its representation in lowest terms: that is, $b\gt 0$ and $\gcd(a,b)=1$.

$F$ induces a discrete distribution $G$ on $[0,1]\cap \mathbb{Q}$ via the rules

$$G(x) = G\left(\frac{a}{b}\right) = \sum_{n=1}^\infty F\left(an, bn\right)=\frac{3}{2^{1+a+b}-2}.$$

(and $G(0)=0$). Every rational number in $(0,1]$ has nonzero probability. (If you must include $0$ among the values with positive probability, just take some of the probability away from another number--like $1$--and assign it to $0$.)
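As a sanity check on the closed form, the following Python sketch evaluates $G$ exactly with rational arithmetic and compares it with a partial sum of the defining series. The function names `F`, `G` and `G_partial` and the number of terms are illustrative choices of mine, not part of the construction.

```python
from fractions import Fraction


def F(p, q):
    """Mass of F at the integer point (p, q), assuming q >= p >= 1."""
    return Fraction(3, 2 ** (1 + p + q))


def G(x):
    """Closed form 3 / (2^(1+a+b) - 2) for x = a/b in (0, 1]; G(0) = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    a, b = x.numerator, x.denominator        # Fraction keeps lowest terms
    return Fraction(3, 2 ** (1 + a + b) - 2)


def G_partial(x, terms=60):
    """Partial sum of sum_{n>=1} F(a*n, b*n) for nonzero x = a/b."""
    x = Fraction(x)
    a, b = x.numerator, x.denominator
    return sum(F(a * n, b * n) for n in range(1, terms + 1))


for x in (Fraction(1), Fraction(1, 2), Fraction(2, 3)):
    # The gap is the neglected tail of a geometric series, so it is tiny.
    print(x, G(x), float(G(x) - G_partial(x)))
```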

To understand this construction, look at this depiction of $F$:

[Figure of F]

$F$ gives probability masses at all points $(p,q)$ with positive integral coordinates and $q \ge p$. Values of $F$ are represented by the colored areas of circular symbols. The lines have slopes $p/q$ for all possible combinations of coordinates $p$ and $q$ appearing in the plot. They are colored in the same way the circular symbols are: according to their slopes. Thus, slope (which clearly ranges from $0$ through $1$) and color correspond to the argument of $G$, and the values of $G$ are obtained by summing the areas of all circles lying on each line. For instance, $G(1)$ is obtained by summing the areas of all the (red) circles along the main diagonal of slope $1$, given by $F(1,1)+F(2,2)+F(3,3)+\cdots = 3/8 + 3/32 + 3/128 + \cdots = 1/2$.

Figure

This figure shows an approximation to $G$ achieved by limiting $q\le 100$: it plots its values at $3044$ rational numbers ranging from $1/100$ through $1$. The largest probability masses are $\frac{1}{2},\frac{3}{14},\frac{1}{10},\frac{3}{62},\frac{3}{62},\frac{1}{42},\ldots$.
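The counts and masses quoted for this approximation can be reproduced with a short Python sketch. It enumerates the fractions in lowest terms with denominator at most $100$ and applies the closed form of $G$; the helper names `reduced_fractions` and `G` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import gcd


def reduced_fractions(max_den):
    """All a/b in (0, 1] in lowest terms with 1 <= a <= b <= max_den."""
    return [Fraction(a, b)
            for b in range(1, max_den + 1)
            for a in range(1, b + 1)
            if gcd(a, b) == 1]


def G(x):
    """Closed-form mass of G at x = a/b (lowest terms) in (0, 1]."""
    a, b = x.numerator, x.denominator
    return Fraction(3, 2 ** (1 + a + b) - 2)


xs = reduced_fractions(100)
masses = sorted((G(x) for x in xs), reverse=True)
total = sum(masses)                          # exact rational arithmetic

print(len(xs))                               # number of plotted rationals
print([str(m) for m in masses[:6]])          # largest probability masses
print(float(1 - total))                      # mass left out by q <= 100
```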

Here is the full CDF of $G$ (accurate to the resolution of the image). The six numbers just listed give the sizes of the visible jumps, but every part of the CDF consists of jumps, without exception:

Figure 2
