Convergence in Probability vs. Convergence in Distribution: Weighted Dice Example

convergence-divergence, dice, probability, probability-distributions, probability-theory

Typically, convergence in probability and convergence in distribution are introduced through separate examples. I want to see if I understand their differences using a common example of weighted dice.

First I'll explain my understanding of the random variable and observed value notions. I think of a random variable as analogous to a programming variable as it exists within the code text (which may or may not ever have been compiled). I think of an observed value as analogous to the variable as it exists in memory while the program is running; it is the thing which has a specific numeric value visible in the debugger. As an alternative analogy, an observed value references the result of a random event we see (i.e., "the coin landed heads"), while a random variable references the result of a random event an observer of a perpetually branching multiverse sees (i.e., "the coin landed heads in half of the branches and tails in half of the branches").
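In code, I might sketch this analogy as follows (NumPy and the function name `die_roll` are just illustrative choices on my part):

```python
import numpy as np

# The function below plays the role of the random variable: it exists as
# code text and describes every possible outcome, but has no single value.
def die_roll(rng):
    return rng.integers(1, 7)  # a fair six-sided die

rng = np.random.default_rng(seed=0)

# Calling it yields an observed value: one concrete number, analogous to
# the variable's contents in the debugger while the program is running.
observed = die_roll(rng)
print(observed)  # e.g. 5 -- what one branch of the multiverse sees
```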

Assuming no objections to the above, suppose I roll a pair of weighted dice (recording the sum), then replace them with a new pair and roll the new pair (recording the sum), then replace them with a new pair, and so on. What I want to say is that the successive sums of each rolled dice pair converge in probability to an observed value iff they eventually become and remain arbitrarily close to that value. So suppose the first die in each pair lands on $2$ with probability approaching $100\%$ and the second die in each pair lands on $5$ with probability approaching $100\%$; or, alternatively, suppose the probability distribution of the first die is arbitrary but I am always allowed to observe that $n$ has been rolled and then choose the second die in the pair to have a probability distribution sufficiently similar to a degenerate distribution localized at $7 - n$. In either case, the sums of each rolled pair converge in probability to the observed value $7$.
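As a rough sanity check of the first construction, here is a small simulation sketch; the particular weight schedule $p_n = 1 - 1/n$ is just one illustrative choice of "probability approaching $100\%$":

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def weighted_die(rng, favored, p, size):
    """Roll `size` copies of a die that lands on `favored` with
    probability p and uniformly on the other five faces otherwise."""
    probs = np.full(6, (1 - p) / 5)
    probs[favored - 1] = p
    return rng.choice(np.arange(1, 7), size=size, p=probs)

trials = 100_000
for n in [2, 10, 100, 1000]:
    p = 1 - 1 / n                      # the weight approaches 100% as n grows
    s = weighted_die(rng, 2, p, trials) + weighted_die(rng, 5, p, trials)
    # Estimate of P(|S_n - 7| > 1/2); it shrinks toward 0 as n grows.
    print(n, np.mean(np.abs(s - 7) > 0.5))
```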

By contrast, I want to say that the successive dice pairs converge in distribution to a random variable iff their probability distributions eventually become and remain arbitrarily close to the probability distribution of that random variable. So suppose both dice are chosen to be more and more fair in each iteration of the process; or, alternatively, suppose the probability distribution of the first die is arbitrary but I can always reverse engineer a probability distribution for the second die so that the probability distribution of the sum is sufficiently similar to a uniform distribution. In either case, the dice pairs converge in distribution to a uniform random variable.
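Again as an illustrative sketch (interpreting "the dice pairs" as the random vector of the two faces, and picking one possible schedule of weights approaching fairness), the joint probabilities approach the uniform $1/36$:

```python
import numpy as np

def die_pmf(favored, p):
    """PMF of a die landing on `favored` with probability p and
    uniformly on the remaining five faces."""
    pmf = np.full(6, (1 - p) / 5)
    pmf[favored - 1] = p
    return pmf

# As the dice are made fairer (p -> 1/6), the joint pmf of the pair of
# faces approaches the uniform 1/36, hence the joint CDF approaches the
# uniform CDF -- convergence in distribution.
for n in [1, 10, 100, 1000]:
    p = 1 / 6 + (5 / 6) / n            # one possible schedule with p -> 1/6
    joint = np.outer(die_pmf(2, p), die_pmf(5, p))
    print(n, np.max(np.abs(joint - 1 / 36)))
```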

I often see it said that something can converge in probability to a random variable, but I wonder if (at least according to the above distinction) it would be more accurate to say it converges in probability to an observed value (the $x$ rather than the $X$ in $\Pr(X = x)$). Is what I've said accurate? If not, what is a single, simple example to highlight the difference between convergence in probability and convergence in distribution?

Best Answer

In general, the two modes of convergence describe the convergence of a sequence of random variables $X_n$ to a random variable $X$.


In the special case where the target $X$ is a constant $x_0$, the two definitions coincide.

  • Suppose $P(|X_n-x_0|>\epsilon) \to 0$ for any $\epsilon>0$. Note that the CDF of the constant $x_0$ is a step function $F(x)$ that is zero for $x < x_0$ and one for $x \ge x_0$. If $x < x_0$ then $$F_{X_n}(x) = P(X_n \le x) \le P\left(|X_n - x_0| > \frac{x_0-x}{2}\right) \to 0.$$ You can similarly show that if $x > x_0$ then $F_{X_n}(x) \to 1$. Thus, the CDFs of $X_n$ converge at every continuity point of the step function CDF $F(x)$ of the constant $x_0$, i.e. convergence in distribution.
  • Conversely, if the CDFs $F_{X_n}$ converge to the step function $F$ at its continuity points, then since $x_0 - \epsilon$ and $x_0 + \epsilon$ are continuity points of $F$, $P(|X_n - x_0| > \epsilon) \le F_{X_n}(x_0 - \epsilon) + (1-F_{X_n}(x_0 + \epsilon)) \to 0 + (1 - 1) = 0$, i.e. convergence in probability. (A numerical sketch of both directions follows this list.)
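Here is a quick numerical illustration of both directions, using the (arbitrarily chosen) sequence $X_n = x_0 + Z/n$ with $Z$ standard normal, which concentrates at $x_0$; it assumes SciPy for the normal CDF:

```python
import numpy as np
from scipy.stats import norm

x0, eps = 7.0, 0.1
for n in [1, 10, 100, 1000]:
    # X_n = x0 + Z/n with Z ~ N(0, 1), so X_n concentrates at x0.
    # Convergence in probability: P(|X_n - x0| > eps) = P(|Z| > n*eps) -> 0.
    p_far = 2 * norm.sf(n * eps)
    # Convergence in distribution: F_{X_n}(x) = Phi(n*(x - x0)) tends to
    # 0 for x < x0 and to 1 for x > x0 (every x != x0 is a continuity point).
    cdf_below = norm.cdf(n * ((x0 - 0.5) - x0))   # F_{X_n}(x0 - 0.5) -> 0
    cdf_above = norm.cdf(n * ((x0 + 0.5) - x0))   # F_{X_n}(x0 + 0.5) -> 1
    print(n, p_far, cdf_below, cdf_above)
```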

In particular, your first example is an example of both convergence in probability and in distribution.


When the target $X$ is not a constant, the two notions are different (you can find many examples if you search for them). One particular technical distinction is that you can talk about convergence in distribution without specifying how the $X_1, \ldots, X_n, \ldots$ and $X$ are related to each other; you can look at the CDFs of each of them in isolation, without regard to the dependence. On the other hand, convergence in probability requires a common probability space on which the $X_n$ and $X$ all live; the dependency between these random variables must be specified. See other posts on this site and elsewhere on the internet for more discussion.
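One standard example of the kind such a search turns up (not specific to dice): let $X$ be a fair $0/1$ coin and set $X_n = 1 - X$ for every $n$. Each $X_n$ has the same distribution as $X$, so $X_n \to X$ in distribution trivially, yet $|X_n - X| = 1$ on every outcome, so there is no convergence in probability. A short simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# X is a fair 0/1 coin; every X_n lives on the same probability space,
# defined pathwise by X_n = 1 - X.
x = rng.integers(0, 2, size=100_000)
x_n = 1 - x

# Identical marginal distributions, so X_n -> X in distribution trivially...
print(x.mean(), x_n.mean())             # both approximately 0.5

# ...but |X_n - X| = 1 on every outcome, so P(|X_n - X| > 1/2) = 1 for
# every n: no convergence in probability.
print(np.mean(np.abs(x_n - x) > 0.5))   # exactly 1.0
```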