Creating A Mathematical Example for Convergence in Distribution

Tags: probability, probability-distributions

I am trying to better understand the differences between Convergence in Probability vs Convergence in Distribution. As I understand, Convergence in Probability is stronger than Convergence in Distribution – but I am trying to better understand these principles through examples.

Part 1: Convergence In Probability seems a bit easier to understand.

A sequence of random variables $\{X_n\}$ converges in probability to a random variable $X$ as $n$ approaches infinity if for any $\epsilon > 0$ we have

\begin{equation}
\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1
\end{equation}

This means that the probability that the difference between $X_n$ and $X$ is less than some small positive number $\epsilon$ approaches 1 as $n$ increases. Thus, as $n$ gets larger, the values of $X_n$ get closer and closer to the value of $X$ with high probability.

Example: Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables, where $X_n$ follows an Exponential distribution with rate parameter $n$. Then, for any $\epsilon > 0$, we can say that $X_n$ converges to 0 in probability:

$$X_n \xrightarrow{p} 0$$
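For this particular example, the convergence can also be checked directly from the exponential CDF: for any $\epsilon > 0$,

$$P(|X_n - 0| \geq \epsilon) = P(X_n \geq \epsilon) = e^{-n\epsilon} \longrightarrow 0 \quad \text{as } n \to \infty.$$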

Apparently, this is like saying: as $n$ becomes bigger, $X_n$ is increasingly likely to be close to 0. We can see this with the following simulation (in R):

N <- 1000

# simulate one draw of X_n ~ Exponential(rate = n) for each n = 1, ..., N
X_n <- numeric(N)
for (n in 1:N) {
  X_n[n] <- rexp(1, rate = n)
}

plot(X_n, type = "l", col = "blue", lwd = 2,
     main = "Convergence In Probability: Simulated values of X_n",
     xlab = "n", ylab = "X_n")

[Plot: simulated values of $X_n$ against $n$]

In the above graph, we can see that as $n$ becomes larger, the values of $X_n$ fluctuate less and less and concentrate around 0.

Part 2: Convergence in Distribution is a bit more difficult.

Convergence in Distribution can be defined as:

$$X_n \xrightarrow{d} X \iff \lim_{n \to \infty} F_n(x) = F(x) \text{ for all } x \text{ at which } F \text{ is continuous},$$

where $F_n$ and $F$ are the CDFs of $X_n$ and $X$, respectively.
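(A standard illustration of why the continuity restriction is needed: if $X_n = 1/n$ is a point mass, then $X_n \xrightarrow{d} 0$, yet $F_n(0) = P(X_n \leq 0) = 0$ for every $n$ while $F(0) = 1$; pointwise convergence of $F_n$ fails only at $x = 0$, which is exactly where $F$ is discontinuous.)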

My Question: I was wondering – can a similar simulation example be created which illustrates Convergence in Distribution?

Thanks!

Best Answer

Convergence in Probability

To properly demonstrate convergence in probability, we need to generate many "sample paths" and show that, as $n \to \infty$, it becomes increasingly unlikely that any of these sample paths is far from its limit. Here each path is the running mean of iid Exponential(1) samples, which converges in probability to the true mean $1$.

M <- 200   # number of sample paths
N <- 1000  # length of each path
S <- matrix(nrow = N, ncol = M)

# running mean of v: path[i] = mean(v[1:i])
sample_path <- function(v){
  t <- length(v)
  path <- numeric(t)
  for(i in 1:t){
    path[i] <- mean(v[1:i])
  }
  return(path)
}

# each column of S is one sample path of the running mean of Exp(1) draws
for (m in 1:M) {
  S[,m] <- sample_path(rexp(N, rate = 1))
}

matplot(S, type = "l", pch = 1, col = 1:M)
legend("topright", legend = 1:M, col = 1:M, pch = 1)

Which gives you a plot like this:

[Plot: the ensemble of simulated sample paths of the running mean]

You can see that the ensemble of sample paths increasingly clusters around the true mean $1$.
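To tie this picture back to the definition, here is a small sketch (the choice $\epsilon = 0.1$ is arbitrary, not part of the original code): estimate $P(|\bar X_n - 1| < \epsilon)$ at each $n$ by the fraction of the $M$ simulated paths that lie within $\epsilon$ of $1$; this fraction should approach $1$.

# estimate P(|running mean - 1| < eps) at each n across the M simulated paths
eps <- 0.1                                # arbitrary tolerance
prob_close <- rowMeans(abs(S - 1) < eps)  # rows of S index n, columns index paths
plot(prob_close, type = "l",
     main = "Estimated P(|mean - 1| < eps) vs n",
     xlab = "n", ylab = "fraction of paths within eps")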

Convergence in Distribution

To show this one, you don't want to look at the sample paths like we did above; instead, you want to look at the ECDF of the samples at each time slice and compare it to a reference distribution. We'll show this using the usual Central Limit Theorem, which is a very widely used convergence (in distribution) theorem.

To restate the classic CLT:

Let $X_i$ be a sequence of iid random variables with $E[X_i]=\mu$ and $V[X_i]=\sigma^2 < \infty$. Then:

$$\sqrt{n}\left(\frac{\bar X - \mu}{\sigma}\right) \xrightarrow{d} N(0,1)$$

To show this numerically, we are going to plot a series of ECDFs of the standardized sample mean together with the standard normal CDF.

We can piggyback off the previous code and continue on:

# now do conv. in dist
# for Exp(1), mu = 1 and sigma = 1, so the CLT transform is sqrt(t) * (mean - 1)
St <- t(S)                 # rows = paths, columns = time points
q <- seq(-3, 3, .1)
stdNorm <- pnorm(q)        # standard normal CDF on a grid
plot(x = q, y = stdNorm, type = "l", col = "red")

# thin out the dataset by only plotting every 100th time point
for(t in seq(1, N, 100)){
  lines(ecdf(sqrt(t)*(St[,t] - 1)), type = "l", col = "black")
}

# re-plot so standard normal shows up on top
lines(x=q,y=stdNorm, type="l", col="red",lwd=10)

You get a plot like this:

[Plot: ECDFs of the standardized sample means (black) overlaid with the standard normal CDF (red)]

We can see that the ECDF of the sample average (under the Z transform) clusters around the standard normal.

Still, it's kind of hard to see with all the lines, so let's plot a statistic that explicitly measures the discrepancy between two CDFs. It's called the Kolmogorov-Smirnov statistic: it is the maximum absolute difference between two CDFs.

Let's see how the KS distances behave for our sample paths:

# calculate the KS distance (scaled by 100) between each ECDF and the
# standard normal CDF, evaluated on the grid q
ks <- numeric(N)
for(t in seq(1, N)){
  ks[t] <- max(100 * abs(stdNorm - ecdf(sqrt(t)*(St[,t] - 1))(q)))
}

plot(ks)

[Plot: KS distance against $n$, decreasing toward 0]

So the KS distance also converges to $0$ as the ECDF converges to the normal CDF.
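As a quick extra check (not in the original plots), R's built-in ks.test can be applied at a single large $n$; the standardized sample means across the $M$ paths should be approximately standard normal there:

# formal KS test at the final time point n = N
z <- sqrt(N) * (St[, N] - 1)  # standardized sample means across the M paths
ks.test(z, "pnorm")           # compare against the standard normal CDF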

Almost Sure Convergence

There is a subtle difference between convergence in probability and almost sure convergence. The former says that it is highly unlikely that your particular sample average will be far from the true mean when you use ever-larger samples.

The latter says that it is practically guaranteed that the sample mean for any infinitely long experiment will converge to the true value. This is a statement about individual sample paths vs the ensemble of all sample paths.
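A finite simulation cannot truly verify almost sure convergence, but as a loose illustration, a single very long sample path of the running mean settles near $1$ and stays there (the strong-law behaviour), whereas convergence in probability only speaks about the ensemble at each fixed $n$:

# one long sample path of the running mean (cumsum is a faster equivalent of
# the sample_path() function above); it settles near the true mean 1
set.seed(1)                          # arbitrary seed, for reproducibility
x <- rexp(10000, rate = 1)
long_path <- cumsum(x) / seq_along(x)
plot(long_path, type = "l", xlab = "n", ylab = "running mean")
abline(h = 1, col = "red")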
