Central Limit Theorems Involving Nominal-Valued Random Variables

limit-theorems, probability-distributions, reference-request

Suppose $X$ is a random variable taking values in a finite set $\{a_1,\ldots, a_k \}$ and for $i=1,\ldots,k,$ $Y_i = \begin{cases} 1 & \text{if } X=a_i, \\ 0 & \text{otherwise.} \end{cases}$

\begin{align}
\text{If } & \operatorname E(\mathbf Y) = \operatorname E\left[ \begin{array}{c} Y_1 \\ \vdots \\ Y_k \end{array} \right] = \left[ \begin{array}{c} p_1 \\ \vdots \\ p_k \end{array} \right] \quad \text{(so that $p_1+\cdots+p_k=1$)} \\[12pt]
\text{then } & \operatorname{var}(\mathbf Y) = \operatorname{var}\left[ \begin{array}{c} Y_1 \\ \vdots \\ Y_k \end{array} \right] = \left[ \begin{array}{cccc} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_k \\ -p_2p_1 & p_2(1-p_2) & \cdots & -p_2p_k \\ \vdots & \vdots & \ddots & \vdots \\ -p_kp_1 & -p_kp_2 & \cdots & p_k(1-p_k) \end{array} \right].
\end{align}

This variance can be written compactly as $\operatorname{var}(\mathbf Y)=\operatorname{diag}(\mathbf p)-\mathbf p\mathbf p^\top$; it is a $k\times k$ matrix of rank $k-1$ (assuming every $p_i>0$), since its rows sum to zero.
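As a concrete sanity check (not part of the original question), here is a minimal numpy sketch with an arbitrary choice of $k=4$ and $\mathbf p$; it compares the empirical covariance of one-hot samples of $X$ with $\operatorname{diag}(\mathbf p)-\mathbf p\mathbf p^\top$ and confirms the rank:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4])   # assumed, arbitrary distribution
k = len(p)

# Exact covariance of the one-hot vector Y: diag(p) - p p^T.
cov_exact = np.diag(p) - np.outer(p, p)

# Empirical covariance from one-hot encodings of samples of X.
X = rng.choice(k, size=100_000, p=p)
Y = np.eye(k)[X]                     # each row is one realization of Y
cov_emp = np.cov(Y, rowvar=False)

print(np.abs(cov_emp - cov_exact).max())   # small (sampling noise)
print(np.linalg.matrix_rank(cov_exact))    # 3, i.e. k - 1
```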

With an infinite sequence of i.i.d. copies $\mathbf Y^{(1)},\mathbf Y^{(2)},\ldots$ of $\mathbf Y,$ we have a central limit theorem: the probability distribution of the centered and rescaled sum $\sqrt n\left(\frac1n\sum_{i=1}^n \mathbf Y^{(i)}-\mathbf p\right)$ approaches a normal (or "Gaussian") distribution with mean zero and the $k\times k$ variance above.
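To see the finite-$k$ CLT in action (a simulation sketch with assumed, arbitrary $\mathbf p$, coefficients, and sample sizes), one can test a fixed linear combination of the centered and rescaled multinomial sums against its limiting normal law:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p = np.array([0.1, 0.2, 0.3, 0.4])
c = np.array([1.0, -2.0, 0.5, 3.0])   # fixed non-random coefficients
n, reps = 2_000, 10_000

counts = rng.multinomial(n, p, size=reps)       # each row: sum of n one-hot copies of Y
S = np.sqrt(n) * (counts / n - p)               # centered, rescaled sums
sigma2 = c @ (np.diag(p) - np.outer(p, p)) @ c  # limiting variance of c . S

# Kolmogorov-Smirnov comparison with the limiting normal distribution.
print(stats.kstest(S @ c, "norm", args=(0.0, np.sqrt(sigma2))))
```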

Everybody knows that much, but now suppose we have a countably infinite set $\{a_1,a_2,a_3,\ldots\}$ of values of $X$ with corresponding probabilities $p_1,p_2,p_3,\ldots$ and we define $\mathbf Y$ similarly.

How should we modify our central limit theorem for that case? Might we have pointwise but not uniform convergence to a limiting distribution? If, for each $i,$ the distribution of $W_i$ is the limiting distribution of the $i\text{th}$ component, would it still be the case that every linear combination $\sum_i c_iW_i$ with constant (i.e., non-random) coefficients $c_i$ is normally distributed in the limit? Might some conclusions depend on how fast the partial sums of $\sum_ip_i$ converge to $1\text{?}$ Are there published results?

Best Answer

$\newcommand\lp{\mathrm{LP}}\newcommand\ep{\varepsilon}$Let $Y:=\mathbf Y$ and $Z:=Y-EY.$ Then $$E\|Z\|^2=\sum_{j=1}^\infty p_j(1-p_j)\le\sum_{j=1}^\infty p_j=1<\infty,$$ where $\|\cdot\|$ is the $\ell^2$ norm. So, $Z$ is a zero-mean random vector in the Hilbert space $H:=\ell^2$, with a finite second strong moment and thus with a well-defined covariance operator $T$. Let $G$ be the zero-mean Gaussian random vector over $H$ with covariance operator $T$. Let $$S_n:=\frac1{\sqrt n}\sum_{i=1}^n Z^{(i)},$$ where the $Z^{(i)}$'s are independent copies of $Z$. It is then a well-known fact that $S_n\to G$ in distribution (as $n\to\infty$); see, e.g., Theorem 3.6 of Hoffmann-Jørgensen and Pisier.
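To make the displayed bound concrete, here is a small numerical illustration with the assumed tail $p_j=2^{-j}$ (a choice not made in the answer), for which $\sum_j p_j(1-p_j)=1-\tfrac13=\tfrac23$:

```python
import numpy as np

# E||Z||^2 = sum_j p_j (1 - p_j) for the assumed choice p_j = 2^{-j}.
j = np.arange(1, 60)          # 2^{-60} is negligible in double precision
p = 0.5 ** j
print(np.sum(p * (1 - p)))    # ~0.6667 = 2/3, comfortably below 1
```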

More refined results are given, e.g., in the paper by Bentkus and Götze; see further papers by Götze for similar results and additional references.


In this particular case, the convergence $S_n\to G$ in distribution can be shown more elementarily, as follows. For each natural $m$ and all $x=(x_1,x_2,\dots)\in\ell^2$, let $$\pi_m x:=(x_1,\dots,x_m,0,0,\dots),$$ so that $\pi_m$ is the projector onto "the first $m$ coordinates". Let $\rho_m:=I-\pi_m$, where $I$ is the identity operator, so that $\rho_m$ is the projector onto "the coordinates beyond the first $m$ coordinates". Then, for each natural $m$, by the finite-dimensional central limit theorem (CLT), $\pi_m S_n=\frac1{\sqrt n}\sum_{i=1}^n \pi_m Z^{(i)}\to\pi_m G$ in distribution as $n\to\infty$, that is, $$\lp(\pi_m S_n,\pi_m G)\underset{n\to\infty}\longrightarrow0, \tag{1}\label{1}$$ where $\lp(U,V)$ is the Lévy–Prokhorov distance between the (distributions of) random vectors $U$ and $V$.
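Here is a quick simulation sketch of \eqref{1} in its simplest case, again assuming the geometric tail $p_j=2^{-j}$ (not a choice made in the answer): the first coordinate of $\pi_m S_n$ should be approximately $N(0,\,p_1(1-p_1))$ for large $n$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 1_000, 5_000
p1 = 0.5                                  # P(X = a_1) under p_j = 2^{-j}

X = rng.geometric(p=0.5, size=(reps, n))  # samples of X, values 1, 2, 3, ...
Z1 = (X == 1).astype(float) - p1          # first coordinate of Z = Y - E Y
S1 = Z1.sum(axis=1) / np.sqrt(n)          # first coordinate of S_n

# Compare with the limiting N(0, p1 (1 - p1)) law.
print(stats.kstest(S1, "norm", args=(0.0, np.sqrt(p1 * (1 - p1)))))
```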

Next, by the "Pythagoras theorem" for $\ell^2$ and because the $Z^{(i)}$'s are independent copies of $Z$, $$E\|\rho_m S_n\|^2=E\|\rho_m Z\|^2=\sum_{j=m+1}^\infty p_j(1-p_j)\le\sum_{j=m+1}^\infty p_j=:q_m\to0 \tag{2}\label{2} $$ as $m\to\infty$. Because $S_n$ and $G$ have the same covariance operator, we have $$E\|\rho_m G\|^2=E\|\rho_m S_n\|^2\le q_m. \tag{3}\label{3}$$
So, take any Borel subset $A$ of $\ell^2$ and any real $\ep>0$, and denote by $A_\ep$ the $\ep$-neighborhood of $A$. Writing $S_n=\pi_m S_n+\rho_m S_n$ and $G=\pi_m G+\rho_m G$, and using the definition of the Lévy–Prokhorov distance, Markov's inequality, and \eqref{1}, \eqref{2}, and \eqref{3}, we have \begin{align} &P(S_n\in A) \\ &\le P(\pi_m S_n\in A_\ep)+P(\|\rho_m S_n\|\ge\ep) \\ &\le P(\pi_m G\in A_{2\ep})+\lp(\pi_m S_n,\pi_m G)+\frac{q_m}{\ep^2} \\ &\le P(G\in A_{3\ep})+P(\|\rho_m G\|\ge\ep)+\lp(\pi_m S_n,\pi_m G)+\frac{q_m}{\ep^2} \\ &\le P(G\in A_{3\ep})+\frac{q_m}{\ep^2}+\lp(\pi_m S_n,\pi_m G)+\frac{q_m}{\ep^2}. \end{align} Now choose any $m$ such that $\frac{q_m}{\ep^2}<\ep$, which is possible by \eqref{2}. Then, by \eqref{1}, for all large enough $n$ we have $\lp(\pi_m S_n,\pi_m G)<\ep$ and hence $$P(S_n\in A) \le P(G\in A_{3\ep})+3\ep.$$ Similarly, for all large enough $n$ we have $$P(G\in A) \le P(S_n\in A_{3\ep})+3\ep.$$ So, $\lp(S_n,G)\le3\ep$ for all large enough $n$. Thus, $S_n\to G$ in distribution. $\quad\Box$
