Statistics – Central Limit Theorem of Random Vectors When Dimension is Increasing

pr.probability, st.statistics

This is a question about central limit theorems when the dimension is increasing. Suppose now I have a random vector $X_N = (X_{N1}, \cdots, X_{Np})\in\mathbb{R}^p$. For all $c_p\in\mathbb{R}^p$ with $\|c_p\|_2 = 1$, suppose we have $c_p^\top X_N \xrightarrow{d} N(0,1)$ as $N\to\infty, p\to\infty$. What can we say about the asymptotic distribution of $\sum_{i=1}^p X_{Ni}^2$ (after normalization if needed)? Is it normal, or is there a counterexample?

Several thoughts:

  1. If $X_N$ are i.i.d. $N(0,1)$, we have CLT.
  2. If $p$ is fixed, by the Cramér–Wold theorem we have asymptotic normality of the $p$-dimensional random vector; then by the continuous mapping theorem the sum of squares has a chi-square limit. But a sum of $p$ independent chi-square(1) variables, suitably normalized, itself converges to a normal as $p\to\infty$!

So what I need here is probably a combination of CLT and CMT in increasing dimensions. But I have limited knowledge in this direction.

Background information: why would we care about this type of CLT? Suppose $X_N$ is an aggregation of $p$ summary statistics. Typically we can show that any finite linear combination of them is asymptotically normal. If we care about testing a global null, i.e. whether all $p$ true estimands are zero, it is natural to consider a procedure involving a sum-of-squares statistic.

More concretely, think about a multiple testing problem where we have $p$ different null hypotheses $H_{0j},j=1,\cdots,p$. For each null hypothesis, we have a test statistic, say $X_{Nj} = N^{-1}\sum_{i=1}^Nx_{ij}$. Each of these test statistics is asymptotically normal. Joint asymptotic normality is also plausible, since a linear combination of the $p$ coordinates is a linear combination of all the samples involved in the $p$ studies, to which we can apply the usual CLT. That's why we have a condition such as "for all $c_p$…". If we test the global null (that is, all null hypotheses are true), we might aggregate the test statistics by taking a sum of squares. But now, how do we rigorously justify that this squared summation also has a well-defined limiting distribution?
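The sum-of-squares construction above can be sketched numerically (illustrative only; the array shapes, the $\sqrt{N}$ scaling, and the simulated null model are my own choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, reps = 100, 20, 1000

# Hypothetical raw data x_{ij} under the global null: for each replication,
# N samples for each of the p studies, all iid N(0, 1).
x = rng.standard_normal((reps, N, p))

# Per-study statistics, scaled by sqrt(N) so each coordinate is N(0, 1)
# here (the question's X_{Nj} = N^{-1} sum_i x_{ij}, multiplied by sqrt(N)).
X = x.sum(axis=1) / np.sqrt(N)

# Sum-of-squares global-null statistic; under this null it is chi^2_p,
# so its mean is p and its variance is 2p.
T = (X ** 2).sum(axis=1)
print(round(T.mean(), 1))
```

Under this simulated null the empirical mean of $T$ is close to $p$, consistent with a $\chi^2_p$ reference distribution.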

Best Answer

In such generality, virtually nothing can be said about the asymptotic distribution of $V_{Np}:=\sum_{i=1}^p X_{Ni}^2$ or even about the existence of such an asymptotic distribution. In particular, $V_{Np}$ may have a non-normal asymptotic distribution or no asymptotic distribution at all.

Indeed, consider the following three simple settings.

Setting 1: All the $X_{Ni}$'s are iid standard normal and $c_p^\top c_p=1$. Then $Y_{Np}:=c_p^\top X_N\sim N(0,1)$ and $V_{Np}\sim\chi^2_p$, so that $(V_{Np}-p)/\sqrt{2p}\xrightarrow{d}N(0,1)$ as $p\to\infty$; that is, $V_{Np}$ is asymptotically normal.
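A quick Monte Carlo check of Setting 1 (a sketch, with sample sizes chosen arbitrarily): for large $p$, the standardized statistic $(V_{Np}-p)/\sqrt{2p}$ behaves like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 2000, 20000

# Each row is one realization of X_N in R^p with iid N(0, 1) coordinates.
X = rng.standard_normal((reps, p))
V = (X ** 2).sum(axis=1)        # V_{Np} = sum of squared coordinates ~ chi^2_p

# Standardize using the chi^2_p mean p and variance 2p.
Z = (V - p) / np.sqrt(2 * p)

# First two moments of Z should be close to those of N(0, 1).
print(round(Z.mean(), 2), round(Z.std(), 2))
```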

Setting 2: $X_{N1}\sim N(0,1)$, $X_{N2}=\cdots=X_{Np}=0$, and $c_p=[1,0,\dots,0]^\top$. Then $Y_{Np}=X_{N1}\sim N(0,1)$ and $V_{Np}=X_{N1}^2\sim\chi^2_1$, so that the asymptotic distribution of $V_{Np}$ is not normal.
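Setting 2 can be checked the same way (again a sketch with arbitrary simulation sizes): no matter how large $p$ is, $V_{Np}=X_{N1}^2$ keeps the fixed, non-normal $\chi^2_1$ distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 20000

# Only the first coordinate is random; the other p - 1 coordinates are 0,
# so V_{Np} = X_{N1}^2 regardless of p.
x1 = rng.standard_normal(reps)
V = x1 ** 2

# Moments match chi^2_1 (mean 1, variance 2) -- not a normal limit: V is
# supported on [0, inf) and its density blows up at 0.
print(round(V.mean(), 2), round(V.var(), 2))
```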

Setting 3: This is a combination of Settings 1 and 2: for odd $N$ we use Setting 1, and for even $N$ we use Setting 2. Then $V_{Np}$ alternates between an (approximately) $N(p,2p)$ distribution and a $\chi^2_1$ distribution along the two subsequences, so there is no asymptotic distribution at all, even after normalization.
