Probability – How Fast Does a Gaussian Random Walk Move Away from the Origin?

Tags: pr.probability, random-walks, st.statistics, stochastic-processes

Suppose $z_i$ are IID zero-mean $d$-dimensional Gaussian random vectors with unit-trace covariance $\Sigma$, and let $g(z_i)$ denote the sum of the components of $z_i$.

Consider the following random walk:

$$x_s=\frac{1}{s}\left(g(z_1)z_1+\ldots+g(z_s)z_s\right)$$

Let $f_s=\|x_s\|^2$. For large $d$, the expected value $E[1/f_s]$ remains bounded as $s\to\infty$ and smoothly interpolates between two extremes for intermediate $s$. Where is the halfway point located? A simple relation seems to exist between this point and a measure of the uniformity of the eigenvalues of $\Sigma$.

I'm observing the following:
$$s>\frac{1}{\operatorname{Tr}\Sigma^2} \text{ implies } 2E\left[\frac{1}{f_s}\right]>\frac{1}{Ef_1}+\frac{1}{Ef_\infty}$$

  • Why does $\operatorname{Tr}(\Sigma^2)$ matter here?
  • How do I prove the inequality above?
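For intuition, the setup above can be simulated directly. Below is a minimal Monte Carlo sketch (a randomly generated dense unit-trace $\Sigma$ is an arbitrary illustrative choice, and all variable names are hypothetical); it estimates $E[1/f_s]$ for a small and a large $s$ and compares the latter with $1/(\mathbf 1^\top\Sigma^2\,\mathbf 1)$, the reciprocal of the almost-sure limit of $f_s$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30

# A generic dense unit-trace covariance (arbitrary illustrative choice).
A = rng.standard_normal((d, d))
Sigma = A @ A.T
Sigma /= np.trace(Sigma)                 # enforce Tr(Sigma) = 1
L = np.linalg.cholesky(Sigma)            # z = L @ (standard normal) has covariance Sigma

ones = np.ones(d)
sigma_d2 = ones @ Sigma @ Sigma @ ones   # 1^T Sigma^2 1, the a.s. limit of f_s

def mean_inv_f(s, n_trials=1000):
    """Monte Carlo estimate of E[1/f_s] with f_s = ||x_s||^2."""
    vals = np.empty(n_trials)
    for t in range(n_trials):
        z = L @ rng.standard_normal((d, s))   # columns are z_1, ..., z_s
        g = z.sum(axis=0)                     # g(z_i) = sum of components of z_i
        x = (z * g).mean(axis=1)              # x_s = (1/s) sum_i g(z_i) z_i
        vals[t] = 1.0 / (x @ x)
    return vals.mean()

est_10, est_400 = mean_inv_f(10), mean_inv_f(400)
print(est_10, est_400, 1 / sigma_d2)     # E[1/f_s] approaches 1/(1^T Sigma^2 1)
```

The Cholesky factor is just one convenient way to sample $z_i\sim N(0,\Sigma)$; any square root of $\Sigma$ would do.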

PS: the following relationship is also observed when $d>100$:

$$\begin{equation}
E\left[\frac{1}{f_s}\right]> \left( \frac{1}{s} Ef_1 + \frac{s-1}{s}Ef_\infty\right)^{-1}\label{y3}\tag{y3}
\end{equation}
$$

[graph of $E[1/f_s]$ as a function of $s$]

notebook

Motivation

  • This equation, if it holds, gives an easy-to-compute limit on the number of SGD updates that can be performed in parallel when observations are Gaussian (this random walk appears in the denominator of Eq. 1.5 here); saturation corresponds to the step size scaling unfavorably. Intriguingly, a similar harmonic-mean relationship holds for other choices of $f$ that correspond to different ways of setting the step size. For instance, it holds when $f_s$ is the Frobenius norm of a Wishart matrix with $s$ degrees of freedom, or the spectral norm of a Wishart matrix with $s$ degrees of freedom (details).

  • The quantity which upper bounds the value in the graph above, $1/Ef_\infty$, seems to equal $R$, the "effective rank" of the Gaussian, described in Lemma 5 of the Bartlett paper. That paper shows that a high value of $R$ makes the problem easy for statistical estimation. Proving the formula above would help show that a high $R$ also makes the estimation procedure easy to parallelize.

  • link to Eq $\eqref{y3}$

Best Answer

$\newcommand\1{\mathbf1}\newcommand\R{\mathbb R}\newcommand\Si{\Sigma}\newcommand{\si}{\sigma}\newcommand{\Ga}{\Gamma}$Letting
$$y_i:=g(z_i)z_i=z_iz_i^\top\1,$$ where $\1:=[1,\dots,1]^\top\in\R^d$, we get $$Ey_i=\Si\1.$$ So, by the strong law of large numbers $$x_s=\frac1s\,\sum_1^s y_i\to\Si\1$$ almost surely as $s\to\infty$, so that \begin{equation*} \|x_s\|^2\to \si_d^2:=\1^\top\Si^2\1.\tag{1}\label{1} \end{equation*} almost surely as $s\to\infty$. Also, by the Jensen inequality for the function $(0,\infty)\ni x\mapsto1/x$, \begin{equation*} E\frac1{\|x_s\|^2}\ge\frac1{E\|x_s\|^2}. %=\frac1{\si_d^2}. \tag{2}\label{2} \end{equation*}

Moreover, we can show that $\dfrac1{\|x_s\|^2}$ is uniformly integrable (see Lemma 1 below). So, one immediately gets from \eqref{1} that \begin{equation*} E\frac1{\|x_s\|^2}\to\frac1{\si_d^2}. \tag{3}\label{3} \end{equation*}

So, the relevant quantity here is not $\operatorname{Tr}(\Si^2)$ but $\si_d^2=\1^\top\Si^2\1$ -- the sum of all entries of the matrix $\Si^2$. (When $\Si$ is diagonal, $\1^\top\Si^2\1=\operatorname{Tr}(\Si^2)$, which may explain why $\operatorname{Tr}(\Si^2)$ appears to matter in the experiments.)
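The distinction is easy to see numerically. The sketch below (an equicorrelated unit-trace $\Sigma$ is an arbitrary illustrative choice) computes both quantities; they differ as soon as $\Sigma$ has off-diagonal mass, and coincide for diagonal $\Sigma$:

```python
import numpy as np

d, rho = 10, 0.3
ones = np.ones(d)

# Equicorrelated unit-trace covariance (hypothetical illustrative choice).
Sigma = ((1 - rho) * np.eye(d) + rho * np.ones((d, d))) / d

sigma_d2 = ones @ Sigma @ Sigma @ ones   # 1^T Sigma^2 1 = sum of all entries of Sigma^2
tr2 = np.trace(Sigma @ Sigma)            # Tr(Sigma^2) = sum of diagonal entries only
print(sigma_d2, tr2)                     # differ once Sigma has off-diagonal mass

# For a diagonal Sigma the two quantities coincide, since Sigma^2 is diagonal.
D = np.diag(np.arange(1.0, d + 1) / np.arange(1, d + 1).sum())   # unit trace
diag_equal = np.isclose(ones @ D @ D @ ones, np.trace(D @ D))
print(diag_equal)
```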


Lemma 1: $\dfrac1{\|x_s\|^2}$ is uniformly integrable.

Proof: By the Cauchy--Schwarz inequality,
\begin{equation*} \|x_s\|\ge\frac1{\sqrt d}\,x_s^\top\1 =\frac1{\sqrt d}\frac1s\sum_{i=1}^s y_i^\top\1 =\frac1{s\sqrt d}\sum_{i=1}^s (z_i^\top\1)^2=:R. \end{equation*} Next, $z_i^\top\1\sim N(0,\tau_d^2)$ with $\tau_d^2:=\1^\top\Si\1$, and hence $(z_i^\top\1)^2\sim\operatorname{Gamma}(1/2,2\tau_d^2)$ and $R\sim\operatorname{Gamma}(s/2,2\tau_d^2/(s\sqrt d))$ (shape--scale parametrization). So, for $s>6$, \begin{equation*} \begin{aligned} E\frac1{\|x_s\|^3}\le E\frac1{R^3} &=\Big(\frac{s\sqrt d}{2\tau_d^2}\Big)^3 \frac1{\Ga(s/2)}\int_0^\infty u^{s/2-1-3} e^{-u}\,du \\ &=\Big(\frac{s\sqrt d}{2\tau_d^2}\Big)^3 \frac{\Ga(s/2-3)}{\Ga(s/2)} \underset{s\to\infty}\sim \Big(\frac{s\sqrt d}{2\tau_d^2}\Big)^3 \frac1{(s/2)^3} =\Big(\frac{\sqrt d}{\tau_d^2}\Big)^3<\infty. \end{aligned} \end{equation*} So, by the de la Vallée-Poussin theorem, $\dfrac1{\|x_s\|^2}$ is uniformly integrable. $\quad\Box$
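The two ingredients of this proof can be checked by simulation: the Cauchy--Schwarz bound $\|x_s\|\ge R$ holds pathwise, and $R$ has the Gamma mean $\tau_d^2/\sqrt d$, where $\tau_d^2:=\1^\top\Si\1$ is the variance of $z_i^\top\1$. A sketch (equicorrelated unit-trace $\Sigma$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
d, rho, s, n_trials = 10, 0.3, 12, 20_000

# Equicorrelated unit-trace covariance (hypothetical illustrative choice).
Sigma = ((1 - rho) * np.eye(d) + rho * np.ones((d, d))) / d
L = np.linalg.cholesky(Sigma)
tau2 = np.ones(d) @ Sigma @ np.ones(d)        # Var(z_i^T 1) = 1^T Sigma 1

R = np.empty(n_trials)
norm_x = np.empty(n_trials)
for t in range(n_trials):
    z = L @ rng.standard_normal((d, s))       # columns are z_1, ..., z_s
    g = z.sum(axis=0)                         # g_i = z_i^T 1
    x = (z * g).mean(axis=1)                  # x_s
    R[t] = (g ** 2).sum() / (s * np.sqrt(d))  # the Gamma-distributed lower bound
    norm_x[t] = np.linalg.norm(x)

# Gamma(s/2, 2 tau2 / (s sqrt(d))) has mean tau2 / sqrt(d).
print(R.mean(), tau2 / np.sqrt(d))
# The Cauchy-Schwarz bound ||x_s|| >= R holds for every sample path.
print((norm_x >= R - 1e-12).all())
```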


Following the lines of the proofs of the de la Vallée-Poussin theorem and Lemma 1, one can obtain an upper bound on the difference $E\dfrac1{\|x_s\|^2}-\dfrac1{\si_d^2}$.


Note also that \begin{equation*} E\|x_s\|^2=\frac1{s^2}\sum_{i,j=1}^s Ey_i^\top y_j =\frac1{s^2}\sum_{i,j=1}^s Ez_i^\top\1\, z_j^\top\1\, z_i^\top z_j =\frac1s\,Ez_1^\top\1\, z_1^\top\1\, z_1^\top z_1 +\frac{s-1}s\,Ez_1^\top\1\, z_2^\top\1\, z_1^\top z_2 \end{equation*} and \begin{equation*} Ez_1^\top\1\, z_2^\top\1\, z_1^\top z_2 =\sum_{i,j,k=1}^d Ez_{1i}z_{2j}z_{1k}z_{2k} =\sum_{i,j,k=1}^d \Si_{ik}\Si_{kj} =\sum_{i,j=1}^d (\Si^2)_{ij}=\1^\top \Si^2\1=\si_d^2, \end{equation*} where $z_{ri}$ is the $i$th coordinate of the random vector $z_r$. So, since $Ez_1^\top\1\, z_1^\top\1\, z_1^\top z_1=E\|x_1\|^2=Ef_1$, \begin{equation*} E\|x_s\|^2=\frac1s\,Ef_1+\frac{s-1}s\,\si_d^2\sim \si_d^2 \end{equation*} as $s\to\infty$.
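The exact identity $E\|x_s\|^2=\frac1s\,Ef_1+\frac{s-1}s\,\si_d^2$ implied by the displays above can be checked numerically. A sketch (equicorrelated unit-trace $\Sigma$ is an arbitrary illustrative choice; both sides are estimated by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(1)
d, rho, s = 10, 0.3, 20

# Equicorrelated unit-trace covariance (hypothetical illustrative choice).
Sigma = ((1 - rho) * np.eye(d) + rho * np.ones((d, d))) / d
L = np.linalg.cholesky(Sigma)
ones = np.ones(d)
sigma_d2 = ones @ Sigma @ Sigma @ ones            # 1^T Sigma^2 1

# Monte Carlo estimate of E f_1 = E[(z^T 1)^2 ||z||^2].
z = L @ rng.standard_normal((d, 200_000))
Ef1 = np.mean(z.sum(axis=0) ** 2 * (z ** 2).sum(axis=0))

# Monte Carlo estimate of E f_s = E ||x_s||^2.
n_trials = 5000
vals = np.empty(n_trials)
for t in range(n_trials):
    zs = L @ rng.standard_normal((d, s))          # columns are z_1, ..., z_s
    g = zs.sum(axis=0)                            # g(z_i) = z_i^T 1
    vals[t] = ((zs * g).mean(axis=1) ** 2).sum()  # ||x_s||^2
Efs = vals.mean()

closed_form = Ef1 / s + (s - 1) / s * sigma_d2    # (1/s) E f_1 + ((s-1)/s) sigma_d^2
print(Efs, closed_form)
```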