This is about upper bounding the Rademacher complexity by the Gaussian complexity, but I am only asking about one step in the proof, so the broader context is not essential. A similar question about Exercise 5.5b was asked here.
I can mostly complete the proof by following the outline the instructor provided, but there is one key step I don't understand.
We know $G(T) = \mathbb{E}[\sup_{\theta \in T} \sum_{j=1}^{d} \theta_j w_j]$ where the $w_j$ are i.i.d. Gaussians. The first step involves writing $w_j = \operatorname{sign}(w_j) \cdot \lvert w_j \rvert = \varepsilon_j \lvert w_j \rvert$, where $\varepsilon_j$ is a Rademacher variable independent of $\lvert w_j\rvert$. This last claim is the part I cannot convince myself of. How do we show that the absolute value and the sign of a Gaussian are independent?
The linked question has an answer that also uses this fact.
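For what it's worth, a quick simulation seems consistent with the claim; here is a minimal sketch (my own, using NumPy) comparing the empirical distribution of $\lvert w \rvert$ given each sign:

```python
import numpy as np

# Draw standard Gaussians and split each draw into its sign and absolute value.
rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000)

abs_pos = np.abs(w[w > 0])  # |w| conditional on sign = +1
abs_neg = np.abs(w[w < 0])  # |w| conditional on sign = -1

# If sign and |w| are independent, these empirical quantiles should agree
# up to Monte Carlo error.
qs = [0.25, 0.5, 0.75, 0.9]
print(np.quantile(abs_pos, qs))
print(np.quantile(abs_neg, qs))
```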
Best Answer
This is not a special property of Gaussians: it holds for any distribution that is symmetric about zero and has zero probability of actually equaling zero.
Independence can be shown (and intuited) by considering the distribution of the absolute value conditional on the sign: "symmetric about zero" means exactly the same thing as "the absolute value distribution is the same regardless of the sign." But that's precisely what independence is.
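To make this concrete for a symmetric but non-Gaussian law, here is a small simulation sketch (my own illustration, assuming NumPy and SciPy are available) comparing the two conditional distributions of $|X|$ for a Laplace variable via a two-sample Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import ks_2samp

# A symmetric, non-Gaussian example: Laplace centered at zero.
rng = np.random.default_rng(1)
x = rng.laplace(loc=0.0, scale=1.0, size=500_000)

# Compare |X| given sign = +1 against |X| given sign = -1.
stat, pval = ks_2samp(np.abs(x[x > 0]), np.abs(x[x < 0]))
print(f"KS statistic = {stat:.4f}, p-value = {pval:.3f}")
# The two samples come from the same distribution, so the p-value
# should typically not be small.
```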
Here are plots of the density of an asymmetric distribution (at left) and the two conditional distributions of its absolute value:
The red region at right shows the distribution of positive values: that is, it's the distribution of the absolute values conditional on the sign being positive. It's the same as the graph of the positive values at left, but scaled to have a total area of $1.$
The blue region at right shows the distribution of absolute values of the negative numbers: that is, it's the distribution of the absolute values conditional on the sign being negative. It's the same as the graph of the negative values at left, but scaled to have a total area of $1$ and reversed (because the absolute value reverses the order of negative numbers).
The asymmetry is clear: because the red and blue regions do not coincide, the original distribution could not have been symmetric.
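The same contrast shows up numerically. As a minimal sketch (using a shifted exponential as a stand-in asymmetric distribution, my choice rather than the one plotted), the conditional means of $|X|$ differ by sign:

```python
import numpy as np

# An asymmetric example: X = Exp(1) - 0.5, which straddles zero unevenly.
rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=500_000) - 0.5

print(np.abs(x[x > 0]).mean())  # E[|X| | sign = +1], roughly 1.0
print(np.abs(x[x < 0]).mean())  # E[|X| | sign = -1], roughly 0.27
# The conditional means differ sharply, so sign and |X| are dependent here.
```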
When the original distribution is symmetric, the red and blue regions overlap perfectly, merging into a common purple region, as in this figure:
If you want mathematical formalism, suppose $X$ is a random variable with the given distribution. Symmetry implies that for any positive number $x,$ the chance that $X$ is in the interval $(0, x]$ -- that is, the chance that $0\lt X \le x$ -- is the same as the chance $X$ is in the reflected interval $[-x,0)$ -- that is, the chance that $-x \le X \lt 0.$
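As a concrete instance, one can verify this identity numerically for the standard normal; a sketch using SciPy's normal CDF:

```python
from scipy.stats import norm

# Symmetry check for the standard normal at x = 1.3:
x = 1.3
print(norm.cdf(x) - norm.cdf(0.0))   # P(0 < X <= x)
print(norm.cdf(0.0) - norm.cdf(-x))  # P(-x <= X < 0)
# Both lines print the same number, as symmetry requires.
```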
Let's work out the distribution of the bivariate random variable $(\operatorname{sgn}(X), |X|).$ By definition, this requires us to consider chances of events of the form $\operatorname{sgn}(X)\le s\text{ and } |X| \le x.$ Because the sign can only have the values $\pm 1,$ it suffices to examine those values of $s;$ and because the absolute value is nonnegative, we only have to consider nonnegative $x.$ Under these conditions,
$$\Pr(\operatorname{sgn}(X)=-1\text{ and } |X|\le x) = \Pr(X \lt 0\text{ and }-X \le x) = \Pr(-x \le X \lt 0) \tag{*}$$
because $\operatorname{sgn}(X)=-1$ means $X$ is negative, in which case $|X|=-X.$
Similarly,
$$\Pr(\operatorname{sgn}(X)=1\text{ and } |X| \le x) = \Pr(X \gt 0\text{ and } X \le x) = \Pr(0 \lt X \le x) \tag{**}$$
because $\operatorname{sgn}(X)=1$ means $X\gt 0$ and $|X|=X.$
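Note that $(*)$ and $(**)$ are equalities of events, not merely of probabilities, so they can be checked sample-by-sample; a small simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(1_000_000)
x = 0.8

# (*): {sgn(X) = -1 and |X| <= x} is the same event as {-x <= X < 0}.
lhs_neg = (np.sign(X) == -1) & (np.abs(X) <= x)
rhs_neg = (-x <= X) & (X < 0)
print(np.array_equal(lhs_neg, rhs_neg))  # True

# (**): {sgn(X) = 1 and |X| <= x} is the same event as {0 < X <= x}.
lhs_pos = (np.sign(X) == 1) & (np.abs(X) <= x)
rhs_pos = (0 < X) & (X <= x)
print(np.array_equal(lhs_pos, rhs_pos))  # True
```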
Since $X$ has no chance of equaling $0$ (by assumption),
$$\Pr(\operatorname{sgn}(X)=1) + \Pr(\operatorname{sgn}(X)=-1) = 1$$
and since (by symmetry) those two terms are equal, each has probability $1/2.$ By a standard formula for conditional probability, $(*)$ tells us
$$\begin{aligned}\Pr(|X| \le x\mid \operatorname{sgn}(X) = -1) &= \frac{\Pr(\operatorname{sgn}(X)=-1\text{ and }|X|\le x)}{\Pr(\operatorname{sgn}(X)=-1)} \\ &= \frac{\Pr(\operatorname{sgn}(X)=-1\text{ and }|X|\le x)}{1/2} \\ &= 2\Pr(-x \le X \lt 0)\end{aligned}$$
and an identical calculation based on $(**)$ reveals
$$\Pr(|X| \le x\mid \operatorname{sgn}(X) = 1) = 2\Pr(0 \lt X \le x).$$
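Both conditional formulas can also be checked by simulation; a minimal sketch for the negative-sign case (the positive case is symmetric):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal(1_000_000)
x = 0.8

# P(|X| <= x | sgn(X) = -1), estimated as a conditional frequency:
lhs = np.mean(np.abs(X[X < 0]) <= x)
# 2 * P(-x <= X < 0), estimated as twice a joint frequency:
rhs = 2 * np.mean((-x <= X) & (X < 0))
print(lhs, rhs)  # agree up to Monte Carlo error
```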
But, as observed at the outset, the chances $\Pr(-x \le X \lt 0)$ and $\Pr(0 \lt X \le x)$ are equal by symmetry. We have established that the conditional distribution of $|X|$ does not depend on the value of $\operatorname{sgn}(X),$ and that means these variables are independent.