Hanson-Wright inequality vs subexponential bound


The Hanson-Wright inequality is given by: for all $t>0$,

$$P(X^\intercal A X - E[X^\intercal A X] \geq t) \leq \exp \Big[ -c \min \left(\frac{t^2}{K^4 \|A\|_F^2}, \frac{t}{K^2 \|A\|_2}\right)\Big]$$

where $X$ is a vector of independent, mean-zero subgaussian random variables with Orlicz ($\psi_2$) norm at most $K$, $c$ is a universal positive constant, and $A$ is a square matrix.
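As a quick sanity check (my own addition, not part of the statement), here is a minimal Monte Carlo sketch in Python; the entries of $X$ are standard Gaussian so that $K$ is an absolute constant, and the constant $c = 1/8$ is chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20_000

A = rng.standard_normal((n, n))      # a fixed square matrix
fro2 = np.sum(A**2)                  # ||A||_F^2
op = np.linalg.norm(A, 2)            # ||A||_2, the spectral norm

# X has iid standard normal entries, so E[X^T A X] = trace(A)
X = rng.standard_normal((trials, n))
quad = np.einsum('ti,ij,tj->t', X, A, X) - np.trace(A)

K, c = 1.0, 1 / 8                    # illustrative constants only
for t in (10.0, 30.0, 60.0):
    emp = np.mean(quad >= t)
    bound = np.exp(-c * min(t**2 / (K**4 * fro2), t / (K**2 * op)))
    print(f"t={t:5.1f}  empirical tail={emp:.4f}  HW bound={bound:.4f}")
```

The empirical tail should sit below the bound for every $t$ (the bound is loose for small $t$, as expected from the unspecified constant $c$).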

There are many proofs of this inequality; one, due to Rudelson and Vershynin (https://arxiv.org/pdf/1306.2872.pdf), involves a decoupling argument, which I would argue is relatively complex.

Here is my proof of a similar bound:

$$X^\intercal A X - E[X^\intercal A X] = \sum_i a_{ii}(X_i^2 - EX_i^2) + \sum_{i\neq j} a_{ij}X_iX_j$$

The first sum is subexponential and is easy to bound via a Chernoff bound:

$$\mathbb P\left(\sum_i a_{ii}(X_i^2 - EX_i^2) \geq t\right) \leq \exp\left(-c \min \left\{\frac{t^2}{K^4\|A_{\text{diag}}\|^2_F}, \frac{t}{K^2\|A_{\text{diag}}\|_\infty}\right\}\right)$$

where $A_{\text{diag}}$ agrees with $A$ along the main diagonal and is $0$ in all other entries, and $\|\cdot\|_\infty$ denotes the largest entry in absolute value.
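For completeness, here is the standard reasoning behind this step (a sketch, using the usual Orlicz-norm calculus): subgaussianity of $X_i$ makes its square subexponential,

$$\|X_i^2 - \mathbb E X_i^2\|_{\psi_1} \leq C\,\|X_i^2\|_{\psi_1} = C\,\|X_i\|_{\psi_2}^2 \leq CK^2,$$

so the diagonal terms are independent subexponential variables, and Bernstein's inequality applied to $\sum_i a_{ii}(X_i^2 - \mathbb EX_i^2)$ gives exactly the displayed tail, with $\sum_i a_{ii}^2 = \|A_{\text{diag}}\|_F^2$ and $\max_i |a_{ii}| = \|A_{\text{diag}}\|_\infty$.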

Similarly, the second sum can be shown to be subexponential with parameters $\nu^2 = 4K^4\sum_{i\neq j}a_{ij}^2$ and $\alpha = \sqrt 2\, K^2 \max_{i\neq j}|a_{ij}|$, so the Chernoff bound gives us:

$$\mathbb P \left( \sum_{i\neq j} a_{ij} (X_iX_j - \mathbb EX_iX_j) \geq t \right)
\leq \exp\left(- \min \left\{ \frac{t^2}{8\|A-A_{\text{diag}}\|^2_FK^4}, \frac{t}{2\sqrt 2\, \|A-A_{\text{diag}}\|_\infty K^2}\right\} \right)$$

Putting these two bounds together, we achieve the same rate as in the Hanson-Wright inequality.
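For the record, the combination is just a union bound with each piece taking $t/2$:

$$\mathbb P\left(X^\intercal A X - E[X^\intercal A X] \geq t\right) \leq \mathbb P\left(\sum_i a_{ii}(X_i^2 - EX_i^2) \geq \tfrac t2\right) + \mathbb P\left(\sum_{i\neq j} a_{ij}X_iX_j \geq \tfrac t2\right),$$

which only changes the constants in the exponents, since $\|A_{\text{diag}}\|_F \leq \|A\|_F$ and $\|A - A_{\text{diag}}\|_F \leq \|A\|_F$.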

My question: why is such a technical method used to prove the Hanson-Wright inequality if a more standard approach achieves the same rates? Is it really just that the factor $\|A\|_2$ is sometimes better than the factor $\|A\|_\infty$? Otherwise, the two bounds are the same.

EDIT: to clarify a point in the comments below:

For two subexponential variables $X_1, X_2$ with parameters $(\nu_1, \alpha_1)$ and $(\nu_2, \alpha_2)$, the sum is subexponential with parameters $\left(\sqrt{2(\nu_1^2 + \nu_2^2)},\, 2\max \{\alpha_1, \alpha_2\}\right)$ because:

$$\mathbb E e^{\lambda (X_1 + X_2)} \leq \sqrt{\mathbb E e^{2\lambda X_1}\,\mathbb E e^{2\lambda X_2}} \leq e^{2\lambda^2 (\nu_1^2 + \nu_2^2)/2}$$

where the first inequality follows from Cauchy–Schwarz, which holds even for dependent random variables.
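Tracking the domain of validity as well (this is where the factor $2$ in the $\alpha$ parameter above comes from): the bounds $\mathbb E e^{2\lambda X_i} \leq e^{(2\lambda)^2 \nu_i^2/2}$ are only guaranteed for $|2\lambda| \leq 1/\alpha_i$, so the displayed inequality holds for

$$|\lambda| \leq \frac{1}{2\max\{\alpha_1, \alpha_2\}},$$

which is exactly the condition for the sum to be subexponential with second parameter $2\max\{\alpha_1, \alpha_2\}$.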

Best Answer

The reason for decoupling is that the $X_iX_j$'s in the second sum are not independent. Therefore, instead of $K^2\sqrt{\sum_{i\neq j}a_{ij}^2}$, the subexponential parameter can only be bounded by $K^2\sum_{i\neq j}|a_{ij}|$. It is easy to see how different these are by taking $a_{ij}=a$ (a constant): while the former is about $naK^2$, the latter is about $n^2aK^2$, so the latter cannot be bounded by the former with an absolute constant.
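A quick numerical illustration of this gap (my own sketch, taking $a_{ij} = 1$ and standard normal $X_i$, so $a = K = 1$): the identity $\sum_{i\neq j} X_iX_j = (\sum_i X_i)^2 - \sum_i X_i^2$ makes the off-diagonal sum easy to simulate, and its empirical standard deviation grows like $n$, matching the decoupled parameter $\approx naK^2$ rather than the $\approx n^2aK^2$ available without decoupling:

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 20_000

for n in (10, 20, 40, 80):
    X = rng.standard_normal((trials, n))
    # sum_{i != j} X_i X_j = (sum_i X_i)^2 - sum_i X_i^2
    s = X.sum(axis=1)**2 - (X**2).sum(axis=1)
    print(f"n={n:3d}  empirical std={s.std():8.1f}  n={n:4d}  n^2={n*n:6d}")
```

Doubling $n$ roughly doubles the empirical standard deviation (it is $\sqrt{2n(n-1)} \approx \sqrt 2\, n$ exactly in the Gaussian case), confirming that the $n^2$-scale parameter obtained without decoupling is far from sharp.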
