Statistics – Sparse Linear Regression Model

st.statistics

Define a set $C_3(S):=\{\Delta\in R^d: \|\Delta_{S^c}\|_1\le 3\|\Delta_S\|_1\}$. Suppose we form a random design matrix $X\in R^{n\times d}$ with rows drawn iid from a $N(0,\Sigma)$ distribution and that for all $\Delta \in C_3(S)$,
$$
\|\Sigma\Delta\|_{\infty}\ge \gamma\|\Delta\|_{\infty}
$$

Best Answer

The result is completely independent of the regression model ($y, w$ have no role and need not be involved). $\alpha$ in the definition of $C_3(S)$ is not defined, I will assume it is $\alpha=3$.

For each $\Delta\in C_3(S)$, there exists $k=1,...,p$ such that $$|e_k^TX^TX/n \Delta| - |e_k^T\Sigma\Delta| + |e_k^T\Sigma\Delta| \ge \gamma |\Delta|_\infty - |e_k^T(X^TX/n-\Sigma)\Delta|.$$ Now, using $|u^Tv|\le \|u\|_\infty\|v\|_1$, $$ |e_k^T(X^TX/n-\Sigma)\Delta| \le \max_{j=1,...,p} |e_k^T(X^TX/n -\Sigma)e_j| \|\Delta\|_1 \le \max_{j=1,...,p} |e_k^T(X^TX/n -\Sigma)e_j| ~ 6 \|\Delta_S\|_1 $$ and $\|\Delta_S\|_1 \le s\|\Delta\|_\infty$. By the Hanson-Wright inequality, for each $j,k$, assuming $\max_{i}\Sigma_{ii}\le 1$, $$ P(|e_k^T(X^TX/n -\Sigma)e_j| > c(\sqrt{x/n} + {x/n})) ) < e^{-x}. $$ By the union bound over all pairs $i,k$ and taking $x=\log(p^3)$, we obtain that $$ \max_{j=1,...,p} |e_k^T(X^TX/n -\Sigma)e_j| \le C \sqrt{\log (p)/n} $$ with probability at least $1/p$. This provides the desired bound when $s\sqrt{\log(p)/n}$ is smaller than a constant.

Related Question