Just in case someone is following, I want to post a somewhat negative answer to my second question. I found an example that satisfies the assumptions, and achieves an efficiency arbitrarily close to 1.
The example is inspired by the Laplace distribution with an unknown location parameter $\theta$ and p.d.f. $f(x|\theta) = \frac 1 2 e^{-|x-\theta|}$. In this case, when $p=\frac 1 2$, both estimators coincide and the efficiency is 1. This is due to the fact that (i) the MLE of the location parameter of a Laplace distribution is the median, and (ii) the distribution is symmetric.
The problem with the Laplace distribution is that it does not satisfy our assumptions: its log-likelihood is not differentiable because of the absolute value in the exponent. The trick is to replace the absolute value by an analytic approximation, such as $\frac 1 k \ln(\cosh(k x))$, which converges pointwise to $|x|$ as $k\rightarrow\infty$. Indeed, the sequence of distributions given by
$$
f_k(x\mid\theta) = \frac {a_k} 2 \cosh\bigl(k (x-\theta)\bigr)^{-1/k},
$$
where $a_k$ is a normalization constant, achieves an efficiency that tends to 1 as $k$ goes to infinity.
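As a quick numerical sanity check (not needed for the argument), here is a minimal Python sketch; the helper name `smooth_abs` is mine. It verifies that $\frac 1 k \ln(\cosh(kx))$ approaches $|x|$, with error at most $\frac{\ln 2}{k}$ uniformly in $x$.

```python
import numpy as np

def smooth_abs(x, k):
    # (1/k) * log(cosh(k*x)), computed via logaddexp for numerical stability:
    # log(cosh(t)) = log(e^t + e^(-t)) - log(2) = logaddexp(t, -t) - log(2)
    return (np.logaddexp(k * x, -k * x) - np.log(2.0)) / k

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.7])
for k in (1, 10, 100, 1000):
    err = np.max(np.abs(smooth_abs(x, k) - np.abs(x)))
    print(f"k = {k:5d}   max error on grid = {err:.3e}")  # decreases like ln(2)/k
```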
Bummer.
$\newcommand\th\theta\newcommand\R{\mathbb R}$In your logistic regression model, there is no function $f$ such that the condition $p\ge f(n)$ guarantees zero training error.
Indeed, let us say that a point $x_i$ in your data is red if $y_i=1$ and blue if $y_i=0$. Let us say that $\th\in\R^p$ separates the red and blue points (that is, has zero training error) if $\th^Tx_i>0$ whenever $x_i$ is red and $\th^Tx_i<0$ whenever $x_i$ is blue.
Then, for any natural number $p$, zero training error cannot be attained by any $\th$ if e.g. (i) one of the $x_i$'s is $0$, or (ii) there are two red points of the form $u$ and $au$ for some real $a\le0$ and some $u\in\R^p$, or (iii) there are two red points $u$ and $v$ and a blue point of the form $au+bv$ for some real $a,b\ge0$. In case (iii), for instance, $\th^Tu>0$ and $\th^Tv>0$ would force $\th^T(au+bv)=a\,\th^Tu+b\,\th^Tv\ge0$, so the blue point cannot be on the correct side.
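To illustrate, here is a minimal Python sketch; the function name `separable` and the LP feasibility check are my own, not part of your setup. Since any separating $\th$ can be rescaled, strict separation is equivalent to the feasibility of the linear constraints $\th^Tx_i\ge1$ (red) and $\th^Tx_i\le-1$ (blue), which a linear program can decide. On a toy instance of case (iii), no separating $\th$ exists.

```python
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    # Feasibility of: theta^T x_i >= 1 for red (y=1), theta^T x_i <= -1 for blue (y=0),
    # written as A_ub @ theta <= b_ub with b_ub = -1 for every point.
    A = np.where(y[:, None] == 1, -X, X)
    b = -np.ones(len(y))
    res = linprog(c=np.zeros(X.shape[1]), A_ub=A, b_ub=b, bounds=(None, None))
    return res.success

# Obstruction (iii): a blue point of the form a*u + b*v with a, b >= 0 and u, v red.
u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
X = np.vstack([u, v, 0.5 * u + 2.0 * v])
y = np.array([1, 1, 0])
print(separable(X, y))   # False: no theta attains zero training error
```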
On the other hand, if the training data $\{(x_1,y_1),\dots,(x_n,y_n)\}$ admits some $\th_*\in\R^p$ that separates the red and blue points (that is, has zero training error), then your formula
$$\th^*:= \text{arg max}_{\th\in\R^p}\sum_{i=1}^n\bigl(y_i\ln h_{\th}(x_i)+(1-y_i)\ln(1-h_{\th}(x_i))\bigr)$$
makes no sense, because then the supremum of
$$H(\th):=\sum_{i=1}^n\bigl(y_i\ln h_{\th}(x_i)+(1-y_i)\ln(1-h_{\th}(x_i))\bigr)$$
over all $\th\in\R^p$ is not attained. Rather, this supremum (equal to $0$) is "attained" only in the limit, when $\th=t\th_*$ and $t\to\infty$, where, as above, $\th_*\in\R^p$ separates the red and blue points (that is, has zero training error).
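This is easy to see numerically. Below is a minimal Python sketch on a toy separable dataset of my own making, with $h_\th(x)=1/(1+e^{-\th^Tx})$ as in your model: along the ray $\th=t\th_*$, the log-likelihood $H(t\th_*)$ increases toward $0$ but never reaches it for finite $t$.

```python
import numpy as np

def H(theta, X, y):
    # Log-likelihood of the logistic model h_theta(x) = 1 / (1 + exp(-theta^T x))
    z = X @ theta
    log_h = -np.logaddexp(0.0, -z)    # log h_theta(x), numerically stable
    log_1mh = -np.logaddexp(0.0, z)   # log(1 - h_theta(x))
    return np.sum(y * log_h + (1 - y) * log_1mh)

# Tiny separable dataset in R^1; theta_star = (1,) separates it (red: x>0, blue: x<0).
X = np.array([[1.0], [2.0], [-1.0], [-3.0]])
y = np.array([1, 1, 0, 0])
theta_star = np.array([1.0])

for t in (1, 10, 100, 1000):
    print(f"t = {t:5d}   H(t * theta_star) = {H(t * theta_star, X, y):.6f}")
# H approaches its supremum 0 from below but does not attain it at any finite t.
```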
Best Answer
The proof you are looking for should be in
Check the relationship of your curvature condition with Assumption 2 of that paper. The proof there should give you the main ideas, in particular the standard argument to prove that $\hat \theta-\theta^*\in C_3(S)$.
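For orientation, here is a sketch of that standard cone argument in the simplest setting I can state it in, namely the lasso $\hat\theta:=\text{arg min}_{\theta}\bigl(\frac 1{2n}\|y-X\theta\|_2^2+\lambda\|\theta\|_1\bigr)$ for the linear model $y=X\theta^*+\varepsilon$, with $S$ the support of $\theta^*$ and a penalty level $\lambda\ge2\|X^T\varepsilon\|_\infty/n$; your setup may differ in the details. By optimality of $\hat\theta$,
$$\frac 1{2n}\|y-X\hat\theta\|_2^2+\lambda\|\hat\theta\|_1\le\frac 1{2n}\|y-X\theta^*\|_2^2+\lambda\|\theta^*\|_1.$$
Writing $v:=\hat\theta-\theta^*$, expanding the squares, and using Hölder's inequality together with the choice of $\lambda$ and the bound $\|\theta^*\|_1-\|\hat\theta\|_1\le\|v_S\|_1-\|v_{S^c}\|_1$ (which holds because $\theta^*_{S^c}=0$), one gets
$$0\le\frac 1{2n}\|Xv\|_2^2\le\frac\lambda2\|v\|_1+\lambda\bigl(\|v_S\|_1-\|v_{S^c}\|_1\bigr)=\frac{3\lambda}2\|v_S\|_1-\frac\lambda2\|v_{S^c}\|_1,$$
so that $\|v_{S^c}\|_1\le3\|v_S\|_1$, that is, $\hat\theta-\theta^*\in C_3(S)$.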
This proof fails for random design matrices when $|S|\gg\sqrt n$, because the curvature condition in the $\ell_\infty$ norm cannot hold. It is possible to overcome this difficulty for Gaussian random design matrices; see