If $\lvert X_{ij}\rvert \leq 1$ for all $i, j$, how do we show that the diagonal elements of $(X^T X)^{-1}$ are at least $\frac{1}{n}$?

inequality, least squares, linear algebra, matrices, statistics

I am reformulating the question in what I hope will be a simpler, clearer way. However, I am also leaving the original formulation below.

Given an $n \times p$ matrix $X$ with $\lvert X_{ij}\rvert \leq 1$ for all $i, j$, and assuming $X^T X$ is invertible, how do we show that $(X^T X)^{-1}_{j,j} \geq \frac{1}{n}$ for all $j$?

Original formulation:

I have a linear model $Y = X\beta + \epsilon$ where $\epsilon \sim (0_n, \sigma^2 I_n)$. The matrix $X$ is $n \times p$. If $\hat \beta$ is the least squares estimator of $\beta$ and $\lvert x_{ij} \rvert \leq 1$ for all $i, j$, then I want to show that $\text{Var}(\hat \beta_j) \geq \sigma^2 / n$ for $j = 1, \dots, p$.

(I guess the point here is that matrices $X$ with small entries lead to large variability in the least squares estimators of $\beta$.)

I know that $\hat \beta = (X^T X)^{-1} X^T Y$ and $\text{Var}(\hat \beta) = \sigma^2 (X^T X)^{-1}$. I was thinking that a formula for the diagonal entries of $(X^T X)^{-1}$ might be handy here, but I'm not sure where I am going with that. I was also thinking about the triangle inequality, but I'm not sure how to use it. One consequence of the assumption $\lvert x_{ij} \rvert \leq 1$ is that every entry of $X^T X$ has absolute value at most $n$, right?

I was also playing with a toy $3 \times 2$ example for $X$, but that didn't seem to lead anywhere.
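To convince myself the claim is at least plausible, I also ran a quick numerical check in NumPy (the dimensions $n = 10$, $p = 3$ and the random seed are arbitrary choices of mine; this is only a sanity check, not a proof):

```python
import numpy as np

# Sanity check: draw a random n x p matrix X with entries in [-1, 1]
# and compare the diagonal of (X^T X)^{-1} against 1/n.
rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.uniform(-1.0, 1.0, size=(n, p))

XtX_inv = np.linalg.inv(X.T @ X)   # X^T X is invertible for almost every random draw
diag = np.diag(XtX_inv)

print("diag of (X^T X)^{-1}:", diag)
print("1/n:", 1.0 / n)
print("all diagonal entries >= 1/n:", bool(np.all(diag >= 1.0 / n)))
```

The inequality held for the draws I tried, which is encouraging but of course proves nothing.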

I appreciate any help.

Edit: Response to Hyperplane's answer (NOTE: the answer was deleted)

I think it is clear that proving that $(X^T X)^{-1}_{j,j} \geq \frac{1}{n}$ for all $j$ will suffice, because multiplying both sides by $\sigma^2$ will give us the desired inequality. I see that you have argued that all the diagonal elements of $(X^T X)^{-1}$ are trapped between the smallest and largest eigenvalues of $(X^T X)^{-1}$.
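Spelling this out for myself (this step is my own filling-in, not part of the deleted answer), I believe the trapping follows from the Rayleigh quotient characterization of the extreme eigenvalues of a symmetric matrix, applied with the unit vector $e_j$:

\begin{align*}
\lambda_{\min}\bigl((X^T X)^{-1}\bigr)
= \min_{\lVert u \rVert_2 = 1} u^T (X^T X)^{-1} u
\;\leq\; e_j^T (X^T X)^{-1} e_j
\;\leq\; \max_{\lVert u \rVert_2 = 1} u^T (X^T X)^{-1} u
= \lambda_{\max}\bigl((X^T X)^{-1}\bigr),
\end{align*}

and the middle term is exactly $(X^T X)^{-1}_{j,j}$.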

In particular,

\begin{align*}
(X^T X)^{-1}_{j,j} \geq \frac{1}{\lambda_{\max}(X^T X)}
\end{align*}

because the eigenvalues of $(X^T X)^{-1}$ are the reciprocals of the eigenvalues of $X^T X$. Therefore, if we can show that

\begin{align*}
\frac{1}{\lambda_{\max}(X^T X)} &\geq \frac{1}{n}, \text{ or equivalently}\\
n &\geq \lambda_{\max}(X^T X)
\end{align*}

we'll essentially be done. (There is an assumption here that $\lambda_{\max}(X^T X) \neq 0$; I am not entirely sure how to argue this away.)

My main concern is with your second-to-last line. Matrix norms are new to me, but according to Wikipedia we should have $\lambda_{\max}(X^T X) = \Vert X \Vert_2^2$, which (unless I'm mistaken) means you have shown that $\lambda_{\max}(X^T X) \leq n^2$, which is not good enough.

I am hoping this is a simple misunderstanding on my part.

Best Answer

Note that for any positive definite matrix $A$ and unit vector $u$, we have
$$
(u^\ast A u)(u^\ast A^{-1} u)
= \|A^{1/2}u\|_2^2 \, \|A^{-1/2}u\|_2^2
\ge \bigl|\langle A^{1/2}u, A^{-1/2}u\rangle\bigr|^2
= 1.
$$
In particular, when $A = X^T X$ and $u = e_k$,
$$
(A^{-1})_{kk}
= e_k^T A^{-1} e_k
\ge \frac{1}{e_k^T A e_k}
= \frac{1}{\|X e_k\|_2^2}
= \frac{1}{\sum_{i=1}^n x_{ik}^2}
\ge \frac{1}{n}.
$$
Equality holds if and only if $A^{1/2} e_k$ is parallel to $A^{-1/2} e_k$ and $\|X e_k\|_2^2 = n$. This means $e_k$ is a right singular vector of $X$ and $|x_{ik}| = 1$ for every $i$; that is, the $k$-th column of $X$ is orthogonal to all the other columns, and all entries of the $k$-th column are equal to $\pm 1$.
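As a quick numerical illustration of the equality case (my own addition, not part of the answer above): take the $k$-th column of $X$ to have $\pm 1$ entries and be orthogonal to the remaining columns, and the corresponding diagonal entry comes out to exactly $1/n$. The particular vectors below are an arbitrary choice for the illustration.

```python
import numpy as np

# Equality case: column 0 has entries +/-1 and is orthogonal to column 1,
# so (X^T X)^{-1}_{00} should equal exactly 1/n.
n = 4
col0 = np.array([1.0, -1.0, 1.0, -1.0])   # all entries +/-1
col1 = np.array([0.5, 0.5, 0.5, 0.5])     # orthogonal to col0, entries bounded by 1
X = np.column_stack([col0, col1])

assert abs(col0 @ col1) < 1e-12           # columns really are orthogonal

XtX_inv = np.linalg.inv(X.T @ X)
print(XtX_inv[0, 0], 1.0 / n)             # both print 0.25
```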
