Solved – Is $X^T X$ invertible if $p > n$

linear-algebra, mathematical-statistics, regression

I am checking someone else's regression analysis on a data set with $p > n$. I can see only the results, not the process he used to obtain them.

I believe he made a mistake. Since $p > n$, $X^T X$ is not full rank and therefore not invertible, so the OLS coefficients $(X^T X)^{-1} X^T y$ cannot be computed.

However, I am not sure my reasoning is correct. I have not used linear algebra for a long time.

Update:

  1. No penalized methods involved.
  2. He used stepwise regression for variable selection. A new question: would such an algorithm stop once the number of variables in the model equals the number of sample points?
  3. His goal is to find out which variables are important. He does not care about predictive power.

Best Answer

If the matrix $\mathbf X$ is $n \times p$ and $p > n$, then it is fat (it has more columns than rows) and, thus,

$$ \mbox{rank} (\mathbf X) = \mbox{rank} \left(\mathbf X^\top \mathbf X \right) \leq n < p $$

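Hence, the $p \times p$ matrix $\mathbf X^\top \mathbf X$ does not have full rank and, thus, it is not invertible. (The equality of ranks holds because $\mathbf X$ and $\mathbf X^\top \mathbf X$ have the same null space, and the bound by $n$ holds because the rank of an $n \times p$ matrix is at most $\min(n, p)$.)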
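If it helps to see the rank argument numerically, here is a minimal sketch in Python with NumPy. The sizes `n = 5`, `p = 8`, the random Gaussian design, and the tolerance `1e-10` are all assumptions chosen for illustration; none of them come from the analysis in the question.

```python
import numpy as np

# Illustrative sizes only (assumed): fewer observations than predictors.
n, p = 5, 8

rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))   # hypothetical n x p design matrix
gram = X.T @ X                    # the p x p matrix X^T X

# An explicit tolerance keeps round-off in the (theoretically zero)
# eigenvalues of X^T X from being counted towards the rank.
rank_X = np.linalg.matrix_rank(X, tol=1e-10)
rank_gram = np.linalg.matrix_rank(gram, tol=1e-10)

# X^T X is symmetric, so eigvalsh applies; at least p - n of its
# eigenvalues are zero up to floating-point error.
eigenvalues = np.linalg.eigvalsh(gram)
num_zero_eigenvalues = int(np.sum(np.abs(eigenvalues) < 1e-10))

print("rank(X)     =", rank_X)
print("rank(X^T X) =", rank_gram)
print("p           =", p)
print("near-zero eigenvalues of X^T X:", num_zero_eigenvalues)
```

Both ranks should agree and stay at or below `n`, so the $p \times p$ matrix has at least $p - n$ zero eigenvalues and cannot be inverted. Calling `np.linalg.inv(gram)` on such a matrix is unreliable: depending on round-off it may raise `LinAlgError` or return numerically meaningless values.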
