Solved – Variance Inflation Factor less than 1 in ridge regression

multicollinearity, regression, ridge regression, self-study, variance-inflation-factor

I was trying to determine the biasing constant in ridge regression when I came across a phenomenon that seems quite puzzling, to me at least. I let the GCV criterion choose the constant for me and then obtained the Variance Inflation Factors (VIFs) of the resulting model by computing

$$ \left( \mathbf{R_{XX}} +c\mathbf{I} \right)^{-1} \mathbf{R_{XX}} \left( \mathbf{R_{XX}} +c\mathbf{I} \right)^{-1} $$

and extracting the diagonal elements of this matrix. What I found puzzling was that these VIFs were very close to zero. Since $\mathrm{VIF}_j = 1/(1-R_j^2)$, it seems to me that a VIF below 1 would require a negative $R_j^2$, no? I know this can happen occasionally, for example in regression through the origin, but I cannot quite justify it in this context.
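For concreteness, here is a minimal numpy sketch of the kind of computation described above (the data and the value of $c$ here are made up for illustration; this is not my actual model or my GCV-chosen constant):

```python
# Sketch: ridge "VIFs" as the diagonal of (R_XX + cI)^{-1} R_XX (R_XX + cI)^{-1},
# using hypothetical strongly collinear predictors and an arbitrary c.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical collinear predictors, centered and scaled
n, p = 100, 3
z = rng.standard_normal(n)
X = np.column_stack([z + 0.05 * rng.standard_normal(n) for _ in range(p)])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R_xx = X.T @ X / (n - 1)          # correlation matrix of the predictors

def ridge_vif(R, c):
    """Diagonal of (R + cI)^{-1} R (R + cI)^{-1}."""
    A = np.linalg.inv(R + c * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

print(ridge_vif(R_xx, 0.0))       # c = 0: ordinary VIFs, huge under collinearity
print(ridge_vif(R_xx, 0.5))       # c > 0: the values can drop well below 1
```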

I am wondering, then: what does a VIF close to zero mean? And would my choice of constant be acceptable, or should I look for another value that keeps the VIFs close to 1, as they ought to be in the absence of multicollinearity?

Best Answer

I would suggest calculating the diagonal elements of that matrix directly.

It is assumed that the design matrix is centered and scaled.

We can use the eigenvalue decomposition $R_{XX}=X'X=T\Lambda T'$.

$$
\begin{aligned}
(R_{XX}+cI)^{-1}R_{XX}(R_{XX}+cI)^{-1}
&=(R_{XX}+cI)^{-1}(R_{XX}+cI)(R_{XX}+cI)^{-1}-c(R_{XX}+cI)^{-1}(R_{XX}+cI)^{-1}\\
&=(R_{XX}+cI)^{-1}-c(R_{XX}+cI)^{-1}(R_{XX}+cI)^{-1}\\
&=(T\Lambda T'+cTT')^{-1}-c(T\Lambda T'+cTT')^{-1}(T\Lambda T'+cTT')^{-1}\\
&=T\left( (\Lambda+cI)^{-1}-c(\Lambda+cI)^{-1}(\Lambda+cI)^{-1} \right)T'
\end{aligned}
$$

The matrix $(\Lambda+cI)^{-1}$ is a diagonal matrix whose $i$th diagonal element is $\frac{1}{\lambda_i+c}$.

So the matrix $(\Lambda+cI)^{-1}-c(\Lambda+cI)^{-1}(\Lambda+cI)^{-1}$ is also diagonal, and its $i$th element is $\frac{1}{\lambda_i+c}-\frac{c}{(\lambda_i+c)^2}=\frac{\lambda_i}{(\lambda_i+c)^2}$.

In OLS, it is known that the VIFs are the diagonal elements of the matrix $R_{XX}^{-1}=T\Lambda^{-1}T'$. Comparing this $\Lambda^{-1}$ with the corresponding ridge matrix $(\Lambda+cI)^{-1}-c(\Lambda+cI)^{-1}(\Lambda+cI)^{-1}$, every diagonal element of the ridge case is deflated by the factor $\frac{\lambda_i^2}{(\lambda_i+c)^2}$, since $\frac{\lambda_i}{(\lambda_i+c)^2}=\frac{1}{\lambda_i}\cdot\frac{\lambda_i^2}{(\lambda_i+c)^2}$.
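To make this concrete, here is a quick numerical check, as a sketch with a made-up $3\times 3$ correlation matrix and an arbitrary $c$: it verifies that the directly computed matrix matches $T\left((\Lambda+cI)^{-1}-c(\Lambda+cI)^{-2}\right)T'$ and that each eigen-term $1/\lambda_i$ is multiplied by $\lambda_i^2/(\lambda_i+c)^2$.

```python
# Numerical check of the eigendecomposition identity (illustrative inputs only)
import numpy as np

R_xx = np.array([[1.00, 0.95, 0.90],
                 [0.95, 1.00, 0.92],
                 [0.90, 0.92, 1.00]])
c = 0.5

lam, T = np.linalg.eigh(R_xx)                          # R_XX = T Λ T'
A = np.linalg.inv(R_xx + c * np.eye(3))
direct = A @ R_xx @ A                                  # (R_XX+cI)^{-1} R_XX (R_XX+cI)^{-1}
via_eig = T @ np.diag(lam / (lam + c) ** 2) @ T.T      # T diag(λ_i/(λ_i+c)^2) T'
print(np.allclose(direct, via_eig))                    # True

# Each OLS eigen-term 1/λ_i is deflated by λ_i^2/(λ_i+c)^2
deflation = lam ** 2 / (lam + c) ** 2
print(np.allclose(lam / (lam + c) ** 2, (1 / lam) * deflation))   # True
```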

So we can conclude that the larger the ridge constant, the more deflated the VIFs become, and values well below 1 (even close to zero) are to be expected.
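As a short illustration of that point (same made-up correlation matrix, an arbitrary grid of $c$ values), the ridge VIFs shrink steadily as $c$ grows and quickly fall below 1:

```python
# Ridge VIFs for increasing c (illustrative correlation matrix as above)
import numpy as np

R_xx = np.array([[1.00, 0.95, 0.90],
                 [0.95, 1.00, 0.92],
                 [0.90, 0.92, 1.00]])

def ridge_vif(R, c):
    A = np.linalg.inv(R + c * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

for c in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(f"c = {c:4.2f}  VIFs = {np.round(ridge_vif(R_xx, c), 3)}")
# c = 0 reproduces the (large) OLS VIFs; larger c pushes them below 1
```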

