I am new to ridge regression. When I applied linear ridge regression, I got the following results:
> myridge = lm.ridge(y ~ ma + sa + lka + cb + ltb, temp, lambda = seq(0, 0.1, 0.001))
> select(myridge)
modified HKB estimator is 0.5010689
modified L-W estimator is 0.3718668
smallest value of GCV at 0
Questions:
- Is it OK to get zero for GCV? What exactly does it mean?
- Is there a problem with my model?
- How can I find the $R^2$ value of myridge?
Best Answer
You might be better off with the penalized package or the glmnet package; both implement the lasso and the elastic net, which combines properties of the lasso (feature selection) and ridge regression (handling collinear variables). penalized also does ridge regression. These two packages are far more fully featured than lm.ridge() in the MASS package for such things.

Anyway, $\lambda = 0$ implies zero penalty, hence the least squares estimates are optimal in the sense that they had the lowest GCV (generalised cross-validation) score. However, you may not have allowed a sufficiently large penalty; in other words, the least squares estimates were only optimal within the small set of $\lambda$ values you looked at. Plot the ridge path (the values of the coefficients as a function of $\lambda$) and see whether the traces have stabilised or not. If not, increase the range of $\lambda$ values evaluated.
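A minimal sketch of this advice, using simulated data in place of your temp data frame (which isn't shown), is:

```r
library(MASS)

## Hypothetical stand-in for `temp` (y, ma, sa, lka, cb, ltb);
## substitute your own data frame here.
set.seed(1)
temp <- data.frame(ma = rnorm(50), sa = rnorm(50), lka = rnorm(50),
                   cb = rnorm(50), ltb = rnorm(50))
temp$y <- with(temp, 2 * ma + sa - lka + rnorm(50))

## Evaluate a much wider range of lambda than seq(0, 0.1, 0.001)
myridge <- lm.ridge(y ~ ma + sa + lka + cb + ltb, temp,
                    lambda = seq(0, 100, by = 0.5))

## Ridge trace: each coefficient as a function of lambda
matplot(myridge$lambda, t(myridge$coef), type = "l",
        xlab = expression(lambda), ylab = "Coefficients")

## Re-run the automatic selectors on the wider grid
select(myridge)
```

If the traces are still moving at the right-hand edge of the plot, widen the grid again; if the GCV-optimal $\lambda$ now falls in the interior of the range rather than at 0, the penalty is doing some work.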