Solved – Could applying ridge regression on a small dataset improve predictive power

least-squares, predictive-models, ridge-regression

Suppose I have a small data set $X$ that is $30\times 6$. I am wondering whether it makes sense to use ridge regression if I want to improve the predictive power of the model.

To my understanding, ridge regression can usually be used to solve the following problems:

  1. The matrix $X^\top X$ is singular (i.e., $X$ is not of full column rank). In this case, the OLS estimator is not uniquely defined.
  2. We have too many features. Ridge regression's objective function penalizes the size of the coefficients via the term $\lambda \sum_i \beta_i^2$, shrinking them toward zero (the full objective is written out below).
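
For concreteness, the ridge objective in the usual notation is

$$\hat\beta^{\text{ridge}} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2, \qquad \lambda \ge 0,$$

with closed-form solution $\hat\beta^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$. Since $X^\top X + \lambda I$ is invertible for any $\lambda > 0$, this also resolves the singularity problem in point 1.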

However, I am wondering if the following reasoning makes sense:

OLS estimators are unbiased and have the least variance among all linear unbiased estimators (the Gauss-Markov theorem). Since I have a very small data set, my OLS estimators have high variance. Even though they are unbiased, with so little data the predictive power of the model might still be low. By using ridge regression, I no longer have unbiased estimators, but a higher value of $\lambda$ gives me estimators with lower variance (at the cost of more bias). As a result, it is possible that I end up with a model that has better predictive power.
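
To make this bias-variance argument concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration (Gaussian design, noise level $\sigma = 2$, fixed penalty $\lambda = 1$); with $n = 30$ and $p = 6$ as in the question, ridge typically achieves a lower mean squared error of the coefficient estimates than OLS in this setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 30, 6, 2.0      # small sample, as in the question
beta = rng.normal(size=p)     # assumed "true" coefficients
lam = 1.0                     # fixed ridge penalty, for illustration only

mse_ols, mse_ridge = [], []
for _ in range(2000):
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)
    # OLS via least squares; ridge via its closed-form solution.
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    mse_ols.append(np.sum((b_ols - beta) ** 2))
    mse_ridge.append(np.sum((b_ridge - beta) ** 2))

print("OLS   MSE of estimates:", np.mean(mse_ols))
print("ridge MSE of estimates:", np.mean(mse_ridge))
```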

Best Answer

Your reasoning does make sense.

Given a correctly specified linear model, the optimal amount of $L_2$ penalty is positive for any sample size. That is, there exists a positive $\lambda$ such that ridge does better than OLS in terms of the mean squared error of the estimated parameters; see Dave Giles' blog post "A Regression 'Estimator' that Minimizes MSE" for details. If you can estimate $\lambda$ precisely enough, ridge will yield more precise estimates than OLS, which in turn will improve prediction accuracy.
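
In practice, $\lambda$ is usually chosen by cross-validation. A minimal sketch with scikit-learn (the data below are synthetic stand-ins for your own $X$ and $y$, and the penalty grid is an arbitrary choice):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Stand-in data; replace with your own 30x6 design matrix and response.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
y = X @ rng.normal(size=6) + rng.normal(size=30)

# With the default cv=None, RidgeCV picks the penalty (called alpha in
# scikit-learn) by efficient leave-one-out cross-validation over the grid.
model = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)
print("selected alpha:", model.alpha_)
```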

Without the assumption of a correctly specified model, things are less straightforward, but ridge may still be a good alternative. You can compare the performance of ridge and OLS on a held-out test sample and see for yourself which one does better; a sketch is given below.
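
For example, a single train/test split along those lines might look as follows (synthetic stand-in data and a fixed, arbitrary penalty; note that with only 30 observations one split is quite noisy, so repeating it or cross-validating is advisable):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own X and y.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
y = X @ rng.normal(size=6) + rng.normal(size=30)

# Hold out 10 of the 30 observations as a test sample.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=10, random_state=0)

for name, est in [("OLS", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    pred = est.fit(X_tr, y_tr).predict(X_te)
    print(f"{name:5s} test MSE: {mean_squared_error(y_te, pred):.3f}")
```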