Solved – R Forward and Backward Selection

regression

I have a data set with a large number of attributes; some are relevant for the regression model and some are not. My approach was to run forward and backward selection to identify a starting point, i.e. which attributes I should drop from the analysis.

I ran forward and backward selection without any log transformation of the attributes, and the issue is that the best model from forward selection and the best model from backward selection are different. I am confused about which one I should pick to move forward with.
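For context, here is a minimal sketch of the kind of stepwise runs described above, assuming a data frame `dat` with a numeric response `y` (both hypothetical names), using base R's `step()`:

```r
# Hypothetical data frame `dat` with numeric response `y` and many candidate predictors
full <- lm(y ~ ., data = dat)   # model with every attribute
null <- lm(y ~ 1, data = dat)   # intercept-only model

# Forward selection: start from the null model, add terms by AIC
fwd <- step(null, scope = formula(full), direction = "forward", trace = FALSE)

# Backward elimination: start from the full model, drop terms by AIC
bwd <- step(full, direction = "backward", trace = FALSE)

# The two searches explore different paths, so they can (and often do)
# end up at different models
formula(fwd)
formula(bwd)
```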

In addition, when you have a large number of attributes, what is the best way to identify predictors for a regression model?

Best Answer

In brief, forward and backward selection are unfortunately rather poor tools for feature selection. Frank Harrell is likely the most opinionated (and informed) opponent of the method; see some of his main comments here (and buy his great regression strategies book!):

http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/

I believe the LASSO/elastic net would be a much safer option if you still want to be constrained to a linear model without interactions. The glmnet package in R is a very mature implementation that also has nice cross-validation functions to help you determine the "best"/most useful number of features.
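As a hedged sketch (again assuming a hypothetical data frame `dat` with numeric response `y`; adjust the names to your data), a typical cross-validated LASSO fit with glmnet might look like this:

```r
library(glmnet)

# Hypothetical data: `dat` is a data frame, `y` is the numeric response column
x <- model.matrix(y ~ ., data = dat)[, -1]  # predictor matrix, drop intercept column
y <- dat$y

# Cross-validated LASSO (alpha = 1); choose alpha between 0 and 1 for elastic net
set.seed(1)
cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

plot(cvfit)                    # CV error curve across the lambda path
coef(cvfit, s = "lambda.1se")  # sparser model within 1 SE of the minimum CV error
coef(cvfit, s = "lambda.min")  # model at the lambda that minimizes CV error
```

The coefficients that are shrunk exactly to zero at your chosen lambda are the features the penalty has effectively dropped, which gives you a principled counterpart to the "which attributes should I remove" question.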