MATLAB: Feature reduction via regression analysis

regression, sequential feature reduction

Suppose you have a very large feature vector X, used to predict a vector of expected values y.

Is sequential linear regression,

e.g.:   coeff=regress(y, X);

followed by sequential feature reduction,

e.g.   [coeff_subset] = sequentialfs(fun, X, y, 'direction', 'backward');
  % where: fun = @(XT,yT,Xt,yt) rmse(Xt*regress(yT, XT), yt);

the easiest/best approach to get a reasonably sized feature vector when no other information is known?

It seems that, from my testing, this method rarely captures the features that matter the most, and I obtained better results by randomly selecting some of the features.
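For reference, here is a minimal runnable sketch of the backward sequentialfs setup described above. It is an illustration only: the carsmall data set, the choice of five predictors, the 5-fold cross-validation, and the sum-of-squared-errors criterion are all assumptions, not part of the original question. sequentialfs sums the criterion over the test folds and divides by the number of test observations, so returning a sum of squared errors yields a mean squared error overall.

```matlab
% Sketch: backward feature selection with a linear-regression criterion.
% Data set and predictor choice are assumptions for illustration.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
y = MPG;

% Drop rows with missing values so every fold is complete.
ok = all(~isnan([X y]), 2);
X = X(ok, :);
y = y(ok);

% regress does not add an intercept, so prepend a column of ones.
addOnes = @(A) [ones(size(A, 1), 1) A];

% Criterion: sum of squared prediction errors on the held-out fold.
fun = @(XT, yT, Xt, yt) sum((yt - addOnes(Xt) * regress(yT, addOnes(XT))).^2);

rng(0)  % the cross-validation partition is random
inmodel = sequentialfs(fun, X, y, 'direction', 'backward', 'cv', 5);

% inmodel is a logical mask over the columns of X, not a coefficient vector.
Xreduced = X(:, inmodel);
```

Note that the first output is a logical mask of selected columns, so the coefficients still have to be estimated afterwards on the reduced matrix.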

Best Answer

If you prefer linear regression, use function stepwisefit or its new incarnation LinearModel.stepwise. For example, for backward elimination with an intercept term you can do

load carsmall
X = [Acceleration Cylinders Displacement Horsepower];
y = MPG;
% stepwisefit adds the constant term itself, so X needs no column of ones
stepwisefit(X,y,'inmodel',true(1,4))

In general, there is no "best" approach to feature selection. What you can do depends on what assumptions you are willing to make (such as linear model), how many features you have and how much effort you want to invest.
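As a rough sketch of the newer interface mentioned above, backward elimination can also be approximated with stepwiselm by starting from the full linear model and capping the search at linear terms. The 'Upper' and 'Verbose' settings here are my choices for the illustration, and a dropped term can still be re-added at a later step, so this is not a strict one-way elimination.

```matlab
% Sketch: backward-style elimination with stepwiselm on carsmall.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower];
y = MPG;

% Start from all four linear terms; 'Upper','linear' prevents the search
% from adding interaction terms, so it can only remove (or restore) predictors.
mdl = stepwiselm(X, y, 'linear', 'Upper', 'linear', 'Verbose', 0);

% Inspect which terms survived and their estimates.
disp(mdl.CoefficientNames)
disp(mdl.Coefficients)
```

The returned LinearModel object keeps the intercept by default and carries the fit statistics, which makes it easier to work with than the raw outputs of stepwisefit.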

Related Solutions

MATLAB: Predictions for stepwise regression model

No predict function is defined for stepwisefit.

One is defined for stepwiselm:

m = stepwiselm(matrix , Y)
prediction1 = predict(m,matrix);

See if that does what you want.
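A slightly fuller sketch of the same idea, predicting on rows the model has not seen. The carsmall data and the 80/20 split are assumptions for illustration; `matrix` and `Y` in the snippet above would be your own data.

```matlab
% Sketch: fit a stepwise model on a training split, predict on the rest.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower];
y = MPG;

rng(1)                          % reproducible split
n = size(X, 1);
idx = randperm(n);
nTrain = round(0.8 * n);
trainIdx = idx(1:nTrain);
testIdx = idx(nTrain+1:end);

m = stepwiselm(X(trainIdx, :), y(trainIdx), 'Verbose', 0);

% Pass all original predictor columns; the model uses only the terms it kept.
yhat = predict(m, X(testIdx, :));
```

Rows of the test matrix that contain NaN in a predictor the model uses will produce NaN predictions.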

MATLAB: How to use ‘regstats’ to make a multiple linear regression with more than three predictor variables

You've asked for a full quadratic model in 5 (not 4) predictor variables. That's a total of 5 quadratic + 5 linear + 9 (4+3+2) interaction terms + 1 constant term --> 20 coefficients to estimate with only 12 observations.

ERRATUM

Actually, there are 21 coefficients, not 20...I knew something seemed peculiar but didn't catch it at the time.

...+ 10 (4+3+2+1) interaction terms ...

I left off the last cross term...now back to our regularly scheduled programming...dpb

With three predictors it's "only" 3+3+3+1 = 10 which is still way overfitting the data with only 2 DOF left but at least is computable.
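The term counts above are easy to check with a couple of lines. This is just the arithmetic for a full quadratic model in p predictors (1 constant, p linear, p squared, and p(p-1)/2 pairwise interaction terms); the variable names are made up for the illustration.

```matlab
% Number of coefficients in a full quadratic model with p predictors.
nCoef = @(p) 1 + p + p + p .* (p - 1) / 2;

nObs = 12;                       % observations available in the question
n5 = nCoef(5);                   % 21 coefficients for five predictors
n3 = nCoef(3);                   % 10 coefficients for three predictors
dofLeft = nObs - n3;             % 2 degrees of freedom left

fprintf('p = 5: %d coefficients\n', n5);
fprintf('p = 3: %d coefficients, %d DOF left\n', n3, dofLeft);
```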

ADDENDUM

You could start with a simple linear model of the five predictors and see what happens...of course, model fitting without first plotting to see what the data look like is like playing darts blindfolded.

And, with only 12 data points in 5-vector space, it's going to be very difficult to do very much, anyway; you just don't have enough points to cover the dimensionality.
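A minimal sketch of that starting point, using fitlm with a plain linear model. The data here are synthetic placeholders generated only so the snippet runs (12 observations, 5 predictors); substitute your own X and y.

```matlab
% Sketch: simple linear model in five predictors with 12 observations.
% X and y are synthetic placeholders, NOT data from the original question.
rng(0)
nObs = 12;
X = randn(nObs, 5);
y = 2 + X(:, 1) - 0.5 * X(:, 4) + 0.3 * randn(nObs, 1);

mdl = fitlm(X, y, 'linear');     % 1 intercept + 5 slopes = 6 coefficients

disp(mdl.Coefficients)
fprintf('Error degrees of freedom: %d\n', mdl.DFE);
```

With 6 coefficients and 12 observations there are 6 error degrees of freedom, which is still thin. Something like plotmatrix([X y]) before fitting gives a quick look at the data first.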
