Solved – Multiple regression with small data sets

regression small-sample

I have a dataset of project case studies on a new type of research method that government agencies use to support decision-making. My task is to develop a method, based on these past projects, for estimating future ones.

My dataset is limited to 50 cases. I have 30+ (potential) predictors recorded and one response variable (i.e. hours taken to complete the project).

Not all predictors are significant. Using stepwise selection, I expect the final model to contain somewhere in the range of 5-10 predictors, although I'm struggling to arrive at a predictor set using the standard approaches in tools like PASW (SPSS).

I'm well aware of all the material talking about rules of thumb for sample sizes and predictor variable to case ratios. My dilemma is that it's taken close to 10 years to collect 50 cases as it is, so it's about as good as it will get.

My question is: what should I do to get the most out of this small sample?

That is, are there any good references for dealing with small samples? Should I change the p-value significance threshold? Change the stepwise selection approach? Use transforms such as centring or log?

Any advice is appreciated.

Best Answer

As you want to select a few predictors from your data set, I would suggest linear regression with an $L_1$ penalty, i.e. the LASSO (penalized linear regression). Your case is well suited to it because the number of predictors, $p = 30$, is large relative to your sample size, $n = 50$. The tuning parameter controls how many predictors are retained: the larger the penalty, the more coefficients are shrunk to exactly zero, so you can vary it until the model keeps roughly the number of predictors you want.
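To illustrate how the tuning parameter controls the number of selected predictors, here is a minimal pure-Python sketch of the LASSO fitted by coordinate descent. It is not glmnet's implementation. The data are synthetic (50 cases, 30 predictors, only the first five truly related to the response), and the names `soft_threshold`, `standardize`, `lasso_cd` and `count_selected`, as well as the grid of penalty values, are assumptions made for this illustration only.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(X):
    """Center each column and scale it so that mean(x_j^2) == 1.

    Returns the data column-major: cols[j][i].
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        centered = [v - mean for v in col]
        # Fall back to 1.0 for a constant column to avoid dividing by zero.
        scale = (sum(v * v for v in centered) / n) ** 0.5 or 1.0
        cols.append([v / scale for v in centered])
    return cols


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    Coefficients are returned on the standardized predictor scale.
    """
    n = len(y)
    cols = standardize(X)
    p = len(cols)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual when all coefficients are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Covariance of x_j with the partial residual (x_j's own fit added back).
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new_bj
    return beta


# Synthetic stand-in for the real data set (an assumption, not the poster's data).
random.seed(0)
n, p = 50, 30
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [3.0, -2.0, 1.5, 1.0, -1.0] + [0.0] * (p - 5)
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 1) for row in X]


def count_selected(lam):
    """Number of predictors with a non-zero coefficient at penalty lam."""
    return sum(1 for b in lasso_cd(X, y, lam) if b != 0.0)


# Print how many predictors survive at each penalty value.
for lam in (0.05, 0.2, 0.5, 1.0):
    print(lam, count_selected(lam))
```

Larger values of `lam` generally leave fewer non-zero coefficients. In practice the penalty is usually chosen by cross-validation rather than by hand.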

If you can give details about the distribution of your variables, I can be more specific.

I don't use SPSS, but this can be done easily in R using the glmnet function in the package of the same name. The manual contains a generic example (the very first one, for the Gaussian case) that should solve your problem. I am sure a similar solution exists in SPSS.
