Solved – Regression with very small sample size

regression, small-sample

I want to run a regression with 4 to 5 explanatory variables, but I have only 15 observations. I cannot assume that these variables are normally distributed. Is there a non-parametric or any other valid regression method I could use?

Best Answer

@Glen_b is right about the nature of the normality assumption in regression [1].

I think your bigger problem is going to be that you don't have enough data to support 4 to 5 explanatory variables. The standard rule of thumb [2] is that you should have at least 10 observations per explanatory variable, i.e. 40 or 50 observations in your case (and this is for ideal situations where there isn't any question about the assumptions). Because your model would not be completely saturated [3] (you have more observations than parameters to fit), you can get parameter (slope, etc.) estimates, and under ideal circumstances the estimates are asymptotically unbiased. However, with 15 observations and 4 or 5 slopes plus an intercept, only 9 or 10 residual degrees of freedom remain. It is therefore quite likely that your estimates will be a long way off from the true values, and your SEs / CIs will be very large, so you will have very little statistical power. Note that using a nonparametric, or other alternative, regression analysis will not get you out of this problem.
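To get a feel for how much precision is lost, here is a minimal simulation sketch. It is only an illustration under assumed conditions, not your data: the true slopes (all 0.5), the standard normal predictors and errors, and the helper names `ols_with_se` and `mean_slope_se` are all made up for the example. It compares the average standard error of the slopes with 15 observations against 50 observations, both with 5 explanatory variables.

```python
import numpy as np

def ols_with_se(X, y):
    """Fit OLS with an intercept; return coefficients and their standard errors."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                # estimated error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)       # covariance matrix of the estimates
    return beta, np.sqrt(np.diag(cov))

def mean_slope_se(n, p=5, n_sims=500, seed=0):
    """Average slope standard error over simulated data sets (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    true_beta = np.full(p, 0.5)                 # assumed true slopes, for illustration
    ses = []
    for _ in range(n_sims):
        X = rng.normal(size=(n, p))
        y = X @ true_beta + rng.normal(size=n)
        _, se = ols_with_se(X, y)
        ses.append(se[1:].mean())               # skip the intercept's SE
    return float(np.mean(ses))

se_15 = mean_slope_se(15)
se_50 = mean_slope_se(50)
print(f"mean slope SE with n=15: {se_15:.3f}")
print(f"mean slope SE with n=50: {se_50:.3f}")
```

Under these assumptions the standard errors at n = 15 come out noticeably larger than at n = 50, and the simulation is already the friendly case: the predictors are uncorrelated and the model is correctly specified.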

What you will need to do here is either pick a single explanatory variable (before looking at your data!) based on prior theories in your field or your hunches, or combine your explanatory variables into one. A reasonable strategy for the latter option is to run a principal components analysis (PCA) and use the first principal component as your explanatory variable.
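The PCA option could look roughly like the sketch below. It uses plain NumPy and simulated stand-in data (the `latent` factor, the noise levels, and all variable names are assumptions for illustration only). The predictors are standardized first, the first principal component is taken from the SVD, and the response is then regressed on that single score.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 15, 5

# Hypothetical data: five predictors that share one underlying factor.
latent = rng.normal(size=n)
X = latent[:, None] + 0.5 * rng.normal(size=(n, p))
y = 2.0 * latent + rng.normal(size=n)

# Standardize the predictors so no variable dominates because of its scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via the singular value decomposition of the standardized matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Z @ Vt[0]                          # scores on the first principal component
explained = s[0] ** 2 / np.sum(s ** 2)   # share of predictor variance captured by PC1

# Simple regression of y on the single combined predictor.
A = np.column_stack([np.ones(n), pc1])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"variance explained by PC1: {explained:.2f}")
print(f"intercept: {coef[0]:.3f}, slope on PC1: {coef[1]:.3f}")
```

Keep in mind that PCA never looks at the response, so the first component is the direction of greatest variance among the predictors, which is not necessarily the direction most related to y. The sign of the component is also arbitrary, so the sign of the slope on its own is not interpretable without checking the loadings in `Vt[0]`.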

References:
1. What if residuals are normally distributed but Y is not?
2. Rules of thumb for minimum sample size for multiple regression
3. Maximum number of independent variables that can be entered into a multiple regression equation
