Solved – Multiple Regression – Minimum Observations Per Dummy Variable

categorical data, multiple regression, rule-of-thumb, small-sample

I believe the rule of thumb is at least 10-20 observations per predictor variable, but I was hoping to get some additional clarification.

Suppose a hypothetical example with dependent variable of salary, and explanatory variables race (4 dummies), region (4 dummies), and years of education (continuous).

Counts

Region1: 2 black, 3 asian, 5 hispanic, 5 white

Region2: 2 black, 3 asian, 5 hispanic, 5 white

Region3: 3 black, 2 asian, 5 hispanic, 5 white

Region4: 3 black, 2 asian, 5 hispanic, 5 white

So, there are 10 observations for black, 10 for asian, 20 for hispanic, 20 for white, 15 within each region, and 60 for education.
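As a quick check of the totals above, here is a minimal sketch that tallies the marginal counts from the cell counts listed in the table. The dictionary layout and variable names are only illustrative.

```python
from collections import Counter

# Cell counts copied from the listing above: region -> race -> n
cells = {
    "Region1": {"black": 2, "asian": 3, "hispanic": 5, "white": 5},
    "Region2": {"black": 2, "asian": 3, "hispanic": 5, "white": 5},
    "Region3": {"black": 3, "asian": 2, "hispanic": 5, "white": 5},
    "Region4": {"black": 3, "asian": 2, "hispanic": 5, "white": 5},
}

race_totals = Counter()   # observations per race, summed over regions
region_totals = {}        # observations per region, summed over races

for region, by_race in cells.items():
    region_totals[region] = sum(by_race.values())
    race_totals.update(by_race)  # adds each race count to the running total

total = sum(region_totals.values())  # rows available for the education slope

print("per race:  ", dict(race_totals))
print("per region:", region_totals)
print("total:     ", total)
```

Note that the smallest individual cells (for example, black respondents in Region1) hold only 2 observations, which matters if race-by-region interactions were ever added to the model.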

Assuming the model is well specified, is it sufficient to have at least 10 observations for each of the race dummy variables, or should there be 10 within each region as well?

Also, in a similar vein: for a larger model with a predictor that has many dummy variables (such as job title), where it is not realistic to have a sufficient sample size for each dummy, is there some percentage of the total observations that should fall in dummies with a count of at least 10?

Thanks so much.

Best Answer

The question is: minimum number of observations to do what? If the objective is to find the minimum number of observations needed to detect a significant effect of a dummy (when the effect truly exists), then you need an estimate of the likely effect size and can then perform a standard power analysis. If you just want to know how many observations you need to run your model, then the model will very likely run with a very small number of observations per variable (say, fewer than 10), as long as the variables are not too highly correlated.
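To make the power-analysis idea concrete, here is a rough sketch using only the Python standard library. It is not the answerer's method: it assumes the simplest case of comparing one dummy category against the reference category with equal group sizes, uses the normal approximation to the two-sample test, and ignores the other covariates in the regression. The effect size is taken as a standardized mean difference (Cohen's d), and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed in EACH of two groups.

    Normal-approximation formula for a two-sided, two-sample comparison:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized mean difference (an assumption here).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller assumed effects need far more observations per dummy category.
for d in (0.5, 0.8, 1.0):
    print(f"d = {d}: about {n_per_group(d)} observations per group")
```

Because this uses the normal rather than the t distribution, it slightly understates the required sample size for small groups. A dedicated power-analysis routine that accounts for the full regression model would be more appropriate for real planning.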


