Solved – Too many dummy variables

econometrics, interaction, many-categories, multiple regression, regression

If I'm doing a regression analysis and in my data I want to use quite a few categorical variables (for example region, educational level and political party they'd vote for), is a dummy variable approach the best solution?

A simple problem, brought up by Wooldridge (regarding wages, a dummy for married and a dummy for gender), is that there should be some interaction, as the "marriage premium" isn't constant for males and females.

So I was thinking that I'd have to interact my dummy variables as well, which would create a lot of cross-dummy terms (losing degrees of freedom, etc.). If I had 3 categorical variables with 6 levels each, would there be 6*6*6 interaction dummies?
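To make the count concrete, here is a small sketch that tallies the dummy columns of a fully interacted model, assuming the usual coding in which one reference level per variable is dropped and an intercept is included. The function name `interaction_column_counts` is made up for illustration.

```python
from itertools import combinations
from math import prod


def interaction_column_counts(levels):
    """Count dummy columns by interaction order.

    `levels` holds the number of categories of each variable. With one
    reference level dropped per variable, a term that involves a given
    subset of variables contributes prod(k - 1) columns.
    """
    counts = {}
    for order in range(1, len(levels) + 1):
        counts[order] = sum(
            prod(k - 1 for k in subset)
            for subset in combinations(levels, order)
        )
    return counts


# Three categorical variables with six levels each, as in the question.
counts = interaction_column_counts([6, 6, 6])
total_with_intercept = 1 + sum(counts.values())

print(counts)                # main effects, two-way terms, three-way terms
print(total_with_intercept)  # equals 6 * 6 * 6, one parameter per cell
```

Under these assumptions the saturated model has 15 main-effect dummies, 75 two-way and 125 three-way interaction dummies, so 215 dummies plus the intercept: one parameter for each of the 6*6*6 = 216 cells.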

Best Answer

This sounds like it might be an appropriate situation for multilevel modeling. How many different regions do you have? If there are many (say, dozens or more), you might wish to take such an approach (cf. Duncan et al., 1998).
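As a rough illustration of why this helps with many regions, the sketch below shows the partial pooling behind a random-intercept model: each region's mean is pulled toward the overall mean, more strongly when the region has few observations. It assumes the within-region variance `sigma2` and the between-region variance `tau2` are known, whereas a real multilevel fit would estimate them, and it uses the plain observation-weighted grand mean. The data and the function name `partial_pool` are hypothetical.

```python
from statistics import mean


def partial_pool(group_values, sigma2, tau2):
    """Shrink each group's mean toward the grand mean.

    sigma2: assumed within-group variance (treated as known here).
    tau2:   assumed between-group variance (treated as known here).
    """
    grand = mean(v for vals in group_values.values() for v in vals)
    pooled = {}
    for group, vals in group_values.items():
        n = len(vals)
        # The weight approaches 1 as the group gets more observations.
        weight = tau2 / (tau2 + sigma2 / n)
        pooled[group] = grand + weight * (mean(vals) - grand)
    return pooled


# Hypothetical outcome values by region; "south" has a single observation.
outcome_by_region = {
    "north": [10.0, 12.0, 11.0, 13.0],
    "south": [20.0],
    "east": [14.0, 16.0],
}
estimates = partial_pool(outcome_by_region, sigma2=4.0, tau2=4.0)
print(estimates)
```

With separate dummies, the "south" effect would rest entirely on its one observation. Here it is shrunk halfway to the grand mean, which is the behavior that makes the multilevel approach attractive when there are dozens of sparsely observed regions.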

On the other hand, educational attainment can be incorporated as a numerical predictor quite successfully, although I always explore its functional relationship with the outcome by (1) using a nonparametric smoothing regression (Beck and Jackman, 1997; Hastie and Tibshirani, 1987) in order to inform (2) the specification of a functional form (in all likelihood a nonlinear one), usually fitted with nonlinear least squares regression (Davidson and MacKinnon, 2004).
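A minimal sketch of step (1), using a Gaussian kernel (Nadaraya-Watson) smoother as a simple stand-in for the generalized additive models in the cited papers. The data are made up and deliberately concave, and the bandwidth of 1.5 years is an arbitrary assumption.

```python
from math import exp


def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each grid point."""
    fitted = []
    for g in grid:
        # Gaussian kernel weights centered on the grid point g.
        weights = [exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return fitted


# Hypothetical data: years of education and an outcome that rises
# with diminishing returns (concave by construction).
years = list(range(8, 21))
outcome = [1.0 + 0.3 * (yr - 8) - 0.01 * (yr - 8) ** 2 for yr in years]

smooth = kernel_smooth(years, outcome, grid=years, bandwidth=1.5)

# Year-to-year changes in the smoothed curve. Roughly constant changes
# suggest a linear term is enough; a clear trend suggests curvature.
slopes = [later - earlier for earlier, later in zip(smooth, smooth[1:])]
for yr, s in zip(years, slopes):
    print(yr, round(s, 3))
```

The smoothed curve is only a diagnostic. Its shape suggests which parametric form (for example a quadratic or a log term) to carry into step (2), and estimates near the ends of the range should be read with caution because kernel smoothers are biased at the boundaries.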

If there are relatively few political parties, you might wish to retain the indicator variables for these categories.

References

Beck, N. and Jackman, S. (1997). Getting the mean right is a good thing: generalized additive models. Working paper, Society for Political Methodology.

Davidson, R. and MacKinnon, J. G. (2004). Econometric Theory and Methods, chapter 6: Nonlinear Regression. New York: Oxford University Press.

Duncan, C., Jones, K., and Moon, G. (1998). Context, composition and heterogeneity: Using multilevel models in health research. Social Science & Medicine, 46(1):97–117.

Hastie, T. and Tibshirani, R. (1987). Generalized Additive Models: Some Applications. Journal of the American Statistical Association, 82(398):371–386.
