Solved – Multiple Linear Regression – categorical variables

multiple regression

I have a categorical independent variable along with other continuous independent variables. Should I dummy code it or should I just treat it as a nominal variable?

Best Answer

I assume you have more than two categories. With only two, a single 0/1 variable already is the dummy variable, so the question doesn't arise.

The answer depends on how your statistics software handles categorical variables. In R, they are called factors, and if you include a factor in a regression model, it is dummy coded automatically. However, if the categorical variable is stored as a numerical variable rather than a factor, R will treat it as numerical, and you need to declare it as a factor with factor(variable) for it to be used as a categorical variable (R will then create the dummy variables for you).
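To make "dummy coded" concrete, here is a minimal sketch in plain Python of what the software does behind the scenes. It assumes the first level in sorted order is the reference category, which mirrors R's default treatment coding for unordered factors; the function name dummy_code and the colour data are made up for illustration.

```python
def dummy_code(values, reference=None):
    """Turn a list of category labels into 0/1 indicator columns.

    Returns (names, rows): names lists the non-reference levels, and each
    row holds one indicator per name. The reference level gets no column,
    so k categories produce k - 1 dummy variables.
    """
    levels = sorted(set(values))  # distinct categories in sorted order
    if reference is None:
        reference = levels[0]     # assumption: first level is the baseline
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    names = [level for level in levels if level != reference]
    rows = [[1 if value == name else 0 for name in names] for value in values]
    return names, rows


# Hypothetical data: one categorical predictor with three levels.
colors = ["red", "green", "blue", "green"]
names, rows = dummy_code(colors)

print(names)
for color, row in zip(colors, rows):
    print(color, row)
```

An observation in the reference category ("blue" here) is the one whose indicators are all zero, which is why only k - 1 columns are needed.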

In SPSS, which is the other statistics software that I'm familiar with, nominal variables will be treated as continuous unless you specify that they are categorical via the "categorical" button in the regression dialog box.

In neither R nor SPSS do you need to create the dummy variables yourself, and I imagine it's the same for most other statistics software today. So in my mind, there is no difference between dummy coding the variable and treating it as a nominal variable, because they are the same thing.
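As an illustration of why the two are equivalent, the model the software ends up fitting can be written out explicitly. This sketch assumes a hypothetical nominal variable with three categories A, B and C, with A as the reference, and a single continuous predictor x; the symbols are chosen here for illustration only.

```latex
y_i = \beta_0 + \beta_1 x_i + \gamma_B D_{B,i} + \gamma_C D_{C,i} + \varepsilon_i,
\qquad
D_{B,i} =
\begin{cases}
  1 & \text{if observation } i \text{ is in category B} \\
  0 & \text{otherwise}
\end{cases}
```

D_C is defined in the same way for category C. The coefficient \gamma_B is then the expected difference in y between category B and the reference category A at the same value of x. Whether you build D_B and D_C by hand or let the software derive them from a nominal variable, the fitted model is the same.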
