Solved – Why is the categorical variable split up into separate variables in the regression model in R

lm, r

I'm creating a multivariate model in R right now.

When I pass a categorical variable to the lm() function and check the summary() output, the variable gets split up into a separate beta coefficient for each option inside of the variable.

When I checked the data type, it came back as a factor.
Here is the output of summary() to make the issue easier to see.

Best Answer

In a nutshell, regression analysis requires numerical predictors, so when a categorical variable (a factor) is included in a regression model, R converts it into a set of 0/1 'dummy' variables. By default, a factor with k levels produces k - 1 dummy variables: the first level serves as the reference category and is absorbed into the intercept. When a dummy variable is 1, the observation belongs to the category that dummy represents, and its coefficient is the estimated difference from the reference category.
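As a minimal sketch of what happens behind the scenes, the example below uses a made-up data frame (the names `df`, `y` and `group` are hypothetical, not taken from the question) and inspects the design matrix that lm() builds from a factor under R's default treatment contrasts:

```r
# Hypothetical toy data: 'group' is a factor with three levels (a, b, c).
df <- data.frame(
  y     = c(10, 12, 11, 20, 22, 21, 30, 32, 31),
  group = factor(rep(c("a", "b", "c"), each = 3))
)

# The design matrix used by lm(): an intercept plus one 0/1 dummy column
# for every level except the first (reference) level.
X <- model.matrix(y ~ group, data = df)
print(colnames(X))  # "(Intercept)" "groupb" "groupc"

# Fit the model; each dummy gets its own coefficient.
fit <- lm(y ~ group, data = df)
print(coef(fit))
# (Intercept) = mean of y in group a (11)
# groupb      = mean of group b minus mean of group a (10)
# groupc      = mean of group c minus mean of group a (20)
```

If a different baseline is more natural, the reference level can be changed before fitting, for example with `df$group <- relevel(df$group, ref = "b")`.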

This article explains how regression with categorical variables is performed in R:

http://www.sthda.com/english/articles/40-regression-analysis/163-regression-with-categorical-variables-dummy-coding-essentials-in-r/
