Solved – Coding of categorical variables in logistic regression

categorical data, categorical-encoding, logistic, regression

I have to fit a binary logistic regression with 7 independent variables. 4 of them are binary and the other 3 are categorical, each with 4 levels. For example, mode of payment: cash, internet banking, debit card, credit card.

I am unsure whether I should create 0/1 dummy columns for the 4 levels or simply code them as 1, 2, 3, 4. What is the right method for these variables?

Best Answer

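Yes, you need to split the categories into 0/1 variables, omitting one of them as the reference level (keeping all four alongside an intercept would make them perfectly collinear). Coding the levels as 1, 2, 3, 4 in a single numeric column would instead treat payment mode as a number with equally spaced steps, which does not make sense for unordered categories. In R, this is done by converting the variable with as.factor(paymentmode) before using it in the model. In Stata, it is done with i.paymentmode (which may have to be prefixed by xi: in older versions). Some people prefer different coding schemes for categorical variables, but that is really just a matter of how you are going to read your output; it has no effect on the estimation procedure itself.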
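
To make the 0/1 coding concrete, here is a minimal sketch in plain Python (not a language used in the answer, chosen only for illustration). The helper name dummy_code, the sample values, and the choice of "cash" as the omitted reference level are all assumptions made for the example. R and Stata do the equivalent expansion internally, so this only shows the idea, not what either package literally runs.

```python
def dummy_code(values, reference=None):
    """Expand a categorical variable into 0/1 indicator columns.

    Hypothetical helper for illustration: one column per level,
    except the reference level, which is omitted.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: default to the first level alphabetically
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    kept = [level for level in levels if level != reference]
    # Each row gets a 1 in the column matching its level, 0 elsewhere.
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Made-up sample data using the four payment modes from the question.
payment_mode = ["cash", "credit card", "debit card", "internet banking", "cash"]
columns, rows = dummy_code(payment_mode, reference="cash")

print(columns)
for value, row in zip(payment_mode, rows):
    print(f"{value:18s} {row}")
```

A 4-level variable therefore contributes 3 columns to the model. Rows at the reference level are all zeros, so each dummy's coefficient is read as the difference in log-odds between that level and the reference.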