Solved – fit logistic regression over a dataset with only categorical data

Tags: categorical data, fitting, generalized linear model, logistic, regression

I have a dataset that contains only categorical data, i.e. each predictor takes values such as A, B, C, D (like factors). There are 10 predictors, and the dependent variable is binary (0 or 1).

UPDATE: My predictors are answers to multiple-choice questions in a questionnaire. So each predictor only takes on categorical values, e.g. X_1 can be A, B, C or D, and X_2 can be A, B, C, D, E, F, G or H.

Is it feasible to fit a logistic regression over this dataset?
Ideally, if I can fit a logistic regression to the data, I will then use it for prediction on a set of test data, which again contains only categorical data.

What are the pitfalls that I should look out for?

Best Answer

Yes, of course you can. Just be aware of the nature of your categorical data: is it ordered or unordered?

If ordered (e.g. small, medium, large) you might want a single feature X1 with values like (1, 1, 3, 2, 3, 1, ...) where 1 represents small, 2 represents medium, etc.
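As a minimal sketch of the ordered case, the mapping below turns size labels into integer ranks. The names `SIZE_ORDER` and `encode_ordinal` are made up for illustration and are not from any library. Note that coding levels as 1, 2, 3 assumes each step between adjacent levels has the same effect on the log-odds, which may or may not suit your data.

```python
# Hypothetical ordered levels; this mapping is an assumption for illustration.
SIZE_ORDER = {"small": 1, "medium": 2, "large": 3}

def encode_ordinal(values, order):
    """Map each ordered category label to its integer rank."""
    return [order[v] for v in values]

sizes = ["small", "small", "large", "medium", "large", "small"]
x1 = encode_ordinal(sizes, SIZE_ORDER)
print(x1)  # [1, 1, 3, 2, 3, 1]
```

A label that is missing from the mapping raises a `KeyError`, which is a cheap way to catch unexpected answers before fitting.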

If unordered (e.g. red, blue, green) you'll want multiple 0/1 indicator features, like X1 = (0, 0, 1, 0) representing "is red?", X2 = (1, 0, 0, 1) representing "is blue?", and so forth.
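The unordered case can be sketched in plain Python as follows. The helpers `fit_levels` and `one_hot` are hypothetical names, and treating a label that never appeared in the training data as all zeros is an assumption made here, not the only reasonable choice.

```python
def fit_levels(values):
    """Collect the distinct labels seen in the training data, sorted."""
    return sorted(set(values))

def one_hot(values, levels):
    """Build one 0/1 indicator column per level.

    A label not present in `levels` gets 0 in every column.
    """
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Training data: learn the levels, then encode.
colors = ["blue", "green", "red", "blue"]
levels = fit_levels(colors)
columns = one_hot(colors, levels)
print(columns["red"])   # [0, 0, 1, 0]  -> "is red?"
print(columns["blue"])  # [1, 0, 0, 1]  -> "is blue?"

# Test data: reuse the training levels so the columns line up.
test_columns = one_hot(["red", "purple"], levels)
print(test_columns["red"])  # [1, 0]; "purple" is unseen, so all its columns are 0
```

Reusing the training levels for the test set keeps the feature columns aligned between fitting and prediction. If the model includes an intercept, one indicator per predictor is usually dropped as the reference level, since the full set of indicators sums to 1 in every row and is collinear with the intercept.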
