Classification of multiple outcomes with categorical and continuous predictors

Tags: classification, model, multi-class, prediction

I have a dataset consisting mainly of categorical predictors (7 of them) plus some continuous ones (3), and a binary outcome variable. I previously built a classification model using logistic regression, and it performs well, with an AUC of 0.67 on a test set. I have the option of breaking the binary outcome down into multiple outcome categories, and I have reason to believe that this will increase prediction accuracy. The final model will also have to be implemented in Excel (though I build my models in R).

From reading An Introduction to Statistical Learning (James, Witten, Hastie and Tibshirani) and this article: https://help.xlstat.com/customer/en/portal/articles/2062225-discriminant-analysis-in-excel-tutorial?b_id=9283 it appears that Linear Discriminant Analysis (LDA) is the clear choice. However, this post, "Can we use categorical independent variable in discriminant analysis?", suggests that LDA doesn't handle categorical input variables. The best multi-class classification algorithm would probably be a neural network, but that would be too difficult (or impossible) to implement in Excel.

What approach can I take to build a multi-class classification model based on categorical and continuous predictors?
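Part of LDA's appeal for an Excel deployment is that the fitted rule is just one linear score per class, score_k(x) = x' Σ⁻¹ μ_k - ½ μ_k' Σ⁻¹ μ_k + log π_k, where μ_k is the class mean, Σ the pooled within-class covariance and π_k the class proportion; the predicted class is the one with the highest score. The from-scratch sketch below is in Python with NumPy rather than R, uses synthetic data, and its function names (`fit_lda`, `predict_lda`) are hypothetical. The coefficients it returns could, under these assumptions, be pasted into a sheet and evaluated per class with something like SUMPRODUCT plus the intercept, then MATCH/MAX to pick the winning class.

```python
import numpy as np

def fit_lda(X, y):
    """Fit LDA and return per-class linear score coefficients (hypothetical helper).

    Returns (classes, W, b) such that score_k(x) = x @ W[k] + b[k].
    """
    classes = np.unique(y)
    n, _ = X.shape
    K = len(classes)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([np.mean(y == k) for k in classes])
    # Pooled within-class covariance, shared by all classes in LDA
    centered = X - means[np.searchsorted(classes, y)]
    sigma = centered.T @ centered / (n - K)
    # pinv tolerates a singular covariance, e.g. from redundant dummy columns
    sigma_inv = np.linalg.pinv(sigma)
    W = means @ sigma_inv  # row k equals (Sigma^-1 mu_k)' since Sigma is symmetric
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return classes, W, b

def predict_lda(X, classes, W, b):
    scores = X @ W.T + b  # one column of linear scores per class
    return classes[np.argmax(scores, axis=1)]

# Usage on synthetic data: three Gaussian classes in two dimensions
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat([0, 1, 2], 100)
X = centers[y] + rng.normal(size=(300, 2))
classes, W, b = fit_lda(X, y)
acc = np.mean(predict_lda(X, classes, W, b) == y)
print(f"training accuracy: {acc:.2f}")
```

One-hot encoded categorical columns can simply be appended to `X`; the scoring rule stays a weighted sum, which is what keeps it spreadsheet-friendly.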
Thanks

Best Answer

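I think categorical variables can be handled in LDA as well: encode them as numeric columns and fit LDA as usual. For nominal variables, one-hot (dummy) encoding is usually the safer choice, since label encoding (mapping levels to 1, 2, 3, ...) imposes an arbitrary order and spacing on the levels. Keep in mind that LDA assumes roughly normal predictors within each class, which dummy variables are not, so treat it as a reasonable approximation rather than an exact fit. You can also try an SVM for the multi-class problem if your dataset is not too big, since SVM training gets slow on large datasets.

P.S.: I wouldn't say that an AUC of 0.67 on the test set means the model performed well. Also, neural networks are generally preferred when you want to find complex patterns and a large amount of data is available.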
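To make the encoding suggestion concrete, here is a sketch in Python with scikit-learn (the asker works in R, so treat it as illustrative only): categorical columns are one-hot encoded, continuous ones standardized, and LDA is compared with an RBF-kernel SVM by cross-validated accuracy. The data frame, its column names (`region`, `channel`, `spend`) and the rule generating the three outcome classes are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical data: two categorical predictors, one continuous, three classes
rng = np.random.default_rng(42)
n = 600
df = pd.DataFrame({
    "region": rng.choice(["north", "south", "east"], size=n),
    "channel": rng.choice(["web", "store"], size=n),
    "spend": rng.normal(size=n),
})
signal = (df["region"] == "south") * 1.5 + (df["channel"] == "web") * 1.0 + df["spend"]
y = np.digitize(signal.to_numpy() + rng.normal(scale=0.5, size=n), bins=[0.5, 1.5])

categorical = ["region", "channel"]
continuous = ["spend"]

def make_model(classifier):
    # drop="first" removes one redundant dummy per variable;
    # sparse_threshold=0 forces a dense matrix, which LDA requires
    preprocess = ColumnTransformer(
        [
            ("cat", OneHotEncoder(drop="first"), categorical),
            ("num", StandardScaler(), continuous),
        ],
        sparse_threshold=0,
    )
    return make_pipeline(preprocess, classifier)

models = {
    "LDA": make_model(LinearDiscriminantAnalysis()),
    "SVM (RBF)": make_model(SVC(kernel="rbf", C=1.0)),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, df, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean 5-fold accuracy = {scores.mean():.3f}")
```

scikit-learn's `SVC` handles more than two classes internally with a one-vs-one scheme. For the Excel requirement, note that the fitted LDA reduces to a table of linear coefficients, whereas an RBF SVM's predictions depend on its stored support vectors, which is considerably harder to reproduce in a spreadsheet.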

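On the P.S. about AUC: AUC can be read as the probability that the model scores a randomly chosen positive case above a randomly chosen negative one, with 0.5 corresponding to random guessing. The small pure-Python sketch below computes AUC directly from that pairwise definition on hypothetical scores; an AUC of 0.67 means the model orders such a pair correctly only about two times in three.

```python
from itertools import product

def auc_pairwise(scores, labels):
    """AUC as P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p, q in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))

# Hypothetical scores: 6 of the 9 positive/negative pairs are ordered correctly
scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]
labels = [1, 0, 1, 0, 1, 0]
print(round(auc_pairwise(scores, labels), 3))  # 0.667
```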