Solved – Classification using categorical and text data

classificationensemble learningtext mining

I have dataset that has numeric, categorical, Continuous and Text data. I am using one classification model for numeric, categorical, Continuous and another for Text data. I get probabilities in both the cases. Now how do I combine these two models? Is this a valid approach given the two models use different parts of the training data?

Found a similar question here, but with no answers – Train Classifier on Text AND Categorical AND Numerical data

Best Answer

Is this a valid approach given the two models use different parts of the training data?

Training two classifiers on disjoint subsets of features you’ll not be able to capture the interaction between features belonging to different subsets, e.g. text and numerical ones.

vowpal wabbit can handle features of various types incl. numerical, categorical and text.

Edit: Just to make sure we are on the same page regarding the interactions in the context of your question.

Imagine the case depicted below, i.e. two classes are not perfectly separable in single dimension, but can be easily separated if you classifier consider both dimensions simultaneously.

You might have similar situation - the classes cannot be clearly separated by classifiers focusing only on a subset of features.

Source of figure: wikipedia

Example of classes non-separable in one dimension

Related Question