Solved – Train Classifier on Text AND Categorical AND Numerical data

ensemble learning, feature selection, natural language, prediction

I'm building a sentiment prediction model for tweets that uses text, numerical and categorical data. I already have two classifiers: one for text and one for non-text features (numerical and categorical). The problem is combining the two into one, since their training features differ: raw tweets for the text classifier, and other data such as number of followers, presence of a hashtag, presence of tags, etc. for the other classifier.

Basically, I was wondering whether there is any function in scikit-learn that would make this work.

Best Answer

Instead of combining different classifiers trained on disjoint subsets of features, you could use Vowpal Wabbit, which supports numerical, categorical and text features (via the hashing trick).
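To make the hashing trick mentioned above concrete, here is a minimal sketch of how text, numerical and categorical features can share one sparse feature vector. It is only an illustration under stated assumptions: the function names (`hash_index`, `hash_features`), the `text=`/`num=`/`cat=` prefixes and the use of MD5 are hypothetical choices made for this example, not Vowpal Wabbit's actual implementation.

```python
import hashlib


def hash_index(name: str, n_buckets: int) -> int:
    """Map a feature name to a bucket index using a stable hash.

    MD5 is an assumption made for reproducibility in this sketch;
    it is not the hash function used by Vowpal Wabbit.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_features(text, numerical, categorical, n_buckets=2**18):
    """Return a sparse vector {index: value} mixing all three feature types."""
    vec = {}
    # Text: one feature per token, value is the token count.
    for token in text.lower().split():
        idx = hash_index("text=" + token, n_buckets)
        vec[idx] = vec.get(idx, 0.0) + 1.0
    # Numerical: hash the feature name, keep the numeric value.
    for name, value in numerical.items():
        idx = hash_index("num=" + name, n_buckets)
        vec[idx] = vec.get(idx, 0.0) + float(value)
    # Categorical: hash "name=value" and use an indicator value of 1.
    for name, value in categorical.items():
        idx = hash_index("cat=" + name + "=" + str(value), n_buckets)
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


example = hash_features(
    "good good day",
    numerical={"followers": 1200},
    categorical={"has_hashtag": True},
)
```

Because every feature ends up as an index in the same vector, a single linear model can be trained on all of them at once. Colliding features simply add their values in the shared bucket, which is the usual trade-off of this approach.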

If I understand correctly, training on disjoint subsets of features means you cannot capture interactions between features from different subsets, which might be important.
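As a rough sketch of how a single example could carry all three feature types, the helper below formats one tweet as a line in Vowpal Wabbit's text input format, with one namespace per feature type. The helper name `to_vw_line`, the namespace names `t`, `n` and `c`, and the example field names are assumptions made for illustration; check the Vowpal Wabbit input format documentation before relying on the details.

```python
def to_vw_line(label, text, numerical, categorical):
    """Format one example as 'label |t tokens |n name:value |c name=value'."""

    def clean(token):
        # ':' and '|' have special meaning in the VW text format,
        # so they are replaced inside feature names.
        return token.replace(":", "_").replace("|", "_")

    # Text namespace: one feature per whitespace-separated token.
    text_part = " ".join(clean(t) for t in text.split())
    # Numerical namespace: feature name followed by ':' and its value.
    num_part = " ".join(
        f"{clean(k)}:{float(v)}" for k, v in numerical.items()
    )
    # Categorical namespace: each name=value pair is one indicator feature.
    cat_part = " ".join(
        f"{clean(k)}={clean(str(v))}" for k, v in categorical.items()
    )
    return f"{label} |t {text_part} |n {num_part} |c {cat_part}"


line = to_vw_line(
    1,
    "great day: #sun",
    numerical={"followers": 1200},
    categorical={"has_hashtag": "yes"},
)
print(line)
```

With the features split into namespaces like this, Vowpal Wabbit's `-q` option (for example `-q tn`) can, as far as I know, add quadratic interaction features between two namespaces, which is one way to model the cross-type interactions that separately trained classifiers would miss.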
