Solved – Regularization in text classification with bag-of-words

bag of words, classification, logistic, machine learning, regularization

I am doing text categorization with bag-of-words features and logistic regression.
I have heard about L1 and L2 regularization and have used them for classification before, but only on problems with far fewer features, and none of them involved text.

Is it useful in the context of bag-of-words features?

Best Answer

It's very useful.

In text classification with bag-of-words features, you routinely run into tasks where the number of features is much larger than the number of examples. When you then try to fit a linear model, you run into trouble, because the corresponding linear system is underdetermined: many different weight vectors fit the training data equally well.

L2 regularization is a standard way to deal with underdetermined linear systems. L1 regularization works for this too, and it has the additional advantage of enforcing sparsity, which makes the model simpler to interpret (if you want to read more about why, look into maximum a posteriori estimation or Bayesian linear models).
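To make the contrast concrete, here is a minimal sketch in scikit-learn using a tiny made-up corpus (the documents and labels are invented for illustration): the L2-penalized model keeps small nonzero weights on essentially all vocabulary terms, while the L1-penalized model zeroes many of them out.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus: positive (1) vs negative (0) snippets.
docs = ["good movie", "great film", "great acting",
        "bad movie", "awful film", "bad plot"]
labels = [1, 1, 1, 0, 0, 0]

# Sparse bag-of-words matrix: one column per vocabulary term.
X = CountVectorizer().fit_transform(docs)

# L2 (the scikit-learn default) shrinks all weights but rarely
# makes any of them exactly zero.
l2 = LogisticRegression(penalty="l2", C=1.0).fit(X, labels)

# L1 drives many weights to exactly zero, giving a sparse,
# more interpretable model (liblinear supports the L1 penalty).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, labels)

print("vocabulary size:", X.shape[1])
print("nonzero L2 weights:", np.count_nonzero(l2.coef_))
print("nonzero L1 weights:", np.count_nonzero(l1.coef_))
```

The strength of the penalty is controlled by `C` (the inverse of the regularization strength), which you would normally tune by cross-validation rather than leave at its default.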

As an example, see this page from the scikit-learn documentation.