Solved – Naive Bayes Python implementation differences

naive bayes, python

I am currently using Naive Bayes for a multi-label text document classification problem.

I would like to know the differences (advantages and disadvantages) between scikit-learn's Naive Bayes and NLTK's.

scikit-learn seems to offer more flexibility in parameter settings than NLTK, but I suspect that treating this as the only difference is quite naive.

Also, in what cases would you select one over the other?

Any help appreciated.

Best Answer

scikit-learn has several Naive Bayes variants. Naive Bayes is usually a quick and dirty way to do classification.

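The main variants are:

Gaussian Naive Bayes: used for continuous features, which are assumed to be normally distributed within each class.

Bernoulli Naive Bayes: used for binary features (heads or tails, yes or no), such as whether a word is present in a document.

Multinomial Naive Bayes: usually used for text processing with count features such as word counts. It has a smoothing parameter so that words never seen with a class in the training data do not get zero probability.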
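As a rough illustration of how the three variants are used in scikit-learn, here is a minimal sketch. The toy corpus, the labels, and the variable names are made up for this example. It assumes a bag-of-words representation built with CountVectorizer, which is one common choice rather than the only one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy corpus and labels, invented purely for illustration.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix of word counts

# MultinomialNB models word counts; alpha=1.0 is add-one (Laplace) smoothing.
multinomial = MultinomialNB(alpha=1.0).fit(X, labels)

# BernoulliNB models word presence/absence; binarize=0.0 maps counts > 0 to 1.
bernoulli = BernoulliNB(alpha=1.0, binarize=0.0).fit(X, labels)

# GaussianNB expects dense, continuous features, so the sparse matrix is converted.
gaussian = GaussianNB().fit(X.toarray(), labels)

new_doc = vectorizer.transform(["buy cheap pills"])
multinomial_pred = multinomial.predict(new_doc)[0]
bernoulli_pred = bernoulli.predict(new_doc)[0]
gaussian_pred = gaussian.predict(new_doc.toarray())[0]
print(multinomial_pred, bernoulli_pred, gaussian_pred)
```

For word-count features like these, MultinomialNB is usually the natural starting point. GaussianNB is included only to show that it needs a dense array.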

Here is a good explanation of Naive Bayes with +1 smoothing: https://www.youtube.com/watch?v=0hxaqDbdIeE
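To make the +1 smoothing concrete, here is a small pure-Python sketch of multinomial Naive Bayes with add-one smoothing. It is not taken from the video or from any library; the function names (`train`, `smoothed_likelihood`, `predict`) and the toy data are assumptions for illustration. It uses whitespace tokenization and skips words that are not in the training vocabulary.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def smoothed_likelihood(count, class_total, vocab_size):
    """Add-one estimate of P(word | class): (count + 1) / (total + |V|)."""
    return (count + 1) / (class_total + vocab_size)


def predict(doc, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior score."""
    n_docs = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        class_total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for token in doc.split():
            if token not in vocab:
                continue  # words never seen in training are ignored
            count = word_counts[label][token]
            score += math.log(smoothed_likelihood(count, class_total, len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration.
docs = ["cheap pills buy now", "buy cheap watches now",
        "meeting agenda for monday", "project meeting notes attached"]
labels = ["spam", "spam", "ham", "ham"]

model = train(docs, labels)
prediction = predict("buy cheap pills", *model)
print(prediction)
```

Without the +1, a word such as "pills" that never appears in a "ham" document would give that class a probability of exactly zero, and the logarithm would be undefined. The smoothed estimate keeps every likelihood strictly positive.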

I'm not sure about NLTK, but sklearn is optimized based on this paper: http://www.cs.unb.ca/profs/hzhang/publications/FLAIRS04ZhangH.pdf

Also, sklearn's algorithms are further optimized using Cython.