How to Handle Continuous and Discrete Features in a Naive Bayes Classifier


My task is to use a Naive Bayes classifier for prediction, where I have both continuous and discrete predictor variables. In the literature the classifier is written as:

$$\hat{y}= \underset{k\;\in\;\{1,\dots,K\}} {\mathrm{argmax}} \;\;P(C_k)\prod_{i=1}^n p\left(x_i\;|\;C_k\right),$$

where $P$ is a probability mass function and $p$ is a probability density function. What if some of my features $x_i$ are discrete variables and some are continuous? Does the decision rule then simply become:

$$\hat{y}= \underset{k\;\in\;\{1,\dots,K\}} {\mathrm{argmax}} \;\;P(C_k)\prod_{i=1}^m p\left(x_i\;|\;C_k\right)\prod_{j=m+1}^n P\left(x_j\;|\;C_k\right)\;\;?$$

Best Answer

That's exactly what it becomes!
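
To make this concrete, here is a minimal sketch (not from the original answer) of such a mixed Naive Bayes classifier in Python: continuous features are modeled with class-conditional Gaussian densities and discrete features with Laplace-smoothed empirical frequencies. The function names, the smoothing constant `alpha`, and the fallback probability for unseen categories are all illustrative assumptions, not a reference implementation.

```python
import numpy as np

def fit_mixed_nb(X_cont, X_disc, y, alpha=1.0):
    """Fit class priors, per-class Gaussian parameters for the continuous
    features, and Laplace-smoothed category frequencies for the discrete
    features. (Illustrative sketch; names and smoothing are assumptions.)"""
    params = {}
    for k in np.unique(y):
        mask = (y == k)
        cont, disc = X_cont[mask], X_disc[mask]
        params[k] = {
            "prior": mask.mean(),
            # Per-feature Gaussian mean/std for the continuous predictors
            "mu": cont.mean(axis=0),
            "sigma": cont.std(axis=0) + 1e-9,  # guard against zero variance
            # Laplace-smoothed category probabilities for discrete predictors
            "counts": [
                {v: (np.sum(disc[:, j] == v) + alpha)
                     / (disc.shape[0] + alpha * len(np.unique(X_disc[:, j])))
                 for v in np.unique(X_disc[:, j])}
                for j in range(disc.shape[1])
            ],
        }
    return params

def predict_mixed_nb(params, x_cont, x_disc):
    """Score each class as log P(C_k) plus the sum of log densities
    (continuous) and log probabilities (discrete), then return the argmax,
    mirroring the decision rule above."""
    best_k, best_score = None, -np.inf
    for k, p in params.items():
        # Log of the Gaussian density for each continuous feature
        log_dens = (-0.5 * np.log(2 * np.pi * p["sigma"] ** 2)
                    - (x_cont - p["mu"]) ** 2 / (2 * p["sigma"] ** 2))
        score = np.log(p["prior"]) + log_dens.sum()
        for j, v in enumerate(x_disc):
            # Tiny fallback probability for categories unseen at fit time
            score += np.log(p["counts"][j].get(v, 1e-12))
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Working in log space, as done here, avoids the numerical underflow you would otherwise get from multiplying many small densities and probabilities together.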

If you're curious enough to want to see a mathematical justification of why this works, check out this generalized Naive Bayes derivation. To summarize, it all comes down to an integral approximation. To get the probability that a continuous variable falls near a specific value, you integrate its probability density function (PDF) over an interval of width $\epsilon$ centered at that value, and take the limit of that integral as $\epsilon$ approaches 0. For small $\epsilon$, this integral is approximately equal to $\epsilon$ times the height of the PDF at the value in question.

Taken on its own, that expression would go to 0 as $\epsilon$ approached 0. In Naive Bayes, however, we are only interested in the ratio of conditional probabilities across classes. Because both the numerator and the denominator of that ratio include the same factor of $\epsilon$, the factors cancel. The limit of the ratio of conditional probabilities is therefore equal to the ratio of the PDF heights at the value in question, which is why plugging density values directly into the product is legitimate.
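To sketch that argument in symbols (my notation, for a single continuous feature $x$ and two classes):

$$P\left(x-\tfrac{\epsilon}{2}\le X \le x+\tfrac{\epsilon}{2}\;\middle|\;C_k\right)=\int_{x-\epsilon/2}^{x+\epsilon/2} p\left(t\;|\;C_k\right)dt\;\approx\;\epsilon\, p\left(x\;|\;C_k\right),$$

so the factor of $\epsilon$ appears in every class's score and cancels in the ratio:

$$\lim_{\epsilon\to 0}\frac{P(C_1)\;\epsilon\, p\left(x\;|\;C_1\right)}{P(C_2)\;\epsilon\, p\left(x\;|\;C_2\right)}=\frac{P(C_1)\, p\left(x\;|\;C_1\right)}{P(C_2)\, p\left(x\;|\;C_2\right)}.$$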
