Classification – How Is the Naive Bayes Algorithm a Linear Classifier?

classification, naive-bayes

I've seen the other thread here, but I don't think its answer addressed the actual question. What I have repeatedly read is that Naive Bayes is a linear classifier (e.g., here), in the sense that it draws a linear decision boundary, with the argument made via the log odds.

However, I simulated two Gaussian clouds and fitted a decision boundary (using naiveBayes() from the e1071 library in R) and got the following result:

[Figure: scatter plot of the two simulated clouds with the fitted decision boundary; class 1 in green, class 0 in red]

As we can see, the decision boundary is non-linear. Is the claim rather that the parameters (the conditional probabilities) combine linearly in log space, as opposed to the classifier itself separating the data linearly?
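For reference, here is a minimal sketch of roughly what I ran (the cluster means and spreads below are made up for illustration, not my exact code):

```r
library(e1071)

set.seed(1)

## Two Gaussian clouds with different spreads: class 0 (red), class 1 (green)
n    <- 200
cls0 <- cbind(rnorm(n, 0, 1), rnorm(n, 0, 1))
cls1 <- cbind(rnorm(n, 2, 2), rnorm(n, 2, 2))
d <- data.frame(x1 = c(cls0[, 1], cls1[, 1]),
                x2 = c(cls0[, 2], cls1[, 2]),
                y  = factor(rep(c(0, 1), each = n)))

fit <- naiveBayes(y ~ x1 + x2, data = d)

## Posterior P(y = 1 | x) on a grid; the 0.5 contour is the decision boundary,
## which comes out curved because the fitted per-class variances differ
gx   <- seq(-5, 8, length.out = 200)
gy   <- seq(-5, 8, length.out = 200)
grid <- expand.grid(x1 = gx, x2 = gy)
p1   <- predict(fit, grid, type = "raw")[, "1"]

plot(d$x1, d$x2, col = ifelse(d$y == 1, "green", "red"), pch = 20)
contour(gx, gy, matrix(p1, nrow = length(gx)), levels = 0.5, add = TRUE)
```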

Best Answer

In general the naive Bayes classifier is not linear, but if the likelihood factors $p(x_i \mid c)$ are from exponential families, the naive Bayes classifier corresponds to a linear classifier in a particular feature space. Here is how to see this.

You can write any naive Bayes classifier as*

$$p(c = 1 \mid \mathbf{x}) = \sigma\left( \sum_i \log \frac{p(x_i \mid c = 1)}{p(x_i \mid c = 0)} + \log \frac{p(c = 1)}{p(c = 0)} \right),$$
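As a quick numerical sanity check of this identity, here is a small sketch in R (the Gaussian parameters and the test point are arbitrary made-up values):

```r
## Toy Gaussian naive Bayes with two features (all parameters made up)
mu0 <- c(0, 1);  sig0 <- c(1.0, 0.5)   # class 0: per-feature means and sds
mu1 <- c(2, -1); sig1 <- c(2.0, 1.5)   # class 1: per-feature means and sds
pri <- c(0.4, 0.6)                     # priors p(c = 0), p(c = 1)

x <- c(1.3, 0.2)                       # an arbitrary test point

## Direct Bayes' rule, using the naive factorisation p(x | c) = prod_i p(x_i | c)
lik0   <- prod(dnorm(x, mu0, sig0))
lik1   <- prod(dnorm(x, mu1, sig1))
direct <- lik1 * pri[2] / (lik1 * pri[2] + lik0 * pri[1])

## Sigmoid of the summed per-feature log-likelihood ratios plus the log prior ratio
z <- sum(log(dnorm(x, mu1, sig1) / dnorm(x, mu0, sig0))) + log(pri[2] / pri[1])
via_sigmoid <- plogis(z)               # plogis() is the logistic function sigma

all.equal(direct, via_sigmoid)         # TRUE
```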

where $\sigma$ is the logistic function. If $p(x_i \mid c)$ is from an exponential family, we can write it as

$$p(x_i \mid c) = h_i(x_i)\exp\left(\mathbf{u}_{ic}^\top \phi_i(x_i) - A_i(\mathbf{u}_{ic})\right),$$

and hence

$$p(c = 1 \mid \mathbf{x}) = \sigma\left( \sum_i \mathbf{w}_i^\top \phi_i(x_i) + b \right),$$

where

\begin{align} \mathbf{w}_i &= \mathbf{u}_{i1} - \mathbf{u}_{i0}, \\ b &= \log \frac{p(c = 1)}{p(c = 0)} - \sum_i \left( A_i(\mathbf{u}_{i1}) - A_i(\mathbf{u}_{i0}) \right). \end{align}
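To make this concrete, a univariate Gaussian class-conditional density with mean $\mu_c$ and variance $\sigma_c^2$ is of the exponential-family form above with

$$h_i(x_i) = \frac{1}{\sqrt{2\pi}}, \quad \phi_i(x_i) = (x_i, x_i^2), \quad \mathbf{u}_{ic} = \left( \frac{\mu_c}{\sigma_c^2},\ -\frac{1}{2\sigma_c^2} \right), \quad A_i(\mathbf{u}_{ic}) = \frac{\mu_c^2}{2\sigma_c^2} + \log \sigma_c.$$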

Note that the resulting classifier is similar to logistic regression – a linear classifier – in the feature space defined by the $\phi_i$. For more than two classes, we analogously get multinomial logistic (or softmax) regression.
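Explicitly, repeating the same substitution in Bayes' rule for $K$ classes (the $h_i(x_i)$ factors cancel between numerator and denominator) gives

$$p(c = k \mid \mathbf{x}) = \frac{\exp\left( \sum_i \mathbf{u}_{ik}^\top \phi_i(x_i) - \sum_i A_i(\mathbf{u}_{ik}) + \log p(c = k) \right)}{\sum_{k'} \exp\left( \sum_i \mathbf{u}_{ik'}^\top \phi_i(x_i) - \sum_i A_i(\mathbf{u}_{ik'}) + \log p(c = k') \right)},$$

a softmax whose arguments are linear in the features $\phi_i(x_i)$.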

If $p(x_i \mid c)$ is Gaussian with class-conditional mean $\mu_c$ and standard deviation $\sigma_c$ (suppressing the feature index on the parameters), then $\phi_i(x_i) = (x_i, x_i^2)$ and we get \begin{align} w_{i1} &= \sigma_1^{-2}\mu_1 - \sigma_0^{-2}\mu_0, \\ w_{i2} &= \tfrac{1}{2}\sigma_0^{-2} - \tfrac{1}{2}\sigma_1^{-2}, \\ b_i &= \tfrac{1}{2}\sigma_0^{-2}\mu_0^2 - \tfrac{1}{2}\sigma_1^{-2}\mu_1^2 + \log \sigma_0 - \log \sigma_1, \end{align}

where $b = \sum_i b_i$, assuming $p(c = 1) = p(c = 0) = \frac{1}{2}$ (so the prior term vanishes).
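Here is a small R sketch (with arbitrary made-up parameters) checking these expressions against the naive Bayes posterior computed directly, for a single feature:

```r
## Closed-form Gaussian weights for a single feature, equal priors
mu0 <- 0.5; sig0 <- 1.0   # class 0 parameters (made up)
mu1 <- 2.0; sig1 <- 1.8   # class 1 parameters (made up)

w1 <- mu1 / sig1^2 - mu0 / sig0^2
w2 <- 1 / (2 * sig0^2) - 1 / (2 * sig1^2)
b  <- mu0^2 / (2 * sig0^2) - mu1^2 / (2 * sig1^2) + log(sig0) - log(sig1)

x <- 1.7                  # arbitrary test point; phi(x) = (x, x^2)
via_weights <- plogis(w1 * x + w2 * x^2 + b)

## Direct naive Bayes posterior with equal priors
direct <- dnorm(x, mu1, sig1) / (dnorm(x, mu1, sig1) + dnorm(x, mu0, sig0))

all.equal(via_weights, direct)  # TRUE
```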


*Here is how to derive this result (the last step uses the naive Bayes factorization $p(\mathbf{x} \mid c) = \prod_i p(x_i \mid c)$):

\begin{align} p(c = 1 \mid \mathbf{x}) &= \frac{p(\mathbf{x} \mid c = 1) p(c = 1)}{p(\mathbf{x} \mid c = 1) p(c = 1) + p(\mathbf{x} \mid c = 0) p(c = 0)} \\ &= \frac{1}{1 + \frac{p(\mathbf{x} \mid c = 0) p(c = 0)}{p(\mathbf{x} \mid c = 1) p(c = 1)}} \\ &= \frac{1}{1 + \exp\left( -\log\frac{p(\mathbf{x} \mid c = 1) p(c = 1)}{p(\mathbf{x} \mid c = 0) p(c = 0)} \right)} \\ &= \sigma\left( \sum_i \log \frac{p(x_i \mid c = 1)}{p(x_i \mid c = 0)} + \log \frac{p(c = 1)}{p(c = 0)} \right) \end{align}
