The NaiveBayes() function in the klaR package follows the classical R formula interface, whereby you express your outcome as a function of its predictors, e.g. spam ~ x1 + x2 + x3. If your data are stored in a data.frame, you can include all predictors on the right-hand side of the formula using dot notation: spam ~ ., data=df means "spam as a function of all other variables present in the data.frame called df."
Here is a toy example, using the spam dataset discussed in The Elements of Statistical Learning (Hastie et al., Springer, 2009, 2nd ed.), which is available online. This is really just to get you started with the R function, not with the methodological aspects of using an NB classifier.
data(spam, package="ElemStatLearn")
library(klaR)
# set up a training sample
train.ind <- sample(1:nrow(spam), ceiling(nrow(spam)*2/3), replace=FALSE)
# apply NB classifier
nb.res <- NaiveBayes(spam ~ ., data=spam[train.ind,])
# show the results
opar <- par(mfrow=c(2,4))
plot(nb.res)
par(opar)
# predict on holdout units
nb.pred <- predict(nb.res, spam[-train.ind,])
# raw accuracy
confusion.mat <- table(nb.pred$class, spam[-train.ind,"spam"])
sum(diag(confusion.mat))/sum(confusion.mat)
A recommended add-on package for such ML tasks is the caret package. It offers a lot of useful tools for preprocessing data, handling training/test samples, running different classifiers on the same data, and summarizing the results. It is available from CRAN and comes with many vignettes that describe common tasks.
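As an illustration, here is a minimal caret sketch of the same workflow (assuming caret is installed; method = "nb" wraps klaR's NaiveBayes, and the resampling settings are only illustrative):
library(caret)
# stratified split on the outcome
idx <- createDataPartition(spam$spam, p = 2/3, list = FALSE)
# fit NB with 5-fold cross-validation on the training part
fit <- train(spam ~ ., data = spam[idx, ], method = "nb",
             trControl = trainControl(method = "cv", number = 5))
# evaluate on the held-out part
confusionMatrix(predict(fit, newdata = spam[-idx, ]), spam[-idx, "spam"])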
In general the naive Bayes classifier is not linear, but if the likelihood factors $p(x_i \mid c)$ are from exponential families, the naive Bayes classifier corresponds to a linear classifier in a particular feature space. Here is how to see this.
You can write any naive Bayes classifier as*
$$p(c = 1 \mid \mathbf{x}) = \sigma\left( \sum_i \log \frac{p(x_i \mid c = 1)}{p(x_i \mid c = 0)} + \log \frac{p(c = 1)}{p(c = 0)} \right),$$
where $\sigma$ is the logistic function. If $p(x_i \mid c)$ is from an exponential family, we can write it as
$$p(x_i \mid c) = h_i(x_i)\exp\left(\mathbf{u}_{ic}^\top \phi_i(x_i) - A_i(\mathbf{u}_{ic})\right),$$
and hence
$$p(c = 1 \mid \mathbf{x}) = \sigma\left( \sum_i \mathbf{w}_i^\top \phi_i(x_i) + b \right),$$
where
\begin{align}
\mathbf{w}_i &= \mathbf{u}_{i1} - \mathbf{u}_{i0}, \\
b &= \log \frac{p(c = 1)}{p(c = 0)} - \sum_i \left( A_i(\mathbf{u}_{i1}) - A_i(\mathbf{u}_{i0}) \right).
\end{align}
Note that this is similar to logistic regression – a linear classifier – in the feature space defined by the $\phi_i$. For more than two classes, we analogously get multinomial logistic (or softmax) regression.
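For completeness, here is a sketch of that multiclass form under the same exponential-family assumption (my notation, analogous to the two-class case above):
$$p(c = k \mid \mathbf{x}) = \frac{\exp\left( \sum_i \mathbf{u}_{ik}^\top \phi_i(x_i) - \sum_i A_i(\mathbf{u}_{ik}) + \log p(c = k) \right)}{\sum_{k'} \exp\left( \sum_i \mathbf{u}_{ik'}^\top \phi_i(x_i) - \sum_i A_i(\mathbf{u}_{ik'}) + \log p(c = k') \right)},$$
where the $h_i(x_i)$ factors cancel between numerator and denominator, so each class score is again linear in the features $\phi_i(x_i)$.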
If $p(x_i \mid c)$ is Gaussian with class-dependent mean $\mu_c$ and variance $\sigma_c^2$ (dropping the index $i$ to keep the notation light), then $\phi_i(x_i) = (x_i, x_i^2)$ and we get
\begin{align}
w_{i1} &= \sigma_1^{-2}\mu_1 - \sigma_0^{-2}\mu_0, \\
w_{i2} &= \tfrac{1}{2}\sigma_0^{-2} - \tfrac{1}{2}\sigma_1^{-2}, \\
b_i &= \log \sigma_0 - \log \sigma_1 + \tfrac{1}{2}\sigma_0^{-2}\mu_0^2 - \tfrac{1}{2}\sigma_1^{-2}\mu_1^2,
\end{align}
where $b = \sum_i b_i$, assuming $p(c = 1) = p(c = 0) = \frac{1}{2}$.
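Here is a small numeric check of that Gaussian case (my own sketch with made-up parameter values, not part of the original derivation): the log-odds computed directly from the two normal densities matches the linear form in $(x_i, x_i^2)$ above.
# hypothetical class-conditional parameters for a single feature
mu0 <- 0.5; s0 <- 1.2
mu1 <- 2.0; s1 <- 0.8
w1 <- mu1/s1^2 - mu0/s0^2
w2 <- 1/(2*s0^2) - 1/(2*s1^2)
b  <- log(s0) - log(s1) + mu0^2/(2*s0^2) - mu1^2/(2*s1^2)
x  <- seq(-3, 5, length.out = 9)
# log-odds from the densities (equal priors) vs. the linear form
direct <- dnorm(x, mu1, s1, log = TRUE) - dnorm(x, mu0, s0, log = TRUE)
linear <- w1*x + w2*x^2 + b
all.equal(direct, linear)  # TRUE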
*Here is how to derive this result:
\begin{align}
p(c = 1 \mid \mathbf{x})
&= \frac{p(\mathbf{x} \mid c = 1) p(c = 1)}{p(\mathbf{x} \mid c = 1) p(c = 1) + p(\mathbf{x} \mid c = 0) p(c = 0)} \\
&= \frac{1}{1 + \frac{p(\mathbf{x} \mid c = 0) p(c = 0)}{p(\mathbf{x} \mid c = 1) p(c = 1)}} \\
&= \frac{1}{1 + \exp\left( -\log\frac{p(\mathbf{x} \mid c = 1) p(c = 1)}{p(\mathbf{x} \mid c = 0) p(c = 0)} \right)} \\
&= \sigma\left( \sum_i \log \frac{p(x_i \mid c = 1)}{p(x_i \mid c = 0)} + \log \frac{p(c = 1)}{p(c = 0)} \right)
\end{align}
Best Answer
This seems a bit ambiguous... What's wrong with model$table?