As I recall, algorithms like nearest neighbor don't build a model from the training data and then apply that model to the test data. They just take each new instance and compare it to all of the stored training data to find the closest instance.
What about Naive Bayes? It seems similar. Neural networks, for example, learn parameters and then apply the resulting model to the test data, but for Naive Bayes I don't see where the learning takes place. There are no learned parameters; it seems to look at the entire dataset again for each prediction. Can anyone comment on this?
Additionally, what is the use of a training/test split then? I can see that we would want a test set because we want labeled examples to score against, but beyond that I don't see why we need a train/test split.
Best Answer
Unlike the nearest neighbor algorithm, Naive Bayes is not a lazy method; real learning takes place. The parameters learned in Naive Bayes are the prior probabilities of the different classes and the likelihoods of the different features for each class. In the test phase, these learned parameters are used to estimate the probability of each class for a given sample.
In other words, for each sample in the test set, the parameters determined during training are used to estimate the probability of that sample belonging to each class: $P(c \mid x) \propto P(c)\,P(x_1 \mid c)\,P(x_2 \mid c)\cdots P(x_n \mid c)$, where $c$ is a class and $x = (x_1, \ldots, x_n)$ is a test sample. All of the quantities $P(c)$ and $P(x_i \mid c)$ are parameters determined during training and used during testing. In this respect Naive Bayes is similar to a neural network, although what is learned and how the learned model is applied differ.
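To make the two phases concrete, here is a minimal sketch (not nltk's implementation) of a categorical Naive Bayes classifier. `train_nb` computes and stores the parameters, the priors $P(c)$ and the counts behind $P(x_i \mid c)$, and `predict_nb` classifies using only those stored parameters, never the raw training set. The names `train_nb` and `predict_nb` and the smoothing constant `alpha` are illustrative choices, and the likelihood estimate assumes binary features:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, labels):
    # Learning phase: the parameters are the class priors P(c) and
    # the per-class feature-value counts behind P(x_i | c).
    n = len(labels)
    priors = {c: k / n for c, k in Counter(labels).items()}
    counts = {c: defaultdict(Counter) for c in priors}
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            counts[c][i][v] += 1
    return priors, counts, Counter(labels)

def predict_nb(params, x, alpha=1.0):
    # Test phase: only the learned parameters are consulted,
    # not the original training samples.
    priors, counts, class_sizes = params
    scores = {}
    for c, prior in priors.items():
        logp = math.log(prior)
        for i, v in enumerate(x):
            # Laplace-smoothed estimate of P(x_i = v | c);
            # the denominator assumes binary feature values.
            num = counts[c][i][v] + alpha
            den = class_sizes[c] + 2 * alpha
            logp += math.log(num / den)
        scores[c] = logp
    return max(scores, key=scores.get)
```

After `train_nb` returns, the training data could be discarded entirely; that is the sense in which Naive Bayes is an eager, not a lazy, learner.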
As an example, take a look at the Naive Bayes implementation in nltk; see the `train` and `prob_classify` methods. In the `train` method, `label_probdist` and `feature_probdist` are computed, and in the `prob_classify` method these parameters are used to estimate the probability of each class for a test sample. (Note that `_label_probdist` and `_feature_probdist` are initialized to `label_probdist` and `feature_probdist`, respectively, in the constructor.)

About your second question: even for lazy methods such as nearest neighbor, we need to split the data into training and test sets. This is because we want to evaluate the performance of a model on samples that were not seen while building it, in order to obtain a reasonable measure of how well it generalizes.
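To illustrate why held-out evaluation matters even for a lazy learner, here is a small hypothetical sketch: it splits a labeled dataset, runs a 1-nearest-neighbor classifier that lazily compares each query against all stored training samples, and scores it only on the held-out portion. All names (`train_test_split`, `one_nn`, `accuracy`) are made up for this example:

```python
import random

def train_test_split(data, test_frac=0.25, seed=0):
    # Hold out a fraction of the labeled samples so evaluation
    # uses examples the classifier has never stored.
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(len(data) * (1 - test_frac))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def one_nn(train, x):
    # Lazy prediction: compare x to every stored training sample
    # and return the label of the closest one.
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda sample: dist(sample[0], x))[1]

def accuracy(train, test):
    # Score only on held-out samples to estimate generalization.
    correct = sum(one_nn(train, x) == y for x, y in test)
    return correct / len(test)
```

Scoring the classifier on the training samples themselves would be misleading here: each training point's nearest stored neighbor is itself, so training accuracy for 1-NN is trivially perfect regardless of how well it generalizes.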