The Spy EM algorithm solves exactly this problem.
S-EM is a text-learning/classification system that learns from a set of positive and unlabeled examples (no negative examples). It is based on a "spy" technique, naive Bayes, and the EM algorithm.
The basic idea is to combine your positive set with a large collection of randomly crawled documents. You initially treat all the crawled documents as the negative class and learn a naive Bayes classifier on that set. Some of those crawled documents will actually be positive, so you can conservatively relabel any documents that score higher than the lowest-scoring true positive document (in the spy variant, this threshold comes from known positives deliberately mixed into the crawled set). Then you iterate this process until the labels stabilize.
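A minimal sketch of this loop, not the original S-EM implementation: it assumes dense bag-of-words count matrices `X_pos` and `X_unlabeled` (placeholder names) and uses scikit-learn's `MultinomialNB` for the classification step.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def spy_em(X_pos, X_unlabeled, spy_frac=0.15, n_iter=10, seed=0):
    """Rough Spy-EM-style loop over dense bag-of-words count matrices."""
    rng = np.random.default_rng(seed)
    n_pos = X_pos.shape[0]

    # Hold out a fraction of the positives as "spies" and mix them into the unlabeled set.
    spy_idx = rng.choice(n_pos, size=max(1, int(spy_frac * n_pos)), replace=False)
    spies = X_pos[spy_idx]
    X_p = X_pos[np.setdiff1d(np.arange(n_pos), spy_idx)]

    X_mix = np.vstack([X_unlabeled, spies])              # everything here starts as "negative"
    X_all = np.vstack([X_p, X_mix])
    y = np.concatenate([np.ones(len(X_p)), np.zeros(len(X_mix))])  # 1 = positive, 0 = negative
    mixed = np.arange(len(X_p), len(X_all))              # indices of the unlabeled + spy docs

    clf = MultinomialNB()
    for _ in range(n_iter):
        if len(np.unique(y)) < 2:                        # degenerate case: only one class left
            break
        clf.fit(X_all, y)
        scores = clf.predict_proba(X_all)[:, 1]          # P(positive | doc)
        threshold = scores[-len(spies):].min()           # lowest score given to any spy
        new_y = y.copy()
        new_y[mixed] = (scores[mixed] >= threshold).astype(float)  # conservative relabeling
        if np.array_equal(new_y, y):                     # stop once the labels stabilize
            break
        y = new_y
    return clf
```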
Let's say you've trained your naive Bayes classifier on two classes, "Ham" and "Spam" (i.e. it classifies emails). For the sake of simplicity, we'll assume the prior probabilities are 50/50.
Now let's say you have an email $(w_1, w_2, \dots, w_n)$ which your classifier rates very highly as "Ham", say $$P(Ham \mid w_1, w_2, \dots, w_n) = 0.90$$ and $$P(Spam \mid w_1, w_2, \dots, w_n) = 0.10$$
So far so good.
Now let's say you have another email $(w_1, w_2, \dots, w_n, w_{n+1})$ which is exactly the same as the above email except that it contains one extra word that isn't in the training vocabulary. Since this word's count is 0 in both classes, its estimated likelihood under each class is $$P(w_{n+1} \mid Ham) = P(w_{n+1} \mid Spam) = 0$$
Suddenly, because naive Bayes multiplies the per-word likelihoods (and the priors are equal), $$P(Ham \mid w_1, w_2, \dots, w_{n+1}) \propto \prod_{i=1}^{n+1} P(w_i \mid Ham) = 0$$ and $$P(Spam \mid w_1, w_2, \dots, w_{n+1}) \propto \prod_{i=1}^{n+1} P(w_i \mid Spam) = 0$$
Despite the first email being strongly classified as one class, this second email may be classified completely differently (or not at all), because that single unseen word drives both products to zero.
Laplace smoothing solves this by giving the last word a small non-zero probability for both classes, so that the posterior probabilities don't suddenly drop to zero.
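A small numeric illustration of the add-one estimate; the token counts (50,000 ham tokens, 30,000 spam tokens, a 10,000-word vocabulary) are made up for the example.

```python
# Add-one (Laplace) smoothed word likelihood: (count(w, C) + 1) / (N_C + |V|),
# where N_C is the total word count in class C and |V| is the vocabulary size.
def smoothed_likelihood(word_count_in_class, total_words_in_class, vocab_size):
    return (word_count_in_class + 1) / (total_words_in_class + vocab_size)

# An unseen word (count 0) would have likelihood 0 without smoothing and wipe out
# the whole product; with smoothing it is merely small.
vocab_size = 10_000
p_unseen_ham  = smoothed_likelihood(0, 50_000, vocab_size)   # ~1.7e-5
p_unseen_spam = smoothed_likelihood(0, 30_000, vocab_size)   # ~2.5e-5
print(p_unseen_ham, p_unseen_spam)
```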
The problem you refer to is semi-supervised learning; active learning, where the learner chooses which samples get labelled, is a closely related setting.
Here one has a few labelled samples and a large pool of unlabelled samples. The goal is to exploit the unlabelled data, together with the few labels, to build a better classifier than the labels alone would allow.
Let us take the case of classification. The basic idea is that one would like to model $p(C|x)$, where $C$ is some label representing class membership and $x$ is a sample. Now, from the data obtained by sampling $p(x)$ one attempts to infer something about $p(C|x)$. The way to attack the problem is to introduce assumptions linking $p(C|x)$ to the properties of $p(x)$. Those are the assumptions listed in the Wikipedia article.
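One of the simplest ways to exploit such an assumption in practice is self-training: fit a classifier on the labelled samples, then repeatedly add its most confident predictions on the unlabelled pool as pseudo-labels. A rough sketch, assuming dense feature matrices `X_labeled`, `y_labeled`, `X_unlabeled` (placeholder names) and an arbitrary confidence threshold of 0.95:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_rounds=10):
    X, y = X_labeled.copy(), y_labeled.copy()
    pool = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(max_rounds):
        proba = clf.predict_proba(pool)
        conf = proba.max(axis=1)                  # confidence of the predicted class
        pick = conf >= threshold                  # keep only very confident predictions
        if not pick.any() or pool.shape[0] == 0:
            break
        X = np.vstack([X, pool[pick]])            # add pseudo-labelled points to the training set
        y = np.concatenate([y, clf.predict(pool[pick])])
        pool = pool[~pick]                        # remove them from the unlabelled pool
        clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf
```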
You may also find these slides very informative.