Solved – Any difference between distant supervision, self-training, self-supervised learning, and weak supervision?

machine-learning, semi-supervised-learning, terminology, unsupervised-learning

From what I have read:


Distant supervision:

A distant supervision algorithm usually has the following steps:
1] It may have some labeled training data.
2] It has access to a pool of unlabeled data.
3] It has an operator that allows it to sample from this unlabeled data and label it; this operator is expected to be noisy in its labels.
4] The algorithm then collectively uses the original labeled training data (if any) together with this new noisily labeled data to give the final output (a toy sketch of these steps follows the list).
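A minimal Python sketch of these four steps, assuming a hypothetical `noisy_label` heuristic, toy feature vectors, and an off-the-shelf scikit-learn classifier; real systems differ in how the labeling operator and the final learner are built:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def noisy_label(x):
    # Step 3: a heuristic labeling operator; its output is expected to be noisy.
    return int(x.sum() > 0)

# Step 1: a (possibly small) seed of hand-labeled data.
X_seed = np.array([[1.0, 2.0], [-1.0, -2.0]])
y_seed = np.array([1, 0])

# Step 2: a pool of unlabeled data the algorithm can sample from.
X_pool = np.random.randn(100, 2)

# Step 3: label the sampled pool with the noisy operator.
y_pool = np.array([noisy_label(x) for x in X_pool])

# Step 4: train on the union of the seed data and the noisily labeled data.
X_train = np.vstack([X_seed, X_pool])
y_train = np.concatenate([y_seed, y_pool])
clf = LogisticRegression().fit(X_train, y_train)
```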

Self-training:

[The original post illustrates self-training with an image, omitted here.]


Self-learning (Yates, Alexander, et al. "Textrunner: open information extraction on the web." Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Association for Computational Linguistics, 2007.):

The Learner operates in two steps. First, it automatically labels its
own training data as positive or negative. Second, it uses this
labeled data to train a Naive Bayes classifier.
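A toy illustration of those two steps (not TextRunner's actual heuristics or features): a made-up rule labels candidate extractions as positive or negative, and a Naive Bayes classifier is trained on the result.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

candidates = [
    "Einstein was born in Ulm",    # looks like a well-formed extraction
    "born in the city of",         # fragment, no clear arguments
    "Curie won the Nobel Prize",
    "won the and of",
]

def heuristic_label(sentence):
    # Step 1: automatically label the data; here a crude, noisy rule.
    return 1 if sentence[0].isupper() and len(sentence.split()) >= 5 else 0

labels = [heuristic_label(s) for s in candidates]

# Step 2: train a Naive Bayes classifier on the self-labeled data.
X = CountVectorizer().fit_transform(candidates)
clf = MultinomialNB().fit(X, labels)
```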


Weak Supervision (Hoffmann, Raphael, et al. "Knowledge-based weak supervision for information extraction of overlapping relations." Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011.):

A more promising approach, often called “weak” or “distant”
supervision, creates its own training data by heuristically matching
the contents of a database to corresponding text.
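The matching heuristic itself can be sketched with a made-up knowledge base and sentences: a sentence mentioning both arguments of a database tuple is taken as a (noisy) positive example for that relation.

```python
kb = [("Einstein", "born_in", "Ulm"), ("Curie", "born_in", "Warsaw")]

sentences = [
    "Einstein was born in Ulm in 1879.",
    "Einstein visited Ulm only once as an adult.",  # matched, but a wrong (noisy) label
    "Curie moved from Warsaw to Paris.",
]

training_data = []
for subj, rel, obj in kb:
    for sent in sentences:
        if subj in sent and obj in sent:
            # Heuristic: co-occurrence of both arguments => sentence expresses the relation.
            training_data.append((sent, rel))

print(training_data)
```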


It all sounds the same to me, with the exception that self-training seems to be slightly different: the labeling heuristic there is the trained classifier itself, and there is a loop between the labeling phase and the classifier training phase. However, Yao et al. (Yao, Limin, Sebastian Riedel, and Andrew McCallum. "Collective cross-document relation extraction without labelled data." Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010) claim that distant supervision == self-training == weak supervision.

Also, are there other synonyms?

Best Answer

There are two aspects to all the different terms you have given: 1] the process of obtaining the training data, and 2] the algorithm that trains $f$, the classifier.

The algorithm that trains $f$ is supervised, regardless of how the training data is obtained. The differences between distant supervision, self-learning, self-supervised learning, and weak supervision therefore lie purely in how the training data is obtained.

Traditionally, a machine learning paper on supervised learning implicitly assumes that the training data is available and, for what it's worth, that the labels are precise: there is no ambiguity in the labels given to the instances in the training data. With distant/weak supervision papers, however, people realized that their training data has imprecise labels, and what they usually want to highlight is that they obtain good results despite this obvious drawback (they may also have additional algorithmic ways to overcome imprecise labels, e.g. an extra filtering process, and the papers usually like to stress that these additional processes are important and useful). This gave rise to the terms "weak" and "distant" to indicate that the labels on the training data are imprecise. Note that this does not necessarily affect the learning aspect of the classifier: the classifier these papers use still implicitly assumes that the labels are precise, and the training algorithm is hardly ever changed.

Self-training, on the other hand, is somewhat special in that sense. As you have already observed, it obtains its labels from its own classifier and has a bit of a feedback loop for correction. Generally, we study supervised classifiers under the slightly larger purview of "inductive" algorithms, where the classifier that is learnt is an inductive inference made from the training data about the entire data. People have also studied another form, called transductive inference, where a general inductive rule is not the output of the algorithm; instead, the algorithm takes both training data and test data as input and produces labels on the test data. At some point people figured: why not use transductive inference within inductive learning to obtain a classifier trained on a larger amount of data? This is simply referred to as induction with unlabeled data [1], and self-training falls under that.
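For illustration, a minimal self-training loop might look like the sketch below, assuming a scikit-learn classifier and a fixed confidence threshold; real systems add extra filtering and stopping criteria on top of this.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=5, threshold=0.9):
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(X, y)
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        conf = proba.max(axis=1)
        keep = conf >= threshold          # feedback loop: only confident self-labels are added
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, proba[keep].argmax(axis=1)])
        pool = pool[~keep]
    return clf

# Toy usage: two labeled points and a pool of unlabeled points.
rng = np.random.RandomState(0)
X_lab = np.array([[2.0, 2.0], [-2.0, -2.0]])
y_lab = np.array([1, 0])
X_unlab = rng.randn(200, 2) + np.where(rng.rand(200, 1) > 0.5, 2.0, -2.0)
model = self_train(X_lab, y_lab, X_unlab)
```

scikit-learn ships a similar wrapper, SelfTrainingClassifier in sklearn.semi_supervised, which wraps a base estimator in essentially this loop.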

Hopefully, I have not confused you further; feel free to comment and ask for more clarifications if necessary.

[1] Might be useful - http://www.is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/pdf2527.pdf
