Solved – Popular named entity resolution software

machine learning, natural language, record-linkage, text mining

I am working on a project and need to extract persons' names from a large number of documents. I believe this task falls under the named entity resolution problem. What are currently some of the most popular open source software packages or libraries for performing named entity resolution?

Best Answer

The problem of named entity resolution goes by several names, including deduplication and record linkage. I doubt that it is possible to determine precisely which software is among the most popular for solving that problem. Various approaches and algorithms can be used for named entity resolution, so software that implements them can be seen as complementary to each other (perhaps there exist multiple research studies that compare and benchmark entity resolution approaches and algorithms, but so far I have seen only two of them; see references below, denoted with a triple asterisk "***").

This nice tutorial (in the form of presentation slides) on entity resolution provides a comprehensive overview of the problem and the solutions, including both approaches and algorithms. The tutorial also provides an extensive set of references to sources with further information. As for corresponding software, one may find open source or dual-license projects, such as the Java-based Stanford NLP Group software (which includes the Stanford named entity recognizer (NER)), the Stanford Entity Resolution Framework (SERF), LingPipe (which includes a NER module) and the Duke library, as well as the Python-based NLTK software (http://www.nltk.org/book/ch07.html). I realize that named entity recognition and resolution are quite different tasks; however, some of the above-referenced software that focuses on the former might be useful for the latter, by reusing appropriate code segments.
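To make the recognition/resolution distinction concrete, here is a minimal sketch of the resolution step only, assuming the person-name mentions have already been extracted by some NER tool. It uses only the Python standard library (`difflib`) and is not taken from any of the packages mentioned above; the function names (`normalize`, `similarity`, `resolve`), the sample mentions and the 0.85 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(mentions, threshold=0.85):
    """Greedy clustering: each mention joins the first cluster whose
    representative (its first member) is similar enough, otherwise it
    starts a new cluster. The threshold is an arbitrary assumption."""
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if similarity(mention, cluster[0]) >= threshold:
                cluster.append(mention)
                break
        else:
            # No existing cluster matched, so this is a new entity.
            clusters.append([mention])
    return clusters


# Hypothetical mentions, as if produced by a separate NER step.
mentions = ["John Smith", "john smith", "John  Smith.", "Jon Smith", "Mary Jones"]
clusters = resolve(mentions)
for cluster in clusters:
    print(cluster)
```

A greedy pass like this compares every mention against every cluster representative, so it is only practical for small inputs. Real record-linkage tools typically add blocking or indexing to cut down the number of comparisons, and use more careful similarity measures than a single string ratio.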

Additionally, the following IMHO related/relevant software and papers might also be of interest:
