With the increasing amount of unstructured data, distance measures are becoming more and more important. I am looking for an online course or any other material that gives me profound insight into the following areas:
Distances based on edits
- Damerau-Levenshtein
- Hamming
- Levenshtein
- Optimal string alignment
Distances based on q-grams
- q-gram
- Cosine
- Jaccard
Distances based on heuristic metrics
- Jaro
- Jaro-Winkler
Distances based on quantitative computations
- Geometric
- Manhattan
As well as the theoretical concepts of distance measures and distance functions and applications such as
- String comparison
- Fuzzy matching
- Clustering
Edit: Any implementations in R, such as the agrep / agrepl functions and the stringdist package, are also welcome.
Best Answer
Honestly, I do not think that such a narrowly focused course exists anywhere (online or offline), but there is a book, Encyclopedia of Distances by Deza and Deza (2009, Springer), that you could check.
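To give a rough feel for what a few of the measures listed in the question compute, here is a minimal sketch in Python (chosen only for illustration; it is not taken from the book and is not how agrep or stringdist are implemented). The function names and the default q = 2 are my own assumptions, and the Jaccard variant shown works on sets of q-grams, which is one common convention among several.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 2) -> set:
    """Set of all substrings of length q (assumed convention: no padding)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_distance(a: str, b: str, q: int = 2) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the q-gram sets of a and b."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 0.0  # assumed convention: two empty q-gram sets are identical
    return 1 - len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(hamming("karolin", "kathrin"))
    print(jaccard_distance("night", "nacht"))
```

The edit-based functions count character operations, while the q-gram one ignores position and only compares which substrings occur, which is why the two families can rank the same pair of strings quite differently.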