The stringdist Package for Approximate String Matching
Mark P.J. van der Loo, The R Journal (2014) 6:1, pages 111-122.
Abstract Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconsistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser-known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R’s native exact matching functions match and %in%.
Received: 2013-11-04; online 2014-04-27
@article{RJ-2014-011,
author = {Mark P.J. van der Loo},
title = {{The stringdist Package for Approximate String Matching}},
year = {2014},
journal = {{The R Journal}},
doi = {10.32614/RJ-2014-011},
url = {https://doi.org/10.32614/RJ-2014-011},
pages = {111--122},
volume = {6},
number = {1}
}