Version: 0.5

## levenshtein: Levenshtein Distance Metric in Scheme

 (require (planet neil/levenshtein:1:2))

### 1Introduction

This is a Scheme implementation of the Levenshtein Distance algorithm, which is an edit distance metric of string similarity, due to Vladimir Levenshtein. The Levenshtein Distance is a function of two strings that represents a count of single-character insertions, deletions, and substitions that will change the first string to the second. More information is available in NIST DADS and the Michael Gilleland article, “Levenshtein Distance in Three Flavors.”

This implementation is modeled after a space-efficient Perl implementation by Jorge Mas Trullenque. It has been written in R5RS Scheme (with an error procedure such as the one in SRFI-23), and extended to support heterogeneous combinations of Scheme types (strings, lists, vectors), user-supplied predicate functions, and optionally reusable scratch vectors.

### 2Basic Comparisons

In the current implementation, all comparisons are done internally via vectors.

 (vector-levenshtein/predicate/get-scratch a b pred get-scratch) → any/c
a : any/c
b : any/c
pred : any/c
get-scratch : any/c

Few, if any, programs will use this procedure directly. This is like vector-levenshtein/predicate, but allows get-scratch to be specified. get-scratch is a procedure of one term, n, that yields a vector of length n or greater, which is used for record-keeping during execution of the Levenshtein algorithm. make-vector can be used for get-scratch, although some programs comparing a large size or quantity of vectors may wish to reuse a record-keeping vector, rather than each time allocating a new one that will need to be garbage-collected.

 (vector-levenshtein/predicate a b pred) → any/c a : any/c b : any/c pred : any/c

 (vector-levenshtein/eq a b) → any/c a : any/c b : any/c

 (vector-levenshtein/eqv a b) → any/c a : any/c b : any/c

 (vector-levenshtein/equal a b) → any/c a : any/c b : any/c

 (vector-levenshtein a b) → any/c a : any/c b : any/c

Calculate the Levenshtein Distance of vectors a and b. pred is the predicate procedure for determining if two elements are equal. The /eq, /eqv, and /equal variants correspond to the standard equivalence predicates, eq?, eqv?, and equal?. vector-levenshtein is an alias for vector-levenshtein/equal.

(vector-levenshtein '#(6 6 6) '#(6 35 6 24 6 32)) ==> 3

 (list-levenshtein/predicate a b pred) → any/c a : any/c b : any/c pred : any/c

 (list-levenshtein/eq a b) → any/c a : any/c b : any/c

 (list-levenshtein/eqv a b) → any/c a : any/c b : any/c

 (list-levenshtein/equal a b) → any/c a : any/c b : any/c

 (list-levenshtein a b) → any/c a : any/c b : any/c

Calculate the Levenshtein Distance of lists a and b. pred is the predicate procedure for determining if two elements are equal. The /eq, /eqv, and /equal variants correspond to the standard equivalence predicates, eq?, eqv?, and equal?. list-levenshtein is an alias for list-levenshtein/equal. Note that comparison of lists is less efficient than comparison of vectors.

(list-levenshtein/eq '(b c e x f y) '(a b c d e f)) ==> 4

 (string-levenshtein a b) → any/c a : any/c b : any/c

Calculate the Levenshtein Distance of strings a and b.

### 3Type-Coercing Comparisons

Procedures levenshtein and levenshtein/predicate provide a convenient interface for comparing a combination of vectors, lists, and strings, the types of which might not be known until runtime.

 (levenshtein/predicate a b pred) → any/c a : any/c b : any/c pred : any/c

Calculates the Levenshtein Distance of two objects a and b, which are vectors, lists, or strings. a and b need not be of the same type. pred is the element equivalence predicate used.

 (levenshtein/predicate '#(#\A #\B #\C #\D) "aBXcD" char-ci=?) ==> 1

 (levenshtein a b) → any/c a : any/c b : any/c

Calculate the levenshtein distance of a and b, in a similar manner as using levenshtein/predicate with equal? as the predicate.

 (define g '#(#\g #\u #\m #\b #\o)) (levenshtein g "gambol")  ==> 2 (levenshtein g "dumbo")   ==> 1 (levenshtein g "umbrage") ==> 5

### 4History

• Version 0.5 – !!! – PLaneT (1 2)

License is now LGPL 3. Tests moved out of main file. Converted to author’s new Scheme administration system.

• Version 0.4 – 2005-07-10 – PLaneT (1 1)

• Version 0.3 – 2005-07-09 – PLaneT (1 0)

PLaneT release, and minor documentation changes.

• Version 0.2 – 2004-07-06

Documentation changes.

• Version 0.1 – 2004-05-13

First release. Tested only lightly, and today is the 13th, so caveat emptor.