Version **0.4**, 2005-07-10, http://www.neilvandyke.org/levenshtein-scm/

by
Neil W. Van Dyke
<`neil@neilvandyke.org`

>

```
Copyright © 2004 - 2005 Neil W. Van Dyke. This program is Free
Software; you can redistribute it and/or modify it under the terms of the
GNU Lesser General Public License as published by the Free Software
Foundation; either version 2.1 of the License, or (at your option) any
later version. This program is distributed in the hope that it will be
useful, but without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose. See
<
````http://www.gnu.org/copyleft/lesser.html`

> for details. For
other license options and consulting, contact the author.

This is a Scheme implementation of the Levenshtein Distance algorithm, which is an edit distance metric of string similarity, due to Vladimir Levenshtein. The Levenshtein Distance is a function of two strings that represents a count of single-character insertions, deletions, and substitions that will change the first string to the second. More information is available in NIST DADS and the Michael Gilleland article, “Levenshtein Distance in Three Flavors.”

This implementation is modeled after a
space-efficient Perl implementation by Jorge Mas Trullenque. It has been written in R5RS
Scheme (with an `error`

procedure such as the one in SRFI-23), and
extended to support heterogeneous combinations of Scheme types (strings,
lists, vectors), user-supplied predicate functions, and optionally reusable
scratch vectors.

In the current implementation, all comparisons are done internally via vectors.

— Procedure: **vector-levenshtein/predicate/get-scratch**` a b pred get-scratch`

Few, if any, programs will use this procedure directly. This is like

`vector-levenshtein/predicate`

, but allowsget-scratchto be specified.get-scratchis a procedure of one term,n, that yields a vector of lengthnor greater, which is used for record-keeping during execution of the Levenshtein algorithm.`make-vector`

can be used forget-scratch, although some programs comparing a large size or quantity of vectors may wish to reuse a record-keeping vector, rather than each time allocating a new one that will need to be garbage-collected.

— Procedure: **vector-levenshtein/predicate**` a b pred`

— Procedure:**vector-levenshtein/eq**` a b`

— Procedure:**vector-levenshtein/eqv**` a b`

— Procedure:**vector-levenshtein/equal**` a b`

— Procedure:**vector-levenshtein**` a b`

— Procedure:

— Procedure:

— Procedure:

— Procedure:

Calculate the Levenshtein Distance of vectors

aandb.predis the predicate procedure for determining if two elements are equal. The`/eq`

,`/eqv`

, and`/equal`

variants correspond to the standard equivalence predicates,`eq?`

,`eqv?`

, and`equal?`

.`vector-levenshtein`

is an alias for`vector-levenshtein/equal`

.(vector-levenshtein '#(6 6 6) '#(6 35 6 24 6 32)) => 3

— Procedure: **list-levenshtein/predicate**` a b pred`

— Procedure:**list-levenshtein/eq**` a b`

— Procedure:**list-levenshtein/eqv**` a b`

— Procedure:**list-levenshtein/equal**` a b`

— Procedure:**list-levenshtein**` a b`

— Procedure:

— Procedure:

— Procedure:

— Procedure:

Calculate the Levenshtein Distance of lists

aandb.predis the predicate procedure for determining if two elements are equal. The`/eq`

,`/eqv`

, and`/equal`

variants correspond to the standard equivalence predicates,`eq?`

,`eqv?`

, and`equal?`

.`list-levenshtein`

is an alias for`list-levenshtein/equal`

. Note that comparison of lists is less efficient than comparison of vectors.(list-levenshtein/eq '(b c e x f y) '(a b c d e f)) => 4

— Procedure: **string-levenshtein**` a b`

Calculate the Levenshtein Distance of strings

aandb.(string-levenshtein "adresse" "address") => 2

Procedures `levenshtein`

and `levenshtein/predicate`

provide a
convenient interface for comparing a combination of vectors, lists, and
strings, the types of which might not be known until runtime.

— Procedure: **levenshtein/predicate**` a b pred`

Calculates the Levenshtein Distance of two objects

aandb, which are vectors, lists, or strings.aandbneed not be of the same type.predis the element equivalence predicate used.(levenshtein/predicate '#(#\A #\B #\C #\D) "aBXcD" char-ci=?) => 1

— Procedure: **levenshtein**` a b`

Calculate the levenshtein distance of

aandb, in a similar manner as using`levenshtein/predicate`

with`equal?`

as the predicate.(define g '#(#\g #\u #\m #\b #\o)) (levenshtein g "gambol") => 2 (levenshtein g "dumbo") => 1 (levenshtein g "umbrage") => 5

The `levenshtein.scm`

test suite can be enabled by editing the source
code file and loading Testeez.

- Version 0.4 — 2005-07-10
- Added Testeez tests.
- Version 0.3 — 2005-07-09
- PLaneT release, and minor documentation changes.
- Version 0.2 — 2004-07-06
- Documentation changes.
- Version 0.1 — 2004-05-13
- First release. Tested only lightly, and today
*is*the 13th, so*caveat emptor*.