#lang scribble/doc @; THIS FILE IS GENERATED @(require scribble/manual) @(require (for-label (planet neil/levenshtein:1:2))) @title[#:version "0.5"]{@bold{levenshtein}: Levenshtein Distance Metric in Scheme} @author{Neil Van Dyke} License: @seclink["Legal" #:underline? #f]{LGPL 3} @(hspace 1) Web: @link["http://www.neilvandyke.org/levenshtein-scheme/" #:underline? #f]{http://www.neilvandyke.org/levenshtein-scheme/} @defmodule[(planet neil/levenshtein:1:2)] @section{Introduction} This is a Scheme implementation of the @italic{Levenshtein Distance} algorithm, which is an @italic{edit distance} metric of string similarity, due to Vladimir Levenshtein. The Levenshtein Distance is a function of two strings that represents a count of single-character insertions, deletions, and substitions that will change the first string to the second. More information is available in @link["http://www.nist.gov/dads/HTML/Levenshtein.html"]{NIST DADS} and the Michael Gilleland article, ``@link["http://www.merriampark.com/ld.htm"]{Levenshtein Distance in Three Flavors}.'' This implementation is modeled after a @link["http://www.mgilleland.com/ld/ldperl2.htm"]{space-efficient Perl implementation} by Jorge Mas Trullenque. It has been written in R5RS Scheme (with an @tt{error} procedure such as the one in SRFI-23), and extended to support heterogeneous combinations of Scheme types (strings, lists, vectors), user-supplied predicate functions, and optionally reusable scratch vectors. @section{Basic Comparisons} In the current implementation, all comparisons are done internally via vectors. @defproc[(vector-levenshtein/predicate/get-scratch (a any/c) (b any/c) (pred any/c) (get-scratch any/c)) any/c]{ Few, if any, programs will use this procedure directly. This is like @tt{vector-levenshtein/predicate}, but allows @schemevarfont{get-scratch} to be specified. @schemevarfont{get-scratch} is a procedure of one term, @emph{n}, that yields a vector of length @emph{n} or greater, which is used for record-keeping during execution of the Levenshtein algorithm. @tt{make-vector} can be used for @schemevarfont{get-scratch}, although some programs comparing a large size or quantity of vectors may wish to reuse a record-keeping vector, rather than each time allocating a new one that will need to be garbage-collected. } @defproc[(vector-levenshtein/predicate (a any/c) (b any/c) (pred any/c)) any/c]{} @defproc[(vector-levenshtein/eq (a any/c) (b any/c)) any/c]{} @defproc[(vector-levenshtein/eqv (a any/c) (b any/c)) any/c]{} @defproc[(vector-levenshtein/equal (a any/c) (b any/c)) any/c]{} @defproc[(vector-levenshtein (a any/c) (b any/c)) any/c]{ Calculate the Levenshtein Distance of vectors @schemevarfont{a} and @schemevarfont{b}. @schemevarfont{pred} is the predicate procedure for determining if two elements are equal. The @tt{/eq}, @tt{/eqv}, and @tt{/equal} variants correspond to the standard equivalence predicates, @tt{eq?}, @tt{eqv?}, and @tt{equal?}. @tt{vector-levenshtein} is an alias for @tt{vector-levenshtein/equal}. @SCHEMEBLOCK[ (vector-levenshtein '#(6 6 6) '#(6 35 6 24 6 32)) ==> 3 ] } @defproc[(list-levenshtein/predicate (a any/c) (b any/c) (pred any/c)) any/c]{} @defproc[(list-levenshtein/eq (a any/c) (b any/c)) any/c]{} @defproc[(list-levenshtein/eqv (a any/c) (b any/c)) any/c]{} @defproc[(list-levenshtein/equal (a any/c) (b any/c)) any/c]{} @defproc[(list-levenshtein (a any/c) (b any/c)) any/c]{ Calculate the Levenshtein Distance of lists @schemevarfont{a} and @schemevarfont{b}. @schemevarfont{pred} is the predicate procedure for determining if two elements are equal. The @tt{/eq}, @tt{/eqv}, and @tt{/equal} variants correspond to the standard equivalence predicates, @tt{eq?}, @tt{eqv?}, and @tt{equal?}. @tt{list-levenshtein} is an alias for @tt{list-levenshtein/equal}. Note that comparison of lists is less efficient than comparison of vectors. @SCHEMEBLOCK[ (list-levenshtein/eq '(b c e x f y) '(a b c d e f)) ==> 4 ] } @defproc[(string-levenshtein (a any/c) (b any/c)) any/c]{ Calculate the Levenshtein Distance of strings @schemevarfont{a} and @schemevarfont{b}. @SCHEMEBLOCK[ (string-levenshtein "adresse" "address") ==> 2 ] } @section{Type-Coercing Comparisons} Procedures @tt{levenshtein} and @tt{levenshtein/predicate} provide a convenient interface for comparing a combination of vectors, lists, and strings, the types of which might not be known until runtime. @defproc[(levenshtein/predicate (a any/c) (b any/c) (pred any/c)) any/c]{ Calculates the Levenshtein Distance of two objects @schemevarfont{a} and @schemevarfont{b}, which are vectors, lists, or strings. @schemevarfont{a} and @schemevarfont{b} need not be of the same type. @schemevarfont{pred} is the element equivalence predicate used. @SCHEMEBLOCK[ (levenshtein/predicate '#(#\A #\B #\C #\D) "aBXcD" char-ci=?) ==> 1 ] } @defproc[(levenshtein (a any/c) (b any/c)) any/c]{ Calculate the levenshtein distance of @schemevarfont{a} and @schemevarfont{b}, in a similar manner as using @tt{levenshtein/predicate} with @tt{equal?} as the predicate. @SCHEMEBLOCK[ (define g '#(#\g #\u #\m #\b #\o)) (levenshtein g "gambol") ==> 2 (levenshtein g "dumbo") ==> 1 (levenshtein g "umbrage") ==> 5 ] } @section{History} @itemize[ @item{Version 0.5 --- !!! -- PLaneT @tt{(1 2)} License is now LGPL 3. Tests moved out of main file. Converted to author's new Scheme administration system. } @item{Version 0.4 --- 2005-07-10 -- PLaneT @tt{(1 1)} Added Testeez tests. } @item{Version 0.3 --- 2005-07-09 -- PLaneT @tt{(1 0)} PLaneT release, and minor documentation changes. } @item{Version 0.2 --- 2004-07-06 Documentation changes. } @item{Version 0.1 --- 2004-05-13 First release. Tested only lightly, and today @emph{is} the 13th, so @emph{caveat emptor}. } ] @section[#:tag "Legal"]{Legal} Copyright (c) 2004--2009 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License (LGPL 3), or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.