#lang scribble/doc
@; THIS FILE IS GENERATED
@(require scribble/manual)
@(require (for-label (planet neil/soundex:1:1)))

@title[#:version "0.3"]{@bold{soundex}: Soundex Index Keying in Scheme}
@author{Neil Van Dyke}

License: @seclink["Legal" #:underline? #f]{LGPL 3} @(hspace 1) Web: @link["http://www.neilvandyke.org/soundex-scheme/" #:underline? #f]{http://www.neilvandyke.org/soundex-scheme/}

@defmodule[(planet neil/soundex:1:1)]

@section{Introduction}

The @bold{soundex} library provides a Scheme implementation of the Soundex indexing hash function, as specified somewhat loosely by US National Archives and Records Administration (NARA) publication [Soundex], and verified empirically against test cases from various sources. Both the current NARA function and the older version with different handling of `H' and `W' are supported. Additionally, a nonstandard prefix-guessing function that is an invention of this library permits additional Soundex keys to be generated from a string, increasing recall.

This library should work under any R5RS Scheme implementation for which @tt{char->integer} yields ASCII values.

@itemize[
@item{ [GIL-55] US National Archives and Records Administration, ``Using the Census Soundex,'' General Information Leaflet 55, 1995. }
@item{ [Soundex] US National Archives and Records Administration, ``The Soundex Indexing System,'' 2000-02-19. }
]

@section{Characters, Ordinals, and Codes}

To facilitate possible future support of other input character sets, this library employs a @italic{character ordinal} abstract representation of the letters used by Soundex. The ordinal value is an integer from 0 to 25---corresponding to the 26 letters `A' through `Z', respectively---and can be used for fast mapping via vectors. Most applications need not be aware of this.

@defproc[(soundex-ordinal (chr any/c)) any/c]{
Yields the Soundex ordinal value of character @schemevarfont{chr}, or @tt{#f} if the character is not considered a letter.
@SCHEMEBLOCK[
(soundex-ordinal #\a) ==> 0
(soundex-ordinal #\A) ==> 0
(soundex-ordinal #\Z) ==> 25
(soundex-ordinal #\3) ==> #f
(soundex-ordinal #\.) ==> #f]
}

@defproc[(soundex-ordinal->char (ord any/c)) any/c]{
Yields the upper-case letter character that corresponds to the character ordinal value @schemevarfont{ord}. For example:

@SCHEMEBLOCK[
(soundex-ordinal->char (soundex-ordinal #\a)) ==> #\A]

Note that a @tt{#f} value resulting from applying @tt{soundex-ordinal} is @emph{not} an ordinal value, and is not mapped to a character by @tt{soundex-ordinal->char}. For example:

@SCHEMEBLOCK[
(soundex-ordinal->char (soundex-ordinal #\')) error-->]
}

@defproc[(soundex-ordinal->soundex-code (ord any/c)) any/c]{
Yields a library-specific Soundex code for character ordinal @schemevarfont{ord}.

@SCHEMEBLOCK[
(soundex-ordinal->soundex-code (soundex-ordinal #\a)) ==> aeiou
(soundex-ordinal->soundex-code (soundex-ordinal #\c)) ==> #\2
(soundex-ordinal->soundex-code (soundex-ordinal #\N)) ==> #\5
(soundex-ordinal->soundex-code (soundex-ordinal #\w)) ==> hw
(soundex-ordinal->soundex-code (soundex-ordinal #\y)) ==> y]
}

@defproc[(char->soundex-code (chr any/c)) any/c]{
Yields a library-specific Soundex code for character @schemevarfont{chr}. This is equivalent to: @tt{(soundex-ordinal->soundex-code (soundex-ordinal @schemevarfont{chr}))}.
}

@section{Hashing}

Soundex hashes of strings can be generated with @tt{soundex-nara}, @tt{soundex-old}, and @tt{soundex}.

@defproc[(soundex/narahw/start (str any/c) (narahw? any/c) (start any/c)) any/c]{
This is an internal procedure.

@SCHEMEBLOCK[
(soundex/narahw/start "van Dam" #t 4) ==> "D500"
(soundex/narahw/start ".0,!" #t 0) ==> #f]
}

@defproc[(soundex-nara (str any/c)) any/c]{}
@defproc[(soundex-old (str any/c)) any/c]{}
@defproc[(soundex (str any/c)) any/c]{
Yields a Soundex hash key of string @schemevarfont{str}, or @tt{#f} if not even an initial letter could be found.
@tt{soundex-nara} generates NARA hashes, and @tt{soundex-old} generates older-style hashes. @tt{soundex} is an alias for @tt{soundex-nara}.

@SCHEMEBLOCK[
(soundex-nara "Ashcraft") ==> "A261"
(soundex-old "Ashcraft") ==> "A226"
(soundex "Ashcraft") ==> "A261"
(soundex "") ==> #f]
}

@section{Prefixing}

Multiple Soundex hashes from a single string can be generated by @tt{soundex-nara/prefixing}, @tt{soundex-old/prefixing}, and @tt{soundex/p}, which consider the string with and without various common surname prefixes.

@defproc[(soundex-prefix-starts (str any/c)) any/c]{
Yields a list of Soundex start points in string @schemevarfont{str}, as character index integers, for making hash keys with and without prefixes. A prefix must be followed by at least two letters, although they can be interspersed with non-letter characters. The exact behavior of this function is subject to change in future versions of this library.

@SCHEMEBLOCK[
(soundex-prefix-starts "Smith") ==> (0)
(soundex-prefix-starts "  Jones") ==> (2)
(soundex-prefix-starts "vanderlinden") ==> (0 3 6)
(soundex-prefix-starts "van der linden") ==> (0 3 7)
(soundex-prefix-starts "") ==> ()
(soundex-prefix-starts "123") ==> ()
(soundex-prefix-starts "dea") ==> (0)
(soundex-prefix-starts "dea ") ==> (0)
(soundex-prefix-starts "dean") ==> (0)
(soundex-prefix-starts "delasol") ==> (0 2 3 4)]
}

@defproc[(soundex/narahw (str any/c) (narahw? any/c)) any/c]{
This is an internal procedure.
}

@defproc[(soundex-nara/prefixing (str any/c)) any/c]{}
@defproc[(soundex-old/prefixing (str any/c)) any/c]{}
@defproc[(soundex/p (str any/c)) any/c]{
Yields a list of zero or more Soundex hash keys from string @schemevarfont{str}, based on the whole string and the string with various prefixes skipped. The elements of the list are all distinct. @tt{soundex-nara/prefixing} generates NARA hashes, and @tt{soundex-old/prefixing} generates older-style hashes. @tt{soundex/p} is an alias for @tt{soundex-nara/prefixing}.
@SCHEMEBLOCK[
(soundex/p "Van Damme") ==> ("V535" "D500")
(soundex/p "vanvoom") ==> ("V515" "V500")
(soundex/p "vanvanvan") ==> ("V515")
(soundex/p "DeLaSol") ==> ("D424" "L240" "A240" "S400")
(soundex/p "") ==> ()]
}

@section{History}

@itemize[
@item{Version 0.3 --- 2009-02-24 -- PLaneT @tt{(1 0)}

Licensed under LGPL 3. Converted to author's new Scheme administration system. Made test suite executable. Minor documentation changes.
}
@item{Version 0.2 --- 2004-08-02

Minor documentation change. Version frozen for PLaneT packaging.
}
@item{Version 0.1 --- 2004-05-10

First release.
}
]

@section[#:tag "Legal"]{Legal}

Copyright (c) 2004--2009 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License (LGPL 3), or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.