## Diff Package

by Doug Williams

m.douglas.williams at gmail.com

This package provides a simple diff-like capability in Racket. This includes diffs of arbitrary lists or of text files. It also includes an implementation of the longest common subsequence (LCS) algorithm, which is the basis of the diff algorithm.

The diff package is available from the PLaneT repository.

 (require (planet williams/diff/diff))

### 1Longest Common Subsequence (LCS)

The longest common subsequence (LCS) is the longest subsequence that is common to a set of sequences. We are solving the special case with exactly two sequences, which are represented as lists. This is the basis of the diff algorithm.

 (list-lcs list-1 list-2 [test]) → list? list-1 : list? list-2 : list? test : (-> any/c any/c boolean?) = equal?
Returns the longest common subsequence (LCS) of list-1 and list-2 using test to compare the elements.

Example:

(list-lcs '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)
(a b c d f g j z)

### 2Diff

The diff algorithm computes the differences between two sequences in the form of what changes are required to the first sequence to produce the second sequence. This is based on the longest common subsequence.

Once the longest common subsequence (LCS) is computed (using list-lcs) it’s easy to produce the diff-like output: if an item is absent in the LCS but present in the first sequence, it must have been deleted; and, it an item is absent in the LCS by present in the second sequence, it must have been added.

#### 2.1List Diff

 (list-diff list-1 list-2 [test]) → list? list-1 : list? list-2 : list? test : (-> any/c any/c boolean?) = equal?
Returns a list of the differences between list-1 and list-2 using test to compare the elements. Items common to both lists are included in the results as is. Items that have been added are included as a list whose first element is #:added followed by the added elements. Items that have been removed ae included as a list whose first element is #:removed followed by the deleted elements.

Examples:

(list-diff '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)
(a b c d (#:added e) f g (#:removed h) (#:added i) j (#:removed q) (#:added k r x y) z)

#### 2.2File Diff

A common use of diff is to print the differences between two text files - list the Unix diff command. The files are read as sequences of lines (i.e., string delimited by end-of-line sequences) and list-diff is used to compute the differences between files. The differences are then printed. Lines common to both files are denoted by "=|", lines that have been added are denoted by ">|", and lines that have been removed are denoted by "<|".

 (file-diff list-1 list-2) → void? list-1 : list? list-2 : list?
Prints the differences between file-1 and file-2.

Example:

original.txt
 This part of the document has stayed the same from version to version.  It shouldn't be shown if it doesn't change.  Otherwise, that would not be helping to compress the size of the changes. This paragraph contains text that is outdated. It will be deleted in the near future. It is important to spell check this dokument. On the other hand, a misspelled word isn't the end of the world. Nothing in the rest of this paragraph needs to be changed. Things can be added after it.

new.txt
 This is an important notice! It should therefore be located at the beginning of this document! This part of the document has stayed the same from version to version.  It shouldn't be shown if it doesn't change.  Otherwise, that would not be helping to compress anything. It is important to spell check this document. On the other hand, a misspelled word isn't the end of the world. Nothing in the rest of this paragraph needs to be changed. Things can be added after it. This paragraph contains important new additions to this document.

(file-diff "original.txt" "new.txt")
 >|This is an important >|notice! It should >|therefore be located at >|the beginning of this >|document! >| =|This part of the =|document has stayed the =|same from version to =|version.  It shouldn't =|be shown if it doesn't =|change.  Otherwise, that =|would not be helping to <|compress the size of the <|changes. >|compress anything. =| <|This paragraph contains <|text that is outdated. <|It will be deleted in the <|near future. <| =|It is important to spell <|check this dokument. On >|check this document. On =|the other hand, a =|misspelled word isn't =|the end of the world. =|Nothing in the rest of =|this paragraph needs to =|be changed. Things can =|be added after it. >| >|This paragraph contains >|important new additions >|to this document.

### 3References

"Longest common subsequence problem", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem (Accessed January 13, 2011).

"Diff", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Diff (Accessed January 13, 2011).

"Algorithm Implementation/Strings/Longest common subsequence", WikiBooks, Open books for an open world, http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Longest_common_subsequence (Accessed January 13, 2011).