1 Longest Common Subsequence (LCS)
list-lcs
2 Diff
2.1 List Diff
list-diff
2.2 File Diff
file-diff
3 References

Diff Package

by Doug Williams

m.douglas.williams at gmail.com

This package provides a simple diff-like capability in Racket. This includes diffs of arbitrary lists or of text files. It also includes an implementation of the longest common subsequence (LCS) algorithm, which is the basis of the diff algorithm.

The diff package is available from the PLaneT repository.

 (require (planet williams/diff/diff))

    1 Longest Common Subsequence (LCS)

    2 Diff

      2.1 List Diff

      2.2 File Diff

    3 References

1 Longest Common Subsequence (LCS)

The longest common subsequence (LCS) is the longest subsequence that is common to a set of sequences. We are solving the special case with exactly two sequences, which are represented as lists. This is the basis of the diff algorithm.

(list-lcs list-1 list-2 [test])  list?
  list-1 : list?
  list-2 : list?
  test : (-> any/c any/c boolean?) = equal?
Returns the longest common subsequence (LCS) of list-1 and list-2 using test to compare the elements.

Example:

(list-lcs '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)
(a b c d f g j z)

2 Diff

The diff algorithm computes the differences between two sequences in the form of what changes are required to the first sequence to produce the second sequence. This is based on the longest common subsequence.

Once the longest common subsequence (LCS) is computed (using list-lcs) it’s easy to produce the diff-like output: if an item is absent in the LCS but present in the first sequence, it must have been deleted; and, it an item is absent in the LCS by present in the second sequence, it must have been added.

2.1 List Diff

(list-diff list-1 list-2 [test])  list?
  list-1 : list?
  list-2 : list?
  test : (-> any/c any/c boolean?) = equal?
Returns a list of the differences between list-1 and list-2 using test to compare the elements. Items common to both lists are included in the results as is. Items that have been added are included as a list whose first element is #:added followed by the added elements. Items that have been removed ae included as a list whose first element is #:removed followed by the deleted elements.

Examples:

(list-diff '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)
(a b c d (#:added e) f g (#:removed h) (#:added i) j (#:removed q) (#:added k r x y) z)

2.2 File Diff

A common use of diff is to print the differences between two text files - list the Unix diff command. The files are read as sequences of lines (i.e., string delimited by end-of-line sequences) and list-diff is used to compute the differences between files. The differences are then printed. Lines common to both files are denoted by "=|", lines that have been added are denoted by ">|", and lines that have been removed are denoted by "<|".

(file-diff list-1 list-2)  void?
  list-1 : list?
  list-2 : list?
Prints the differences between file-1 and file-2.

Example:

original.txt

This part of the

document has stayed the

same from version to

version.  It shouldn't

be shown if it doesn't

change.  Otherwise, that

would not be helping to

compress the size of the

changes.

 

This paragraph contains

text that is outdated.

It will be deleted in the

near future.

 

It is important to spell

check this dokument. On

the other hand, a

misspelled word isn't

the end of the world.

Nothing in the rest of

this paragraph needs to

be changed. Things can

be added after it.

new.txt

This is an important

notice! It should

therefore be located at

the beginning of this

document!

 

This part of the

document has stayed the

same from version to

version.  It shouldn't

be shown if it doesn't

change.  Otherwise, that

would not be helping to

compress anything.

 

It is important to spell

check this document. On

the other hand, a

misspelled word isn't

the end of the world.

Nothing in the rest of

this paragraph needs to

be changed. Things can

be added after it.

 

This paragraph contains

important new additions

to this document.

(file-diff "original.txt" "new.txt")

>|This is an important

>|notice! It should

>|therefore be located at

>|the beginning of this

>|document!

>|

=|This part of the

=|document has stayed the

=|same from version to

=|version.  It shouldn't

=|be shown if it doesn't

=|change.  Otherwise, that

=|would not be helping to

<|compress the size of the

<|changes.

>|compress anything.

=|

<|This paragraph contains

<|text that is outdated.

<|It will be deleted in the

<|near future.

<|

=|It is important to spell

<|check this dokument. On

>|check this document. On

=|the other hand, a

=|misspelled word isn't

=|the end of the world.

=|Nothing in the rest of

=|this paragraph needs to

=|be changed. Things can

=|be added after it.

>|

>|This paragraph contains

>|important new additions

>|to this document.

3 References

"Longest common subsequence problem", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem (Accessed January 13, 2011).

"Diff", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Diff (Accessed January 13, 2011).

"Algorithm Implementation/Strings/Longest common subsequence", WikiBooks, Open books for an open world, http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Longest_common_subsequence (Accessed January 13, 2011).