#lang scribble/doc @(require scribble/manual scribble/struct scribble/eval (planet williams/diff/diff) (for-label racket racket/date (planet williams/diff/diff))) @title[#:tag "diff"]{Diff Package} by Doug Williams @tt{m.douglas.williams at gmail.com} This package provides a simple diff-like capability in Racket. This includes diffs of arbitrary lists or of text files. It also includes an implementation of the longest common subsequence (LCS) algorithm, which is the basis of the diff algorithm. The diff package is available from the PLaneT repository. @defmodule[(planet williams/diff/diff)] @table-of-contents[] @section[#:tag "LCS"]{Longest Common Subsequence (LCS)} The longest common subsequence (LCS) is the longest subsequence that is common to a set of sequences. We are solving the special case with exactly two sequences, which are represented as lists. This is the basis of the diff algorithm. @defproc[(list-lcs (list-1 list?) (list-2 list?) (test (-> any/c any/c boolean?) equal?)) list?]{ Returns the longest common subsequence (LCS) of @racket[list-1] and @racket[list-2] using @racket[test] to compare the elements.} Example: @racket[(list-lcs '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)] @linebreak[] @racketfont{(a b c d f g j z)} @section[#:tag "Diff"]{Diff} The diff algorithm computes the differences between two sequences in the form of what changes are required to the first sequence to produce the second sequence. This is based on the longest common subsequence. Once the longest common subsequence (LCS) is computed (using @racket[list-lcs]) it's easy to produce the diff-like output: if an item is absent in the LCS but present in the first sequence, it must have been deleted; and, it an item is absent in the LCS by present in the second sequence, it must have been added. @subsection[#:tag "list-diff"]{List Diff} @defproc[(list-diff (list-1 list?) (list-2 list?) (test (-> any/c any/c boolean?) equal?)) list?]{ Returns a list of the differences between @racket[list-1] and @racket[list-2] using @racket[test] to compare the elements. Items common to both lists are included in the results as is. Items that have been added are included as a list whose first element is @racket[#:added] followed by the added elements. Items that have been removed ae included as a list whose first element is @racket[#:removed] followed by the deleted elements.} Examples: @racket[(list-diff '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)] @linebreak[] @racketfont{(a b c d (#:added e) f g (#:removed h) (#:added i) j (#:removed q) (#:added k r x y) z)} @subsection[#:tag "file-diff"]{File Diff} A common use of diff is to print the differences between two text files - list the Unix @tt{diff} command. The files are read as sequences of lines (i.e., string delimited by end-of-line sequences) and list-diff is used to compute the differences between files. The differences are then printed. Lines common to both files are denoted by @racketfont{"=|"}, lines that have been added are denoted by @racketfont{">|"}, and lines that have been removed are denoted by @racketfont{"<|"}. @defproc[(file-diff (list-1 list?) (list-2 list?)) void?]{ Prints the differences between @racket[file-1] and @racket[file-2].} Example: @italic{original.txt} @verbatim{ This part of the document has stayed the same from version to version. It shouldn't be shown if it doesn't change. Otherwise, that would not be helping to compress the size of the changes. This paragraph contains text that is outdated. It will be deleted in the near future. It is important to spell check this dokument. On the other hand, a misspelled word isn't the end of the world. Nothing in the rest of this paragraph needs to be changed. Things can be added after it. } @italic{new.txt} @verbatim{ This is an important notice! It should therefore be located at the beginning of this document! This part of the document has stayed the same from version to version. It shouldn't be shown if it doesn't change. Otherwise, that would not be helping to compress anything. It is important to spell check this document. On the other hand, a misspelled word isn't the end of the world. Nothing in the rest of this paragraph needs to be changed. Things can be added after it. This paragraph contains important new additions to this document. } @racket[(file-diff "original.txt" "new.txt")] @linebreak[] @verbatim{ >|This is an important >|notice! It should >|therefore be located at >|the beginning of this >|document! >| =|This part of the =|document has stayed the =|same from version to =|version. It shouldn't =|be shown if it doesn't =|change. Otherwise, that =|would not be helping to <|compress the size of the <|changes. >|compress anything. =| <|This paragraph contains <|text that is outdated. <|It will be deleted in the <|near future. <| =|It is important to spell <|check this dokument. On >|check this document. On =|the other hand, a =|misspelled word isn't =|the end of the world. =|Nothing in the rest of =|this paragraph needs to =|be changed. Things can =|be added after it. >| >|This paragraph contains >|important new additions >|to this document. } @section[#:tag "references"]{References} "Longest common subsequence problem", @italic{Wikipedia, The Free Encyclopedia}, @tt{http://en.wikipedia.org/wiki/Longest_common_subsequence_problem} (Accessed January 13, 2011). "Diff", @italic{Wikipedia, The Free Encyclopedia}, @tt{http://en.wikipedia.org/wiki/Diff} (Accessed January 13, 2011). "Algorithm Implementation/Strings/Longest common subsequence", @italic{WikiBooks, Open books for an open world}, @tt{http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Longest_common_subsequence} (Accessed January 13, 2011).