[comment {-*- tcl -*- doctools manpage}] [comment {$Id: $}] [manpage_begin list n 1.2.2] [copyright {2003 by Kevin B. Kenny. All rights reserved}] [moddesc {Tcl Data Structures}] [titledesc {Procedures for manipulating lists}] [require Tcl 8.0] [require struct [opt 1.2.1]] [description] [para] The [cmd ::struct::list] namespace contains several useful commands for processing Tcl lists. Generally speaking, they implement algorithms more complex or specialized than the ones provided by Tcl itself. [section COMMANDS] [list_begin definitions] [call [cmd ::struct::list::longestCommonSubsequence] \ [arg {sequence1 sequence2 ?maxOccurs?}]] Returns the longest common subsequence of [arg sequence1] and [arg sequence2]. If the [arg maxOccurs] parameter is provided, the common subsequence is restricted to elements that occur no more than [arg maxOccurs] times in [arg sequence2]. [nl] The return value is a list of two lists of equal length. The first sublist is of indices into [arg sequence1], and the second sublist is of indices into [arg sequence2]. Each corresponding pair of indices corresponds to equal elements in the sequences; the sequence returned is the longest possible. [call [cmd ::struct::list::longestCommonSubsequence2] \ [arg {sequence1 sequence2 ?maxOccurs?}]] Returns an approximation to the longest common sequence of [arg sequence1] and [arg sequence2]. If the [arg maxOccurs] parameter is omitted, the subsequence computed is exactly the longest common subsequence; otherwise, the longest common subsequence is approximated by first determining the longest common sequence of only those elements that occur no more than [arg maxOccurs] times in [arg sequence2], and then using that result to align the two lists, determining the longest common subsequences of the sublists between the two elements. [nl] As with [cmd ::struct::list::longestCommonSubsequence], The return value is a list of two lists of equal length. The first sublist is of indices into [arg sequence1], and the second sublist is of indices into [arg sequence2]. Each corresponding pair of indices corresponds to equal elements in the sequences. The sequence approximates the longest common subsequence. [list_end] [section {LONGEST COMMON SUBSEQUENCE AND FILE COMPARISON}] [para] The [cmd ::struct::list::longestCommonSubsequence] command forms the core of a flexible system for doing differential comparisons of files, similar to the capability offered by the Unix command [syscmd diff]. While this procedure is quite rapid for many tasks of file comparison, its performance degrades severely if [arg sequence2] contains many equal elements (as, for instance, when using this procedure to compare two files, a quarter of whose lines are blank. This drawback is intrinsic to the algorithm used (see the Reference for details). [para] One approach to dealing with the performance problem that is sometimes effective in practice is arbitrarily to exclude elements that appear more than a certain number of times. This number is provided as the [arg maxOccurs] parameter. If frequent lines are excluded in this manner, they will not appear in the common subsequence that is computed; the result will be the longest common subsequence of infrequent elements. The procedure [cmd ::struct::list::longestCommonSubsequence2] implements this heuristic. It functions as a wrapper around [cmd ::struct::list::longestCommonSubsequence]; it computes the longest common subsequence of infrequent elements, and then subdivides the subsequences that lie between the matches to approximate the true longest common subsequence. [section REFERENCES] J. W. Hunt and M. D. McIlroy, "An algorithm for differential file comparison," Comp. Sci. Tech. Rep. #41, Bell Telephone Laboratories (1976). Available on the Web at the second author's personal site: [uri http://www.cs.dartmouth.edu/~doug/] [keywords list diff differential comparison common subsequence] [manpage_end]