Artifact [d0aff8d2b5]

Login

Artifact d0aff8d2b573ab3a80312e45f92dbee7d14725cbb21c1b56dae6b938dd3dcb90:


TIP:            217
Title:          Getting Sorted Indices out of Lsort
Version:        $Revision: 1.10 $
Author:         James P. Salsman <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        26-Aug-2004
Post-History:   
Keywords:       Tcl,lsort,parallel lists
Tcl-Version:    8.5

~ Abstract

An '''-indices''' option is proposed for the '''lsort''' command, returning the indices of the given list's elements in the order that they would have otherwise been sorted.

~ Rationale

When corresponding parallel lists must be simultaneously sorted or 
accessed in the order given by sorting them all according to one used 
as a list of keys, it is necessary to obtain the indices of the key list's elements in the order that they would be sorted, without 
actually sorting the list.  For example, a list of first names and a 
corresponding list of last names can be displayed in side-by-side Tk 
listboxes, in which case we may want to sort both lists by either one 
used as the sorting key, or we may want to simultaneously iterate over 
both in either order.  To do so, merely sorting a list is unhelpful; 
we need to obtain the indices of the key list in the order that its 
corresponding elements would be sorted.

Tk listboxes, database I/O, and statistics applications often 
involve heavy use of parallel lists.  For this and other reasons, many
programming languages starting at least as early as APL, up to 
present-day, numerics-oriented languages such as MATLAB, have included 
the ability to directly obtain the indices required to access a list 
(or "vector") in sorted order.  As shown below, the pure Tcl solution 
to this problem can take more than 10 times as long as the given 
reference implementation, which adds virtually no overhead when it is
not invoked.

~ Proposed Specification

The '''lsort''' command shall accept a new option, '''-indices'''.
When '''lsort''' is invoked with this option, it shall return a list 
of integer indices of the elements of the list given as the final
argument to '''lsort''', in the order that the elements would have
been sorted had the '''-indices''' option not been specified.

This means an alternative (though less efficient for single lists) mechanism for producing a sorted list could be:

|set resultList {}
|foreach idx [lsort -indices $sourceList] {
|    lappend resultList [lindex $sourceList $idx]
|}

~ Reference Implementation

The reference implementation is available on SourceForge [http://sourceforge.net/tracker/index.php?func=detail&aid=1017532&group_id=10894&atid=310894] 
and at: http://www.bovik.org/lsort-indices-diff.txt  It should be 
applied with '''patch -l''' or '''patch --ignore-whitespace''' or it 
may not fully apply.

That reference implementation is a 109-line context diff, involving 
adding 20 lines of code to ''tclCmdIL.c'', only one additional int 
of data memory overhead, and just one additional integer comparison 
at run time if the new option is not invoked.

Compared to the following pure Tcl implementation, the reference 
implementation takes 15% of the execution time for a list of 50,000 
random real numbers, and just 9% of the execution time for a list 
of 5,000 random reals.  

This pure Tcl implementation was adapted by Richard Suchenwirth from 
an earlier version by the author:

| proc lsort-indices list {
|   if [llength $list] {
|     set i -1
|     foreach e $list {lappend tmp [list [incr i] $e]}
|     foreach e [lsort -index 1 -real $tmp] {lappend res [lindex $e 0]}
|     set res
|   }
| }

~ Proposed Documentation

In the '''lsort''' man page, under '''DESCRIPTION''', change the 
first sentence:

 > "This command sorts the elements of list, returning a new list in
   sorted order."

... to read:

 > "This command sorts the elements of list, and returns a new list in
   sorted order, unless the -indices option is specified, in which
   case a list of integers is returned, corresponding to the indices
   of the given list's elements in the order that they otherwise would
   have been sorted."

Under '''EXAMPLES''', at the end of the section, include the following
lines:

| Obtaining ordered indices:
|
|  % lsort -indices [list a c b]
|  0 2 1
|  % lsort -indices -unique -decreasing -real -index 0 \
|          {{1.2 a} {34.5 b} {34.5 c} {5.6 d}}
|  2 3 0

~ Tcl-core Discussion

Here are some highlights from the discussion of this TIP on the 
Tcl-core mailing list.  No assurance is given that the discussion
is either completely or impartially represented here.

Lars Hellstr�m 
[http://sourceforge.net/mailarchive/message.php?msg_id=9346824] 
described a pure Tcl solution virtually identical to the one shown 
above, "which could be complicated enough to warrent a special [lsort] 
option."  He also suggested a '''-keycommand''' option for sorting on 
keys generated on-the-fly.  Finally, he pointed out a flaw concerning
the example in the Rationale from the original version of this TIP, 
which has since been corrected.

In reply to Lars, the author
[http://sourceforge.net/mailarchive/message.php?msg_id=9348381]
provided the timing data given above, and an efficient alternative to 
the '''-keycommand''' idea using this TIP's '''-indices''' proposal.

~ Copyright

This document has been placed in the public domain by the author.