Tcl Source Code

View Ticket
Login
Ticket UUID: 1052584
Title: TIP#225: O(1) space [range] command.
Type: Patch Version: TIP Implementation
Submitter: antirez Created on: 2004-10-23 01:22:33
Subsystem: 14. List Object Assigned To: ferrieux
Priority: 5 Medium Severity:
Status: Open Last Modified: 2008-07-30 17:16:20
Resolution: None Closed By:
    Closed on:
Description:
That's a first version of the [range] command
working in O(1) space. There is some minor bug I'll fix
tomorrow, but the basic stuff appears to be ok.

commands aware of the new arithSeries object
in the TEBC and normal fashion:

llength
lindex
foreach

New commands:

Only [range]. The change is minimal from
the user point of view. From the core
side is a bit more tricky because of the O(1)
space aims.

Other stuff optimized for this object:

SetListObjFromAny will directly convert ranges
into lists without to pass for the string repr.

Open issues:

The code needs a check in order to
don't allow the generation of ranges
that are longer than the max listObj length
supported by the current Tcl code.

For the rest, I used Wide Ints in order to
ensure that ranges shorter than the max
list length, but having as elements Wide ints
will work.

Performance changes:

The effects on the performances of the list-oriented
should be very small, there is some test more
than before. 

The case of:

foreach x [range 0 $n]

is faster than the
equivalent [for] form by 10% to 30% from
a first test.

The appended patch should apply against HEAD.
User Comments: ferrieux added on 2008-07-30 17:16:20:
Logged In: YES 
user_id=496139
Originator: NO

Hey thanks Miguel for assigning this to me :-)
A nearly 4-year-old TIP... Salvatore, and Miguel are you really still backing it ?
The reason I'm asking this, is that:
(1) All performance measurements should at least be updated to current Tcl
(2) The List object type is so ubiquitous in the core that any extension of it should be carefully weighed against the increased complexity of tidying up its API in the future. I'm making this remark in the light of my recent attempt at Mutability and also my Cons proposal.
(3) What exactly is the aim: just beat [for] by 30%, or open the way to "lazy lists", or generators seamlessly integrated with lists ? In other words, instead of hard-wiring just arithmetic series, shouldn't we rather work a little bit more to make Tcl_ListObjType polymorphic once for all ? Such a generic [lindex/lllength] hook in the core would instantly bring [range] and [cons] as extensions...

antirez added on 2004-11-02 08:00:15:

File Added - 107307: range.n

antirez added on 2004-11-02 07:59:40:
Logged In: YES 
user_id=67765

Now there is a man page for the command that is
attached here. It's possible to view it via web at
http://www.invece.org/rangeman.html.
Many thanks to Andreas Kupries that found a number of
bugs in the man page (now should be ok).

antirez added on 2004-10-26 05:00:13:

File Added - 106451: range6.patch

antirez added on 2004-10-26 05:00:12:
Logged In: YES 
user_id=67765

new patch that imports from the Miguel's patch the
idea to avoid wide math when possible. The arithSeries
object have start/end/step fields as unions now, the check
to see if the range stores an int range or a wide one was
already there to return the right type of Tcl object so this
is always a win. Actually I benchmarked this patch that
shows a non-trivial performance gain. There are not semantical
changes in this patch, should behave the same as patch5.

antirez added on 2004-10-25 15:09:06:

File Added - 106350: range.test.tcl

Logged In: YES 
user_id=67765

[range] test attached.

antirez added on 2004-10-25 15:08:33:
Logged In: YES 
user_id=67765

[range] test attached.

antirez added on 2004-10-25 15:08:30:
Logged In: YES 
user_id=67765

[range] test attached.

antirez added on 2004-10-25 15:07:59:

File Added - 106349: range5.patch

Logged In: YES 
user_id=67765

New patch attached (range5.patch). The idea is to merge what I
think are the best points of Miguel's and my patch,
there are also random bugfixes.

Description of the current structure:

- The arithSeries object is a standalone object, there
is no code about it in the List object metod, but
setListFromAny knows how to convert an airthmetic 
sequence to a list without to pass for the string repr.

- The commands lindex, llength and foreach knows about
the arithmetic series object in order to avoid to convert
into a list.

- Like in the Miguel's patch, there is no longer a reference
to a wide object inside the range, this was possible modifing
the code in TEBC using the Miguel's implementation idea.

- Like in Miguel's patch now there is a flag in the arithmetic
sequence object that tells us if every element of the range
can fit or not in an Int, or if it requires a Wide. So where
possible the Int objects are used in place to Wide ones.

- The patch was updated to reflect the new core changes
just commited by Miguel. Instead of five ad-hoc checks to
see if it's possible to update a variable directly without to
pass for the usual API, there is now a function to test for
this condition in a very clean way.

- Ranges with negative step are now fixed. In general the
algorithm that computes the length of the list is now better
and should be definitive at this point.

- There is now a test for the [range] command, and for
[foreach], [lindex], [llength] in relation to the new obj type.
Not in the patch, but attached as a different file here.

Should be all for now.

msofer added on 2004-10-25 02:00:57:
Logged In: YES 
user_id=148712

I'm not sure I agree. The change should definitely be
documented, but:

As every other Tcl_List* function converts to listType, this
can only impact on code that mucks directly with listObj
internals and doesn't call Tcl_ListObjGetElements() first.
Anybody doing that should RTFM and RTFS very carefully. For
code that uses the public APIs, everything should be safe.

antirez added on 2004-10-24 22:13:48:
Logged In: YES 
user_id=67765

Just another comment, I readed the patch carefully
and noticed something that may create problems.
Some code may assume that for instance calling
Tcl_ListObjLength() will always convert an object to
the List type, and then access its internal representation.
With Miguel's patch this is no longer true, and this may
create problems, so maybe to avoid to put the range's
stuff inside the List's type methods can be a good idea.

antirez added on 2004-10-24 21:50:05:
Logged In: YES 
user_id=67765

I just read Miguel's patch. Very great! A lot of issues
resolved, this patch is better in every way and resolves
some bad thing of my patch, notably that the object
needed to take a reference to a wide object.
Thanks for the good work.

msofer added on 2004-10-24 20:50:11:

File Added - 106266: msrange.patch

Logged In: YES 
user_id=148712

Alternative implementation attached (msrange.patch), for
discussion. It is based heavily on antirez's patch.

The main differences are:
  - the new arithSeriesType are not meant to be exposed to
the user; they are used internally as optimisations by some
list commands (foreach, llength, lindex; lrange could be
added), with automatic conversion to list type for the
others. There are no special commands for handling the new
types. 
  - the new obj types do not store Tcl_Objs, but ints or
wideInts according to necessity. These are stored in a union
(alternatively one could consider having two different types
for ints and wides).
  - no new commands are meant to be exposed; this patch does
not touch the stubs tables

antirez added on 2004-10-24 16:34:46:

File Deleted - 106185:

antirez added on 2004-10-24 16:34:19:

File Added - 106250: range3.patch

Logged In: YES 
user_id=67765

This new patch fixes two problems: the cached object
string representation now is invalidated in the TEBC code,
and there is a check to be sure the cached object is still
of the wide int, otherwise a new object is created.

antirez added on 2004-10-24 16:34:16:
Logged In: YES 
user_id=67765

This new patch fixes two problems: the cached object
string representation now is invalidated in the TEBC code,
and there is a check to be sure the cached object is still
of the wide int, otherwise a new object is created.

antirez added on 2004-10-24 02:07:29:

File Added - 106185: range2.patch

Logged In: YES 
user_id=67765

This is a new version of the patch.

changes:

Dup method more sane semantically, should be the same
in the pratice but now the cached object is copied exactly.

A very important fix in the TEBC about I bug I introduced
with may changes, thanks to Miguel for discovering it.

Fixed detection of infinite ranges.

Ranges with more elements than the max list length now
reports an error like [lrepeat] does.

msofer added on 2004-10-23 22:27:57:
Logged In: YES 
user_id=148712

The patch uses wideInts for {start,end, step}, but is
restricted to a length of INTMAX elements - not this patch's
fault, Tcl lists have this restriction.
One possible performance impact is that the element variable
in [foreach] loops is always a wideInt. Whenever the element
is representable by an int, this may impose conversion costs
in further computations with it.

It may be interesting to study how to return an int object
whenever possible.  Mybe adding a
arithSeriesRepPtr->intObjPtr, and insuring that one of those
pointers is always valid, and the other is NULL? There may
be better solutions ...

msofer added on 2004-10-23 17:12:57:
Logged In: YES 
user_id=148712

Assigning to myself

antirez added on 2004-10-23 08:22:34:

File Added - 106124: range.patch

Attachments: