Ticket UUID: | 1052584 | |||
Title: | TIP#225: O(1) space [range] command. | |||
Type: | Patch | Version: | TIP Implementation | |
Submitter: | antirez | Created on: | 2004-10-23 01:22:33 | |
Subsystem: | 14. List Object | Assigned To: | ferrieux | |
Priority: | 5 Medium | Severity: | ||
Status: | Open | Last Modified: | 2008-07-30 17:16:20 | |
Resolution: | None | Closed By: | ||
Closed on: | ||||
Description: |
That's a first version of the [range] command working in O(1) space. There is some minor bug I'll fix tomorrow, but the basic stuff appears to be ok. commands aware of the new arithSeries object in the TEBC and normal fashion: llength lindex foreach New commands: Only [range]. The change is minimal from the user point of view. From the core side is a bit more tricky because of the O(1) space aims. Other stuff optimized for this object: SetListObjFromAny will directly convert ranges into lists without to pass for the string repr. Open issues: The code needs a check in order to don't allow the generation of ranges that are longer than the max listObj length supported by the current Tcl code. For the rest, I used Wide Ints in order to ensure that ranges shorter than the max list length, but having as elements Wide ints will work. Performance changes: The effects on the performances of the list-oriented should be very small, there is some test more than before. The case of: foreach x [range 0 $n] is faster than the equivalent [for] form by 10% to 30% from a first test. The appended patch should apply against HEAD. | |||
User Comments: |
ferrieux added on 2008-07-30 17:16:20:
Logged In: YES user_id=496139 Originator: NO Hey thanks Miguel for assigning this to me :-) A nearly 4-year-old TIP... Salvatore, and Miguel are you really still backing it ? The reason I'm asking this, is that: (1) All performance measurements should at least be updated to current Tcl (2) The List object type is so ubiquitous in the core that any extension of it should be carefully weighed against the increased complexity of tidying up its API in the future. I'm making this remark in the light of my recent attempt at Mutability and also my Cons proposal. (3) What exactly is the aim: just beat [for] by 30%, or open the way to "lazy lists", or generators seamlessly integrated with lists ? In other words, instead of hard-wiring just arithmetic series, shouldn't we rather work a little bit more to make Tcl_ListObjType polymorphic once for all ? Such a generic [lindex/lllength] hook in the core would instantly bring [range] and [cons] as extensions... antirez added on 2004-11-02 08:00:15: File Added - 107307: range.n antirez added on 2004-11-02 07:59:40: Logged In: YES user_id=67765 Now there is a man page for the command that is attached here. It's possible to view it via web at http://www.invece.org/rangeman.html. Many thanks to Andreas Kupries that found a number of bugs in the man page (now should be ok). antirez added on 2004-10-26 05:00:13: File Added - 106451: range6.patch antirez added on 2004-10-26 05:00:12: Logged In: YES user_id=67765 new patch that imports from the Miguel's patch the idea to avoid wide math when possible. The arithSeries object have start/end/step fields as unions now, the check to see if the range stores an int range or a wide one was already there to return the right type of Tcl object so this is always a win. Actually I benchmarked this patch that shows a non-trivial performance gain. There are not semantical changes in this patch, should behave the same as patch5. antirez added on 2004-10-25 15:09:06: File Added - 106350: range.test.tcl Logged In: YES user_id=67765 [range] test attached. antirez added on 2004-10-25 15:08:33: Logged In: YES user_id=67765 [range] test attached. antirez added on 2004-10-25 15:08:30: Logged In: YES user_id=67765 [range] test attached. antirez added on 2004-10-25 15:07:59: File Added - 106349: range5.patch Logged In: YES user_id=67765 New patch attached (range5.patch). The idea is to merge what I think are the best points of Miguel's and my patch, there are also random bugfixes. Description of the current structure: - The arithSeries object is a standalone object, there is no code about it in the List object metod, but setListFromAny knows how to convert an airthmetic sequence to a list without to pass for the string repr. - The commands lindex, llength and foreach knows about the arithmetic series object in order to avoid to convert into a list. - Like in the Miguel's patch, there is no longer a reference to a wide object inside the range, this was possible modifing the code in TEBC using the Miguel's implementation idea. - Like in Miguel's patch now there is a flag in the arithmetic sequence object that tells us if every element of the range can fit or not in an Int, or if it requires a Wide. So where possible the Int objects are used in place to Wide ones. - The patch was updated to reflect the new core changes just commited by Miguel. Instead of five ad-hoc checks to see if it's possible to update a variable directly without to pass for the usual API, there is now a function to test for this condition in a very clean way. - Ranges with negative step are now fixed. In general the algorithm that computes the length of the list is now better and should be definitive at this point. - There is now a test for the [range] command, and for [foreach], [lindex], [llength] in relation to the new obj type. Not in the patch, but attached as a different file here. Should be all for now. msofer added on 2004-10-25 02:00:57: Logged In: YES user_id=148712 I'm not sure I agree. The change should definitely be documented, but: As every other Tcl_List* function converts to listType, this can only impact on code that mucks directly with listObj internals and doesn't call Tcl_ListObjGetElements() first. Anybody doing that should RTFM and RTFS very carefully. For code that uses the public APIs, everything should be safe. antirez added on 2004-10-24 22:13:48: Logged In: YES user_id=67765 Just another comment, I readed the patch carefully and noticed something that may create problems. Some code may assume that for instance calling Tcl_ListObjLength() will always convert an object to the List type, and then access its internal representation. With Miguel's patch this is no longer true, and this may create problems, so maybe to avoid to put the range's stuff inside the List's type methods can be a good idea. antirez added on 2004-10-24 21:50:05: Logged In: YES user_id=67765 I just read Miguel's patch. Very great! A lot of issues resolved, this patch is better in every way and resolves some bad thing of my patch, notably that the object needed to take a reference to a wide object. Thanks for the good work. msofer added on 2004-10-24 20:50:11: File Added - 106266: msrange.patch Logged In: YES user_id=148712 Alternative implementation attached (msrange.patch), for discussion. It is based heavily on antirez's patch. The main differences are: - the new arithSeriesType are not meant to be exposed to the user; they are used internally as optimisations by some list commands (foreach, llength, lindex; lrange could be added), with automatic conversion to list type for the others. There are no special commands for handling the new types. - the new obj types do not store Tcl_Objs, but ints or wideInts according to necessity. These are stored in a union (alternatively one could consider having two different types for ints and wides). - no new commands are meant to be exposed; this patch does not touch the stubs tables antirez added on 2004-10-24 16:34:46: File Deleted - 106185: antirez added on 2004-10-24 16:34:19: File Added - 106250: range3.patch Logged In: YES user_id=67765 This new patch fixes two problems: the cached object string representation now is invalidated in the TEBC code, and there is a check to be sure the cached object is still of the wide int, otherwise a new object is created. antirez added on 2004-10-24 16:34:16: Logged In: YES user_id=67765 This new patch fixes two problems: the cached object string representation now is invalidated in the TEBC code, and there is a check to be sure the cached object is still of the wide int, otherwise a new object is created. antirez added on 2004-10-24 02:07:29: File Added - 106185: range2.patch Logged In: YES user_id=67765 This is a new version of the patch. changes: Dup method more sane semantically, should be the same in the pratice but now the cached object is copied exactly. A very important fix in the TEBC about I bug I introduced with may changes, thanks to Miguel for discovering it. Fixed detection of infinite ranges. Ranges with more elements than the max list length now reports an error like [lrepeat] does. msofer added on 2004-10-23 22:27:57: Logged In: YES user_id=148712 The patch uses wideInts for {start,end, step}, but is restricted to a length of INTMAX elements - not this patch's fault, Tcl lists have this restriction. One possible performance impact is that the element variable in [foreach] loops is always a wideInt. Whenever the element is representable by an int, this may impose conversion costs in further computations with it. It may be interesting to study how to return an int object whenever possible. Mybe adding a arithSeriesRepPtr->intObjPtr, and insuring that one of those pointers is always valid, and the other is NULL? There may be better solutions ... msofer added on 2004-10-23 17:12:57: Logged In: YES user_id=148712 Assigning to myself antirez added on 2004-10-23 08:22:34: File Added - 106124: range.patch |
Attachments:
- range.n [download] added by antirez on 2004-11-02 08:00:15. [details]
- range6.patch [download] added by antirez on 2004-10-26 05:00:13. [details]
- range.test.tcl [download] added by antirez on 2004-10-25 15:09:06. [details]
- range5.patch [download] added by antirez on 2004-10-25 15:07:59. [details]
- msrange.patch [download] added by msofer on 2004-10-24 20:50:11. [details]
- range3.patch [download] added by antirez on 2004-10-24 16:34:19. [details]
- range.patch [download] added by antirez on 2004-10-23 08:22:34. [details]