Motivation
The primary purpose of the changes is memory efficiency for large lists. As a side benefit, affected list operations are now constant time though at a minor cost in indexed access. This document quantifies the tradeoffs in order to justify the proposal.
The following new abstract list Tcl_Obj
types are added.
repeatedList
- holds repeated elements (lrepeat
)reversedList
- holds elements of an existing list in reverse order (lreverse
)rangeList
- holds elements of a range within an existing list (lrange
)
All these share the following characteristics.
Benefits:
Most important, the resulting list is memory efficient, either through sharing memory with the source list or a more efficient internal representation. This makes certain list operations on large lists feasible that were previously impractical.
The associated operations (
lrepeat
forrepeatedList
,lreverse
forreversedList
etc.) are very fast (orders of magnitude for even moderately long lists).There are likely other benefits for large lists that are difficult to measure like minimizing cache use, swapping and the cost of freeing the additional memory.
Drawbacks:
Indexing and iterating an abstract list is less efficient, roughly by 4-10%.
Unlike the existing arithseries
, which has a higher overhead due to additional
memory allocation of the indexed element, here the efficiency loss is due to not
having direct support in the byte code execution engine. In all likelihood, this
can be eliminated by modifying the BCE if so deemed necessary for acceptance.
For simplicity, the current implementation does not modify the BCE but instead
only uses the abstract list representations for lengths greater than a threshold
(100 at the time of writing) where the memory savings is seen as substantial.
Caveats:
All abstract lists, including the existing arithseries
(lseq
) are only memory
efficient as long they are not modified. Any write operation shimmers them into
a non-abstract list so for such cases abstract lists essentially do "lazy"
shimmering, postponing the time and memory cost of modification to the
actual modification. Read operations, which include iteration, are however
frequent enough to be worth optimizing.
Metrics
All timings in the tables below are in microseconds as measured by timerate
.
Memory growth is the change in memory after a test run. Thus for both, smaller
values are better.
The repeatedList
abstract list
Generated by the lrepeat
command. Comparison with 9.0.0 in memory use and
performance in shown in the table below.
Memory growth number is of course driven by the largest repeat count, 1000000 in the table below, and assumes a single repeated element. However, corresponding benefits are seen (not in table below) for fewer counts and multiple repeated elements.
Unlike 9.0.0 which is linear with the repeat count, the time cost of the
lrepeat
operation is a small constant and is significantly faster even for a few hundred elements.The time cost of indexing and iteration is higher by about 7-8% and 3-4% respectively.
9.0.0 |apn-tip636-appl
---------------------------------------------------------------|---------------
Memory growth : 16612 KB | 144 KB
--
(len=10) {lrepeat $len x} : 0.131 | 0.135
(len=99) {lrepeat $len x} : 0.323 | 0.314
(len=101) {lrepeat $len x} : 0.322 | 0.113
(len=1000) {lrepeat $len x} : 1.894 | 0.113
(len=10000) {lrepeat $len x} : 18.327 | 0.113
(len=100000) {lrepeat $len x} : 183.535 | 0.113
(len=1000000) {lrepeat $len x} : 2107.770 | 0.113
(len=10) {lindex $l $ix} : 0.100 | 0.099
(len=99) {lindex $l $ix} : 0.100 | 0.100
(len=101) {lindex $l $ix} : 0.100 | 0.109
(len=1000) {lindex $l $ix} : 0.099 | 0.108
(len=10000) {lindex $l $ix} : 0.101 | 0.108
(len=100000) {lindex $l $ix} : 0.099 | 0.109
(len=1000000) {lindex $l $ix} : 0.099 | 0.107
(len=10) {foreach v $l {}} : 1.974 | 1.935
(len=99) {foreach v $l {}} : 16.650 | 16.401
(len=101) {foreach v $l {}} : 17.000 | 17.705
(len=1000) {foreach v $l {}} : 165.182 | 170.577
(len=10000) {foreach v $l {}} : 1645.150 | 1698.710
(len=100000) {foreach v $l {}} : 16556.300 | 17017.000
(len=1000000) {foreach v $l {}} : 164621.300 | 169603.200
The reversedList
abstract list:
Generated by the lreverse
command by sharing the memory in the source list.
Comparison with 9.0.0 in memory use and performance in shown in the table below.
Memory growth is independent of list size since the original source memory is shared.
Time of the
lreverse
operations is again a small constant and not linear in list size as in 9.0.0.Indexing and iteration incurs a penalty of about 6-7% and 3-4% respectively.
9.0.0 |apn-tip636-appl
---------------------------------------------------------------|---------------
Memory growth : 8800 KB | 144 KB
--
(len=10) {lreverse $l} : 0.177 | 0.178
(len=99) {lreverse $l} : 0.444 | 0.469
(len=101) {lreverse $l} : 0.458 | 0.133
(len=1000) {lreverse $l} : 3.086 | 0.132
(len=10000) {lreverse $l} : 29.436 | 0.132
(len=100000) {lreverse $l} : 291.879 | 0.132
(len=1000000) {lreverse $l} : 3263.890 | 0.133
(len=10) {lindex $l $ix} : 0.100 | 0.100
(len=99) {lindex $l $ix} : 0.101 | 0.100
(len=101) {lindex $l $ix} : 0.101 | 0.107
(len=1000) {lindex $l $ix} : 0.100 | 0.106
(len=10000) {lindex $l $ix} : 0.100 | 0.107
(len=100000) {lindex $l $ix} : 0.100 | 0.110
(len=1000000) {lindex $l $ix} : 0.100 | 0.107
(len=10) {foreach v $l {}} : 1.988 | 1.949
(len=99) {foreach v $l {}} : 17.060 | 16.409
(len=101) {foreach v $l {}} : 17.007 | 17.363
(len=1000) {foreach v $l {}} : 166.382 | 168.623
(len=10000) {foreach v $l {}} : 1651.030 | 1687.490
(len=100000) {foreach v $l {}} : 16529.900 | 16880.700
(len=1000000) {foreach v $l {}} : 165598.600 | 169084.200
The rangeList
abstract list:
The purpose of rangeList
differs from the other abstract list types above.
9.0.0 already optimizes the range operation for both existing list types,
list
and arithseries
so as seen in the table below, comparisons fall
into statistical noise.
9.0.0 |apn-tip636-appl
---------------------------------------------------------------|---------------
Memory growth : 144 KB | 144 KB
--
(len=10) {lrange $l 1 end-1} : 0.132 | 0.142
(len=99) {lrange $l 1 end-1} : 0.410 | 0.419
(len=101) {lrange $l 1 end-1} : 0.417 | 0.428
(len=1000) {lrange $l 1 end-1} : 0.120 | 0.124
(len=10000) {lrange $l 1 end-1} : 0.118 | 0.124
(len=100000) {lrange $l 1 end-1} : 0.118 | 0.124
(len=1000000) {lrange $l 1 end-1} : 0.117 | 0.123
(len=10) {lindex $l $ix} : 0.100 | 0.101
(len=99) {lindex $l $ix} : 0.100 | 0.101
(len=101) {lindex $l $ix} : 0.101 | 0.101
(len=1000) {lindex $l $ix} : 0.101 | 0.102
(len=10000) {lindex $l $ix} : 0.101 | 0.102
(len=100000) {lindex $l $ix} : 0.101 | 0.102
(len=1000000) {lindex $l $ix} : 0.100 | 0.102
(len=10) {foreach v $l {}} : 1.618 | 1.633
(len=99) {foreach v $l {}} : 16.439 | 16.110
(len=101) {foreach v $l {}} : 16.724 | 16.544
(len=1000) {foreach v $l {}} : 164.658 | 161.976
(len=10000) {foreach v $l {}} : 1656.320 | 1617.200
(len=100000) {foreach v $l {}} : 16446.800 | 16141.000
(len=1000000) {foreach v $l {}} : 164901.700 | 161559.800
The motivation for rangeList
is to prevent unnecessary shimmering of the
newly added repeatedList
and reversedList
types as well as abstract lists
implemented in extensions. Since these do not exist in 9.0.0, the question
of comparison does not arise. However, the performance of range operations
on the new abstract list types is also a consideration. The corresponding
table is shown below.
As seen, range operations on abstract lists are memory efficient but incur about a 15% cost on indexing and 7-8% cost on iteration compared to range operations on non-abstract lists.
apn-tip636-appl
------------------------------------------------------------------
Memory growth : 144 KB
--
(len=10) {lrange $l 1 end-1} : 0.141
(len=99) {lrange $l 1 end-1} : 0.425
(len=101) {lrange $l 1 end-1} : 0.431
(len=1000) {lrange $l 1 end-1} : 0.118
(len=10000) {lrange $l 1 end-1} : 0.125
(len=100000) {lrange $l 1 end-1} : 0.117
(len=1000000) {lrange $l 1 end-1} : 0.117
(len=10) {lindex $l $ix} : 0.101
(len=99) {lindex $l $ix} : 0.100
(len=101) {lindex $l $ix} : 0.101
(len=1000) {lindex $l $ix} : 0.115
(len=10000) {lindex $l $ix} : 0.116
(len=100000) {lindex $l $ix} : 0.118
(len=1000000) {lindex $l $ix} : 0.125
(len=10) {foreach v $l {}} : 1.611
(len=99) {foreach v $l {}} : 16.084
(len=101) {foreach v $l {}} : 16.487
(len=1000) {foreach v $l {}} : 178.548
(len=10000) {foreach v $l {}} : 1770.350
(len=100000) {foreach v $l {}} : 19182.900
(len=1000000) {foreach v $l {}} : 177549.800
Status
The apn-tip636-appl
branch contains the implementation of the proposal.
The listTypes.test
file contains the test cases to exercise the three new
internal list types as well as the existing ones. gcov
shows 91% code
and 100% function coverage of the new implementation. Missing coverage is
for "panic blocks" and validation which is not exercised as it is already
done by upper callers.