Author: Florian Murr <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 02-Aug-2017
Post-History:
Keywords: Tcl,data structure
Tcl-Version: 9.1
Implementation-URL: https://core.tcl-lang.org/tips/doc/main/attach/513/agendas.tcl
Abstract
This proposes new commands for Tcl to support efficient dynamically-changing data structures.
Proposal
The word 'agenda' has been chosen for a highly dynamic collection of items,
i.e. the collection is likely to change all the time - “pulsating”. (To
contrast the concept of an agenda with something that is not, think of array
startseach'
+ array nextelement
which abort the operation the very moment
any change to the array happens.) Depending on the type of agenda, it has a
built in order, or not.
Such agendas crop up frequently in practice and Tcl should support them with best performance and readability. All three, better performance, better readability and more expressivenes can be achieved simultaneously, as shown below. Most are familiar with agenda-types like “Stack” and “Queue”, which can be implemented as a Tcl-list, but sometimes an array or dictionary is more appropriate. (For an example consider Huet's unification algorithm)
The most important methods for an agenda are:
- check for (non-)emptiness of the agenda
- put an item on the agenda
- get an item from the agenda and optionally remove it
All these operations should be O(1) and/or have maximum C-level performance.
This TIP proposes 3 functions to achieve this goal, with a synopsis close to this (details below):
- array haselem ?-remove? arrVar ?keyVar? ?valueVar?
- dict haselem ?-remove? dictVar ?keyVar? ?valueVar?
- lhaselem ?options? listVar ?valueVar?
Each of these functions checks the existence, gets the element and optionally removes it in one step. The various advantages are examined below.
Readability Advantages
With lhaselem and using lappend to put an item on the agenda (in a Tcl list in a variable), we can now process the elements in a Stack-agenda stack in one single line:
while {[lhaselem -remove -index end stack item]} {
# use $item, lappend, ...
}
Likewise a Queue-agenda requires just a single line:
while {[lhaselem -remove -index 0 queue item]} {
# use $item, lappend, ...
}
Processing an array-agenda or dict-agenda is just as easy:
while {[array haselem -remove arr key val]} {
# use $key, $val, set, unset, ...
}
These one-liners improve readability because their semantics is not scattered upon multiple functions, that access the data multiple times and must be mentally put together to understand it. Their semantics is immediately obvious, once the new functions are known. To value how much readability improves, consider the “payload” section in each of the prototypical implementations (linked below) and compare that to the respective one-liner.
Expressiveness Advantages
array haselem. Here we must pause and consider that we have a certain semantic gap with dynamic arrays, i.e. array-agendas: Checking for (non-)emptiness is currently not possible with a simple O(1) operation. The obvious candidate
if {[array size arr]}
is an O(N) operation. The alternative is usingarray startseach
,array nextelement
+array donesearch
just to check whether an array is empty or not, but that is not appealing either. These functions are useful for more static collections, not for agendas. Also, typically for array-agendas cropping up in practice is the requirement for “anonymous access”, i.e., we need to get an element of the collection without having to know beforehand which specific element to ask for. The only direct access to array elements is currently$arr($elem)
, but this is not anonymous, because you have to know$elem
beforehand. With array haselem we gain more expressiveness, because now we can check (non-)emptiness with a single O(1) function call and we can now anonymously access an item in the collection.dict haselem. With dicts it is not quite as bad as with arrays, but still we have to employ
dict for
to gain anonymous access to an element. Anddict size
might also not be the best we can do to check for (non-)emptyness, despite being an O(1) operation.lhaselem. (see the Stack and Queue example above)
Anyway, to have a single “atomic” operation that checks the existence, gets the item, and optionally removes it from the collection, is a new expressive feature.
Performance Advantages
array haselem. At the C-level we do not need even a single hash-access to get an element: just pick the first one in the first bucket. And removing it can be done immediately, while we are at it. In contrast, when using current Tcl functions to get an value and remove the key, we need at least
if {[info exists arr]}
to check existence of the array,array nextelement
to get the key,$arr($key)
to get the value, andunset arr($key)
to remove it. These employ 2 or 3 hash-accesses to get the job done, when in fact we need none.dict haselem. (c.f. “payload” section in
dHaselem
in the sample code). We need to access the dict only once to check and remove the item, not twice.lhaselem. (c.f. “payload” section in
lhaselem
in the sample code). We need to access the list only once to check and remove the item, not twice.
Prototype Implementation
The prototype implementation has the desired semantics, but does not even come close to the performance desired. These functions have to be implemented in C and take advantage of the internal representation to fulfill the proposed performance gains.
Copyright
This document has been placed in the public domain.