TIP 513: Better support for 'agendas' as arrays, dictionaries or lists

Login
Author:         Florian Murr <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        02-Aug-2017
Post-History:   
Keywords:       Tcl,data structure
Tcl-Version:	9.1
Implementation-URL: https://core.tcl-lang.org/tips/doc/main/attach/513/agendas.tcl

Abstract

This proposes new commands for Tcl to support efficient dynamically-changing data structures.

Proposal

The word 'agenda' has been chosen for a highly dynamic collection of items, i.e. the collection is likely to change all the time - “pulsating”. (To contrast the concept of an agenda with something that is not, think of array startseach'+ array nextelement which abort the operation the very moment any change to the array happens.) Depending on the type of agenda, it has a built in order, or not.

Such agendas crop up frequently in practice and Tcl should support them with best performance and readability. All three, better performance, better readability and more expressivenes can be achieved simultaneously, as shown below. Most are familiar with agenda-types like “Stack” and “Queue”, which can be implemented as a Tcl-list, but sometimes an array or dictionary is more appropriate. (For an example consider Huet's unification algorithm)

The most important methods for an agenda are:

  1. check for (non-)emptiness of the agenda
  2. put an item on the agenda
  3. get an item from the agenda and optionally remove it

All these operations should be O(1) and/or have maximum C-level performance.

This TIP proposes 3 functions to achieve this goal, with a synopsis close to this (details below):

  1. array haselem ?-remove? arrVar ?keyVar? ?valueVar?
  2. dict haselem ?-remove? dictVar ?keyVar? ?valueVar?
  3. lhaselem ?options? listVar ?valueVar?

Each of these functions checks the existence, gets the element and optionally removes it in one step. The various advantages are examined below.

Readability Advantages

With lhaselem and using lappend to put an item on the agenda (in a Tcl list in a variable), we can now process the elements in a Stack-agenda stack in one single line:

while {[lhaselem -remove -index end stack item]} {
    # use $item, lappend, ...
}

Likewise a Queue-agenda requires just a single line:

while {[lhaselem -remove -index 0 queue item]} {
    # use $item, lappend, ...
}

Processing an array-agenda or dict-agenda is just as easy:

while {[array haselem -remove arr key val]} {
    # use $key, $val, set, unset, ...
}

These one-liners improve readability because their semantics is not scattered upon multiple functions, that access the data multiple times and must be mentally put together to understand it. Their semantics is immediately obvious, once the new functions are known. To value how much readability improves, consider the “payload” section in each of the prototypical implementations (linked below) and compare that to the respective one-liner.

Expressiveness Advantages

  1. array haselem. Here we must pause and consider that we have a certain semantic gap with dynamic arrays, i.e. array-agendas: Checking for (non-)emptiness is currently not possible with a simple O(1) operation. The obvious candidate if {[array size arr]} is an O(N) operation. The alternative is using array startseach, array nextelement + array donesearch just to check whether an array is empty or not, but that is not appealing either. These functions are useful for more static collections, not for agendas. Also, typically for array-agendas cropping up in practice is the requirement for “anonymous access”, i.e., we need to get an element of the collection without having to know beforehand which specific element to ask for. The only direct access to array elements is currently $arr($elem), but this is not anonymous, because you have to know $elem beforehand. With array haselem we gain more expressiveness, because now we can check (non-)emptiness with a single O(1) function call and we can now anonymously access an item in the collection.

  2. dict haselem. With dicts it is not quite as bad as with arrays, but still we have to employ dict for to gain anonymous access to an element. And dict size might also not be the best we can do to check for (non-)emptyness, despite being an O(1) operation.

  3. lhaselem. (see the Stack and Queue example above)

Anyway, to have a single “atomic” operation that checks the existence, gets the item, and optionally removes it from the collection, is a new expressive feature.

Performance Advantages

  1. array haselem. At the C-level we do not need even a single hash-access to get an element: just pick the first one in the first bucket. And removing it can be done immediately, while we are at it. In contrast, when using current Tcl functions to get an value and remove the key, we need at least if {[info exists arr]} to check existence of the array, array nextelement to get the key, $arr($key) to get the value, and unset arr($key) to remove it. These employ 2 or 3 hash-accesses to get the job done, when in fact we need none.

  2. dict haselem. (c.f. “payload” section in dHaselem in the sample code). We need to access the dict only once to check and remove the item, not twice.

  3. lhaselem. (c.f. “payload” section in lhaselem in the sample code). We need to access the list only once to check and remove the item, not twice.

Prototype Implementation

The prototype implementation has the desired semantics, but does not even come close to the performance desired. These functions have to be implemented in C and take advantage of the internal representation to fulfill the proposed performance gains.

Copyright

This document has been placed in the public domain.