TIP 636: Expand List Flexibility with Abstract List Tcl_Obj Type

Login
Author:         Brian Griffin <[email protected]>
Author:         Ashok P. Nadkarni <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        1-Sep-2022
Tcl-Version:    9.1
Tcl-Branch:     tip-636-tcl9

Abstract

The most frustrating part of implementing custom Tcl_Obj types is that they often shimmer to a built-in Tcl type when performing typical Tcl operations on the value. This removes the advantages of having a custom type. The reason for having a custom type is usually due to either computational advantage, space advantage, or both. Maintaining this advantage is a worthy goal. Having access to built-in Tcl list operations makes the custom type even more worthwhile.

This TIP proposes to extend the Tcl_Obj type by adding functions that implement List operations on the given value. The functions mimic the List command behaviors without changing the Obj Type representation. By providing these functions, some or all of the script level List operations, such as lindex, llength, and foreach, would be supported for a custom type without causing the obj value to shimmer to the List type.

Here is a simple example of how this works. A simple scalar value, like a double numeric has an internal representation of "double". However, if the value is passed to [llength] command, the type will change to a List type:

    % set n [expr {15.3}]
    15.3
    % representation $n
    value is a double with a refcount of 2, object pointer at
    0x7fa3b512adc0, internal representation 0x402e99999999999a:0x0,
    string representation "15.3"

    % llength $n
    1
    % representation $n
    value is a list with a refcount of 2, object pointer at
    0x7fa3b512adc0, internal representation 0x7fa3b5864a50:0x0, string
    representation "15.3"

If the "double" type adds a lengthProc function to the type definition, then the [llength] command will correctly return "1" without having to shimmer the value to a "list".

Specification

This Tcl_ObjType is extended with slots for the following function pointers:

/* Abstract List functions */
Tcl_Size    (*lengthProc)(obj)
int         (*indexProc)(interp, obj, index, elemObjPtr)
int         (*sliceProc)(interp, obj, from, to, newObjPtr)
int         (*reverseProc)(interp, obj, newObjPtr)
int         (*getElementsProc)(interp, obj, start, count, objcPtr, objvPtr)
int         (*replaceProc)(interp, obj, first, numToDelete, numToInsert, insertObjs)
Tcl_Obj*    (*setElementProc)(interp, listPtr, indicies, valueObj)
int         (*getDblProc)(interp, obj, doublePtr)

These functions are optional and when absent, the List operation will revert to the base List operation behavior. For example, if the reverseProc is NULL, the [lreverse] operation will first shimmer the value to a List, then perform the operation.

The lengthProc is necessary if any other List function is also provided. The indexProc function is also used in some fallback operations.

The getDblProc function is not a List operation; it is used by Tcl_GetDoubleFromObj(). This optional function is intended only for cases where an alternate numeric representation is used for the custom type, and shimmering of this scalar value is undesired. (This is the case in VecTcl, for example.)

Hopefully the definition of each of these functions is clear by there respecive names.

TIP-644

Tip-644 added a version field to the Tcl_ObjType structure. In that proposal it was suggested that additional fields could be added individually, or as a struct. I propose that any elements added be done so individually, and that the version be incremented when changes are made, in the same paradigm used by the Tcl_ChannelType. This would mean that for a given Tcl release version, there is only one allowed version of the Tcl_ObjType.

Using structs adds complexity that I am not sure is helpful. Using struct size for the version, as has been done in some branch implementations of Abstract Lists, is problematic when future changes may have the same size as older versions.

Requirements and Caveats

In all cases, the string representation for an Abstract List value must always be a valid List as defined by the Tcl language syntax and the [list] command. An Abstract List will always successfully "SetListFromAny" to a List type, memory limits aside. It is not a requirement that a string representation of a list be recognized by a particular Abstract List Type; there are some cases where this is not reasonably possible. For example, the ArithSeries, used in the lseq command, has only start, step, and count stored in the internal representation and uses math to compute indexed values. Taking a list of number and computing the start, step, and count and then validate, although possible, may not be worth the effort if vary large lists are involved.

Examples

An existing example of this concept is the new lseq command (TIP-629), which implements a custom ArithSeries type that contains 4 values that represent a list. The length and index values are computed on demand based on 3 of the 4 numeric values by simple arithmetic. It does not compute and fill a complete list up front. Shimmering is thus prevented in many cases. See branch tip-636-tcl9 which (re)implements the ArithSeries using AbstractLists. Branch tip-636-tcl9-644 implements the in-core approach.

Other examples of possible Abstract Lists

It turns out that [dict] already implements special treatment within the foreach command in order to avoid shimmering of a dict to a list. It would be possible to avoid many, if not all, dict to list shimmering by using the Abstract List approach.

Other suggested examples:

Implementation

See branch tip-636-tcl9, and tip-636-tcl9-644

Alternatives

There is an "internal only" implementation on branch internal-abstract-list

Impact

On entry to the various list operations, both in the command implementation and in the bytecode engine, an initial test is made to see if the Obj is an AbstractList. A test is already made to see if the Obj is a List before calling SetListFromAny. The impact should be minimal. Note also that a similar test already exists in some limited cases for the Dict type.

It is conceivable that the traditional List implementation be (re)implemented as an AbstractList as well, then the additional overhead is simplified and minimized.

Acknowledgment

Note: this is work that germinated from TIP-225. Salvatore Sanfilippo and Miguel Sofer should be recognized for their contribution.