TIP 636: Expand List Flexibility with Abstract List Tcl_Obj Type

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Brian Griffin <[email protected]>
Author:         Ashok P. Nadkarni <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        1-Sep-2022
Tcl-Version:    8.7
Tcl-Branch:     abstractlist-with-625

Abstract

The most frustrating part of implementing custom Tcl_Obj types is that they often shimmer to a built-in Tcl type when performing typical Tcl operations on the value. This removes the advantages of having a custom type. The reason for having a custom type is usually due to either computational advantage, space advantage, or both. Maintaining this advantage is a worthy goal. Having access to built-in Tcl list operations makes the custom type even more worthwhile.

This TIP proposes to add a Tcl_Obj type, or possibly extend the Tcl_Obj type, to provide a means to define new concrete value types that behave like a List without specifically having an internal List storage implementation. Some or all of the script level List operations, such as lindex, llength, and foreach, would be supported for the new type without causing the obj to shimmer to the List type.

Specification

The Tcl_AbstractListType will provide a structure with a function tray that can be used to define a concrete type. The implementor of a new Abstract List type will provide its own internal representation along with a set of functions necessary to perform the particular actions needed for the corresponding List operation. The proposed functions are:

Tcl_Wideint LengthProc(obj)
Tcl_Obj *   IndexProc(obj,index)
Tcl_Obj *   SliceProc(obj,from,to)
Tcl_Obj *   ReverseProc(obj)
void        ToStringProc(obj)
int         GetElementsProc(interp,obj,start,count,objcPtr,objvPtr)
int         ReplaceProc(interp,obj,first,numToDelete,numToInsert,insertObjs)
int         SortElementsProc(interp,TBD)

The Length and Index functions are required for basic support. The rest are optional, however, most of these related List operations would result in the value shimmering to the List type if the function is not provided. The ToString and GetElements will have a fallback mechanism that rely on the Length and Index function.

Abstract List C API

The AbstractList C API calls are used internally by the List functions when the given listObj is an AbstractList type.

   Tcl_Obj *              Tcl_NewAbstractListObj(interp, abstractListType)
   Tcl_AbstractListType * Tcl_AbstractListGetType(objPtr)
   void                   Tcl_AbstractListSetConcreteRep(objPtr, repPtr)
   void *                 Tcl_AbstractListGetConcreteRep(objPtr)
   Tcl_WideInt            Tcl_AbstractListObjLength(objPtr)
   Tcl_Obj *              Tcl_AbstractListObjIndex(objPtr, index)
   Tcl_Obj *              Tcl_AbstractListObjRange(objPtr,fromIdx, toIdx)
   int                    Tcl_AbstractListObjReplace(interp,obj,first,
                              numToDelete,numToInsert,insertObjs)
   Tcl_Obj *              Tcl_AbstractListObjReverse(objPtr)
   int                    Tcl_AbstractListObjGetElements(interp,objPtr,
                              start,count,objcPtr,objvPtr)
   int                    Tcl_AbstractListObjSortElements(interp,TBD)

The Tcl_AbstractListType used by implementors to provide the necessary List capatability functions:

typedef struct Tcl_AbstractListType {
    Tcl_AbstractListVersion version;/* Structure version */
    const char *typeName;           /* Custom value reference */
    /* List emulation functions */
    Tcl_ALNewObjProc *newObjProc;       /* How to create a new Tcl_Obj of this
                                        ** custom type */
    Tcl_ALDupRepProc *dupRepProc;       /* How to duplicate a internal rep of this
                                        ** custom type */
    Tcl_ALLengthProc *lengthProc;       /* Return the [llength] of the
                                        ** AbstractList */
    Tcl_ALIndexProc *indexProc;         /* Return a value (Tcl_Obj) for
                                        ** [lindex $al $index] */
    Tcl_ALSliceProc *sliceProc;         /* Return an AbstractList for
                                        ** [lrange $al $start $end] */
    Tcl_ALReverseProc *reverseProc;     /* Return an AbstractList for
                                        ** [lreverse $al] */
    Tcl_ALGetElements *getElementsProc; /* Return an objv[] of all elements in
                                        ** the list */
    Tcl_ALReplaceProc *replaceProc;     /* Replace or remove subset of 
                                        ** elements. */
    Tcl_ALFreeConcreteRep *freeRepProc; /* Free ConcreteRep internals if
                                        ** necessary */
    Tcl_ALToStringRep *toStringProc;    /* Optimized "to-string" conversion
                                        ** for updating the string rep */
    Tcl_ALToListProc *toListProc;       /* Convert Abstract List
                                        ** to a List, presumably more 
                                        ** efficiently than creating an
                                        ** intermediate string rep */
    Tcl_ALSortProc *sortProc;           /* Sort list in a specified ordering. */
} Tcl_AbstractListType;

Requirements and Caveats

In all cases, the string representation for an Abstract List value must always be a valid List as defined by the Tcl language syntax and the list command. An Abstract List will always successfully "SetListFromAny" to a List type, memory limits aside. It is not a requirement that a string representation of a list be recognized by a particular Abstract List Type; there are some cases where this is not reasonably possible. For example, the ArithSeries, used in the lseq command, has only start, step, and count stored in the internal representation and uses math to compute indexed values. Taking a list of number and computing the start, step, and count and then validate, although possible, may not be worth the effort if vary large lists are involved.

Examples

A simple example of this concept is the new lseq command (TIP-629), which implements a custom ArithSeries type that contains 4 values that represent a list. The length and index values are computed on demand based on 3 of the 4 numeric values by simple arithmetic. It does not compute and fill a complete list up front. Shimmering is thus prevented in many cases. (see branch tip-629)

Other examples of possible Abstract Lists

It turns out that [dict] already implements special treatment within the foreach command in order to avoid shimmering of a dict to a list. It would be possible to avoid many, if not all, dict to list shimmering by using the Abstract List approach.

Other suggested examples:

Implementation

See branch abstractlist-with-625

Note: this is work that germinated from TIP-225. Salvatore Sanfilippo and Miguel Sofer should be recognized for their contribution.

Impact

On entry to the various list operations, both in the command implementation and in the bytecode engine, an initial test is made to see if the Obj is an AbstractList. A test is already made to see if the Obj is a List before calling SetListFromAny. The impact should be minimal. Note also that a similar test already exists in some limited cases for the Dict type.

It is conceivable that the traditional List implementation be (re)implemented as an AbstractList as well, then the additional overhead is simplified and minimized.

Alternative Implementation

The AbstractList struct could be merged with the Tcl_ObjType struct and simply be part of all types. This will need further investigation.