TIP 636: Expand List Flexibility with Abstract List Tcl_Obj Type

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Brian Griffin <[email protected]>
Author:         Ashok P. Nadkarni <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        1-Sep-2022
Tcl-Version:    9.1
Tcl-Branch:     tip-636-tcl9

Abstract

The most frustrating part of implementing custom Tcl_Obj types is that they often shimmer to a built-in Tcl type when performing typical Tcl operations on the value. This removes the advantages of having a custom type. The reason for having a custom type is usually due to either computational advantage, space advantage, or both. Maintaining this advantage is a worthy goal. Having access to built-in Tcl list operations makes the custom type even more worthwhile.

This TIP proposes to add a Tcl_Obj type, or possibly extend the Tcl_Obj type (see TIP-644), to provide a means to define new concrete value types that behave like a List without specifically having an internal List storage implementation. Some or all of the script level List operations, such as lindex, llength, and foreach, would be supported for the new type without causing the obj to shimmer to the List type.

Specification

In order to prevent a values' type from being converted to a List type, custom functions are need to perform the List operations using the existing internal representation of the value. There are 2 possible ways to support this customized behavior for an arbitrary type. The custom Tcl_AbstractList type provides a structure with a function tray that can be used to define a new concrete type. This is implemented as a wrapper around the Tcl_ObjTyp that provides additional capability. The implementor of a new Abstract List type will provide its own internal representation along with a set of functions necessary to perform the particular actions needed for the corresponding List operation. The proposed functions are:

Tcl_Obj*    DupRepProc()
Tcl_Wideint LengthProc(obj)
int         IndexProc(interp,obj,index,elemObjPtr)
int         SliceProc(interp,obj,from,to,newObjPtr)
int         ReverseProc(interp,obj,newObjPtr)
void        ToStringProc(obj)
void        freeRepProc()
int         GetElementsProc(interp,obj,start,count,objcPtr,objvPtr)
int         ReplaceProc(interp,obj,first,numToDelete,numToInsert,insertObjs)
Tcl_Obj*    SetElementProc (interp, listPtr, indicies, valueObj)

The Length and Index functions are required for basic support. The rest are optional, however, most of these related List operations would result in the value shimmering to the List type if the function is not provided. Some functions may have a fallback mechanism that rely on the Length and Index functions.

Abstract List C API

The AbstractList C API calls are used internally by the List functions when the given listObj is an AbstractList type.

Tcl_Obj *     Tcl_AbstractListObjNew(interp, abstractListType)
Tcl_AbstractListType *
              Tcl_AbstractListGetType(objPtr)
void          Tcl_AbstractListSetConcreteRep(objPtr, repPtr)
void *        Tcl_AbstractListGetConcreteRep(objPtr)
Tcl_WideInt   Tcl_AbstractListObjLength(objPtr)
Tcl_Obj *     Tcl_AbstractListObjIndex(objPtr, index)
Tcl_Obj *     Tcl_AbstractListObjRange(objPtr,fromIdx, toIdx)
int           Tcl_AbstractListObjReplace(interp,obj,first,
                     numToDelete,numToInsert,insertObjs)
Tcl_Obj *     Tcl_AbstractListObjReverse(objPtr)
int           Tcl_AbstractListObjGetElements(interp,objPtr,
                     start,count,objcPtr,objvPtr)
int           Tcl_AbstractListObjSortElements(interp,TBD)
Tcl_Obj *     Tcl_AbstractListObjCopy(interp, listPtr)
Tcl_Obj *     Tcl_AbstractListSetElement(interp, listPtr, 
                     indicies, valueObj)

The Tcl_AbstractListType struct used by implementors to provide the necessary List capatability functions:

typedef struct Tcl_AbstractListType {
    Tcl_AbstractListVersion version;/* Structure version */
    const char *typeName;           /* Custom value reference */
    /* List emulation functions */
    Tcl_ALNewObjProc *newObjProc;       /* How to create a new Tcl_Obj of this
                                        ** custom type */
    Tcl_ALDupRepProc *dupRepProc;       /* How to duplicate a internal rep of this
                                        ** custom type */
    Tcl_ALLengthProc *lengthProc;       /* Return the [llength] of the
                                        ** AbstractList */
    Tcl_ALIndexProc *indexProc;         /* Return a value (Tcl_Obj) for
                                        ** [lindex $al $index] */
    Tcl_ALSliceProc *sliceProc;         /* Return an AbstractList for
                                        ** [lrange $al $start $end] */
    Tcl_ALReverseProc *reverseProc;     /* Return an AbstractList for
                                        ** [lreverse $al] */
    Tcl_ALGetElements *getElementsProc; /* Return an objv[] of all elements in
                                        ** the list */
    Tcl_ALReplaceProc *replaceProc;     /* Replace or remove subset of 
                                        ** elements. */
    Tcl_ALFreeConcreteRep *freeRepProc; /* Free ConcreteRep internals if
                                        ** necessary */
    Tcl_ALToStringRep *toStringProc;    /* Optimized "to-string" conversion
                                        ** for updating the string rep */
    Tcl_ALToListProc *toListProc;       /* Convert Abstract List
                                        ** to a List, presumably more 
                                        ** efficiently than creating an
                                        ** intermediate string rep */
    Tcl_ALSetElement *setElementProc;   /* Replace the element at the
                                        ** index with the given valueObj */
    Tcl_ALSortProc *sortProc;           /* Sort list in a specified ordering. */
} Tcl_AbstractListType;

Alterative - Extending Tcl_ObjType

An alternative approach is to extend the exiting Tcl_ObjType with the additional List oriented function tray. This approach can be facilitated via TIP-644. The set of functions are essentially the same for either approach.

With this approach, the Tcl_AbstractList C API is not necessary. The "work" done by this API would be handled directly within the Core. Custom types would only need to supply the necessary list functions as part of the Tcl_ObjType definition.

Requirements and Caveats

In all cases, the string representation for an Abstract List value must always be a valid List as defined by the Tcl language syntax and the list command. An Abstract List will always successfully "SetListFromAny" to a List type, memory limits aside. It is not a requirement that a string representation of a list be recognized by a particular Abstract List Type; there are some cases where this is not reasonably possible. For example, the ArithSeries, used in the lseq command, has only start, step, and count stored in the internal representation and uses math to compute indexed values. Taking a list of number and computing the start, step, and count and then validate, although possible, may not be worth the effort if vary large lists are involved.

Examples

A simple example of this concept is the new lseq command (TIP-629), which implements a custom ArithSeries type that contains 4 values that represent a list. The length and index values are computed on demand based on 3 of the 4 numeric values by simple arithmetic. It does not compute and fill a complete list up front. Shimmering is thus prevented in many cases. See branch tip-636-tcl9 which (re)implements the ArithSeries using AbstractLists. Branch tip-636-tcl9-644 implements the in-core approach.

Other examples of possible Abstract Lists

It turns out that [dict] already implements special treatment within the foreach command in order to avoid shimmering of a dict to a list. It would be possible to avoid many, if not all, dict to list shimmering by using the Abstract List approach.

Other suggested examples:

Implementation

See branch tip-636-tcl9, and tip-636-tcl9-644

Note: this is work that germinated from TIP-225. Salvatore Sanfilippo and Miguel Sofer should be recognized for their contribution.

Impact

On entry to the various list operations, both in the command implementation and in the bytecode engine, an initial test is made to see if the Obj is an AbstractList. A test is already made to see if the Obj is a List before calling SetListFromAny. The impact should be minimal. Note also that a similar test already exists in some limited cases for the Dict type.

It is conceivable that the traditional List implementation be (re)implemented as an AbstractList as well, then the additional overhead is simplified and minimized.