TIP 649: Expose additional list functionality in the C API

Login
Author:		Ashok P. Nadkarni <[email protected]>
State:		Draft
Type:		Project
Vote:		Pending
Created:	14-Nov-2022
Tcl-Version:	9.1
Tcl-Branch:	tip-649
Keywords:	list

Abstract

Although not directly dependent on it, this TIP is not being implemented unless TIP 636 is passed.

The list implementation has functionality that is available at the script level but not in the C API. This TIP proposes to expose the same as a C API as well. The benefits include

Rationale

Efficiency of list manipulation

Many applications and extensions for Tcl as well as Tk use Tcl's C API to manipulate Tcl list values. For example, reversing or extracting a range of elements is accomplished through a combination of Tcl_ListObjGetElements/Tcl_NewListObj or a loop involving Tcl_ListObjIndex/Tcl_ListObjAppendElement. Given the internal representation of lists in Tcl 8.6, this was also how the Tcl core would implement these operations and thus there were no significant efficiency gains that would result from exposing the higher level operations as a C API.

This has changed with Tcl 9.0 where list internal representation may take one of several forms (see TIP's 625, 629, 636). For many of these, some list operations may be done more efficiently than allowed by the restrictive Tcl 8.6 list C API. For example, range and list reversal can now be faster and more memory efficient by orders of magnitude for large lists.

The script level list commands benefit from these and it makes sense to extend the same to C applications and extensions as well.

Efficiency of list access

While the above pertain to list manipulation, the list accessor functions Tcl_ListObjGetElements and Tcl_ListObjIndex also have their own issues arising from an implicit assumption about the list internals. Tcl_ListObjGetElements works well when the list internal representation stores elements as an array of Tcl_Obj * pointer values. Tcl_ListObjIndex assumes that the list implementation holds a reference to the returned Tcl_Obj * element so the caller does not need to do any explicit reference management with the returned pointer. Neither of these assumptions necessarily hold true with newer list representations. For example, the arithmetic series stores the list in compact (start, end, step) form and generates a Tcl_Obj only when required. However, because of Tcl_ListObjIndex semantics, the list implementation has to hold on to a reference to the returned element Tcl_Obj essentially shimmering to the equivalent Tcl_Obj * array at a cost in memory and efficiency.

For this reason, the TIP proposes an accessor function that is almost identical to Tcl_ListObjIndex but requires the caller to release the returned Tcl_Obj by calling Tcl_DecrRefCount. The list implementation does not then need to artificially hold on to a reference to the returned `Tcl_Obj** just in order to free it later.

As an aside, note that the above drawbacks do not apply to the Tcl core implementation itself, including implementation of script commands, as Brian has already taken care to compensate appropriately. This is not possible for extension writers as they do not have access to internal Tcl list type information.

Specification

The following functions will be added along with corresponding entries in the stubs table. (Names all subject to change)

int Tcl_ListObjRange(
    Tcl_Interp *interp, /* Interpreter for error messages */
    Tcl_Obj *listPtr,   /* Unshared list object */
    Tcl_Size first,     /* Index of first element of range */
    Tcl_Size last);     /* Index of last element of range */
int Tcl_ListObjReverse(
    Tcl_Interp *interp, /* Interpreter for error messages */
    Tcl_Obj *listPtr);  /* Unshared list object */
int Tcl_ListObjRepeat(
    Tcl_Interp *interp, /* Interpreter for error messages */
    Tcl_Obj *srcListPtr,  /* List object containing elements to repeat */
    Tcl_Size repeatCount, /* Number of times to repeat */
    Tcl_Obj **newListPtr) /* Where to store the new list */
int Tcl_ListObjSort(
    Tcl_Interp *interp, /* Interpreter for error messages */
    Tcl_Obj *listPtr,   /* Unshared list object */
    TBD);
int Tcl_ListObjIndexWithRef(
    Tcl_Interp *interp, /* Interpreter for error messages */
    Tcl_Obj *listPtr,   /* Shared or unshared list object */
    Tcl_Size index,     /* Index of element to retrieve */
    Tcl_Obj **elementObjPtr) /* Where to store the element. Caller
                                needs to free it */

All functions return TCL_OK on success and TCL_ERROR on failure. Functionality should be obvious.

Discussion

The functions could be slightly altered to return a new Tcl_Obj and allow listPtr to be a shared object. However, that would be inconsistent with the current Tcl_ListObjReplace definition.

Applications should be encouraged to use Tcl_ListObjIndexWithRef to avoid shimmering.

Implementation

Pending until TIP 636 passes.

Copyright

This document has been placed in the public domain.