Author: Brian Griffin <[email protected]>
State: Final
Type: Project
Vote: Done
Votes-Summary: 4/0/0
Votes-For: SL, DF, KW, BG
Votes-Against: none
Votes-Present: none
Created: 1-Sep-2022
Tcl-Version: 9.0
Tcl-Branch: tip-636-tcl9-644
- Abstract
- Rationale
- Specification
- Implementation Note On The Tcl_ObjType struct
- Incompatible Changes
- Requirements and Caveats
- Examples
- Implementation
- Impact
- Alternatives
- Acknowledgment
Abstract
Extend Tcl_ObjType to support native List operation on non-List value types. Functions provided by a custom ObjType will be use in place of the internal List operation, avoiding runtime ObjType conversion.
Rationale
The most frustrating part of implementing custom Tcl_Obj types is that they will be converted to a built-in Tcl type when performing typical Tcl operations on the value. This removes the advantages of having a custom type since the custom extension will have first convert the value back to its original form before it can be used. (This is referred to as "shimmering") The reason for having a custom type is usually due to either computational advantage, space advantage, or both. Maintaining this advantage is a worthy goal. Having access to built-in Tcl list operations makes the custom type even more worthwhile.
Specification
The Tcl_Obj type will be extended by adding functions that implement
List operations on the given value. The functions mimic the List
command behaviors without changing the Obj Type representation. By
providing these functions, some or all of the script level List
operations, such as lindex
, llength
, and foreach
, would be
supported for a custom type without causing the obj value to shimmer
to the List type.
Here is a simple example of how this works. A simple scalar value,
like a double numeric has an internal representation of
"double". However, if the value is passed to [llength]
command, the
type will change to a List type:
% set n [expr {15.3}]
15.3
% representation $n
value is a double with a refcount of 2, object pointer at
0x7fa3b512adc0, internal representation 0x402e99999999999a:0x0,
string representation "15.3"
% llength $n
1
% representation $n
value is a list with a refcount of 2, object pointer at
0x7fa3b512adc0, internal representation 0x7fa3b5864a50:0x0, string
representation "15.3"
If the "double" type adds a lengthProc function to the type definition, then the
[llength]
command will correctly return "1" without having to
shimmer the value to a "list".
This Tcl_ObjType is extended with slots for the following function pointers:
/* Abstract List functions */
Tcl_Size (*lengthProc)(obj)
int (*indexProc)(interp, obj, index, elemObjPtr)
int (*sliceProc)(interp, obj, from, to, newObjPtr)
int (*reverseProc)(interp, obj, newObjPtr)
int (*getElementsProc)(interp, obj, start, count, objcPtr, objvPtr)
int (*replaceProc)(interp, obj, first, numToDelete, numToInsert, insertObjs)
Tcl_Obj* (*setElementProc)(interp, listPtr, indicies, valueObj)
These functions are optional and when absent, the List operation will
revert to the base List operation behavior. For example, if the
reverseProc is NULL, the [lreverse]
operation will first shimmer
the value to a List, then perform the operation.
The lengthProc is necessary if any other List function is also provided. The indexProc function is also used in some fallback operations.
Hopefully the definition of each of these functions is clear by their respective names.
Implementation Note On The Tcl_ObjType struct
TIP-644 added a version field to the Tcl_ObjType
structure to allow
for extending the struct with new features. The implementation of
Abstract Lists makes use of this with the intent of making the interface
as simple to use and maintain as possible, both internally and
externally. The version field is defined as Tcl_Size which will be a
monotonically increasing integer, starting at 0, for each future
definition change.
The Seven function pointers added to the Tcl_ObjType give custom types control over the set of script and C API level List operations. Macros are provided for ease of use and backward compatibility. Existing custom ObjTypes can use the 'TCL_OBJTYPE_V0' macro for existing (pre 9.0) definitions. Any type that is not intended to be treated as a list, and wants to prevent converting to a list, e.g. Tcl_WideInt, can use the 'TCL_OBJTYPE_V1(a)' macro and provide an lengthProc that always returns
- An Abstract List implementation uses the 'TCL_OBJTYPE_V2(...)' macro, and defines at least 2 of the complete set of functions.
Incompatible Changes
One outcome of Abstract Lists is that an abstraction is no longer
required to retain a copy of a list element Tcl_Obj
. This is a
significant change in that the current List type always retains a copy
of a list element Tcl_Obj
, and consequently, most all calls to
retrieve an element fail to manage the reference count of the Tcl_Obj
,
implicitly relying on the List to manage it. An abstract list, on the
other hand, can generate elements on-the-fly, and may not need to retain
the Tcl_Obj
. This leaves the responsibility of Tcl_Obj
memory
management up to the caller. To be fair, this is always the case for
reference counted managed memory.
Extensions for Tcl and applications that embed Tcl, will need to check and add appropriate RefCount management to the implementation if necessary. This has been done for the Tcl core tip branch.
Here is an example of a code snippet that illustrates the problem (randomly
chosen from the tcl core.) Take note of the treatment of itemObj
below:
Tcl_ListObjLength(NULL, zshPtr->outData, &listLen);
if (count < 0) {
count = 0;
for (i=0; i<listLen; i++) {
Tcl_ListObjIndex(NULL, zshPtr->outData, i, &itemObj);
(void) Tcl_GetByteArrayFromObj(itemObj, &itemLen);
if (i == 0) {
count += itemLen - zshPtr->outPos;
} else {
count += itemLen;
}
}
}
The above code inspects itemObj
, but never checks or modifies the
refCount. This is ok for Lists prior to this TIP. For Abstract Lists,
itemObj
could be newly created and have a refCount of 0. Completely safe
code should traditionally be written as:
Tcl_ListObjLength(NULL, zshPtr->outData, &listLen);
if (count < 0) {
count = 0;
for (i=0; i<listLen; i++) {
Tcl_ListObjIndex(NULL, zshPtr->outData, i, &itemObj);
Tcl_IncrRefCount(itemObj);
(void) Tcl_GetByteArrayFromObj(itemObj, &itemLen);
if (i == 0) {
count += itemLen - zshPtr->outPos;
} else {
count += itemLen;
}
Tcl_DecrRefCount(itemObj);
}
}
For a traditional List, these 2 additional calls will not effect the results. For an abstract list, a new Obj will be appropriately free'd.
The Incr/Decr calls in this example introduce an inefficiency that justified making an implied assumption about List elements. Since Abstract Lists changes the basis of this assumption, a different approach to improve efficiency is needed. In these cases, deferring reference count updates can be employed. It is for this reason that as part of the Abstract List implementation, a new refCount function has been added for this scenario:
Tcl_BumpObj(objPtr)
It would be used like this:
Tcl_ListObjLength(NULL, zshPtr->outData, &listLen);
if (count < 0) {
count = 0;
for (i=0; i<listLen; i++) {
Tcl_ListObjIndex(NULL, zshPtr->outData, i, &itemObj);
(void) Tcl_GetByteArrayFromObj(itemObj, &itemLen);
if (i == 0) {
count += itemLen - zshPtr->outPos;
} else {
count += itemLen;
}
Tcl_BumpObj(itemObj); // Deferred refCount update
}
}
This function effectively executes:
Tcl_IncrRefCount(objPtr);
Tcl_DecrRefCount(objPtr);
But actually implemented as:
if (objPtr) {
if ((objPtr)->refCount == 0) {
Tcl_DecrRefCount(objPtr);
}
}
For more information on List associated refCount issues, see related TIP-192.
And there is this rabbit hole: minimizing reference count updates .
Requirements and Caveats
In all cases, the string representation for an Abstract List value
must always be a valid List as defined by the Tcl language syntax and
the [list]
command. An Abstract List will always successfully
"SetListFromAny" to a List type, memory limits aside. It is not,
however, a requirement that a string representation of a list be
recognized by a particular Abstract List Type; there are some cases
where this is not reasonably possible. For example, the ArithSeries,
used in the lseq
command, has only start
, step
, and count
stored in the internal representation and uses math to compute indexed
values. Taking a list of number and computing the start, step, and
count and then validate, although possible, may not be worth the
effort if vary large lists are involved.
Examples
An existing example of this concept is the new lseq command (TIP-629), which implements a custom ArithSeries type that contains 4 values that represent a list. The length and index values are computed on demand based on 3 of the 4 numeric values by simple arithmetic. It does not compute and fill a complete list up front. Shimmering is thus prevented in many cases. See branch tip-636-tcl9 which (re)implements the ArithSeries using AbstractLists. Branch tip-636-tcl9-644 implements the in-core approach.
Other examples of possible Abstract Lists
It turns out that [foreach]
command already implements special
treatment for the dict type in order to avoid shimmering of a
dict type value to a list type. It would be possible to avoid
many, if not all, dict to list shimmering by using the
Abstract List approach.
Other suggested examples:
- matrix values (there are several existing implementations)
- VecTcl has been ported to an AbstractList. See VecTcl9 on GitHub.
- There are a number of simple examples in
abstractlist-toys
on GitHub:
- lstring: each character is treated as a list element.
- polynomial equations:
[lpoly 5 2 -3] -> y=c0*x^0+c1*x^1+c2*x^2
..., wherec0=5
,c1=2
,c3=-3
, ... - readlines: reads in a file and splits into lines.
- lgen: see TIP-192 Although this example works in most
cases, there is an issue with in-line string use of
[lgen]
that I have not yet tracked down.
- bit-field arrays
- ByteArray
- Typed List: all elements of the list are the same type,
e.g.
linteger
, all elements are integers, with storage optimized for integers. Thelstring
example above is also an example of a Typed List. - database:
foreach row [lselect $db {FROM * WHERE name LIKE "Griffin"}]
...
Implementation
See branch tip-636-tcl9-644
Impact
On entry to the various list operations, both in the command implementation and in the bytecode engine, an initial test is made to see if the Obj is an AbstractList. A test is already made to see if the Obj is a List before calling SetListFromAny. The impact should be minimal. Note also that a similar test already exists in some limited cases for the Dict type.
It is conceivable that the traditional List implementation be (re)implemented as an AbstractList as well, then the additional overhead is simplified.
Alternatives
There is an "internal only" (partial) implementation on branch, and main (currently) internal-abstract-list
There is also a work-in-progress with a variation of abstract lists (or abstract types?) at pyk-objinterface.
Acknowledgment
Note: this is work that germinated from TIP-225. Salvatore Sanfilippo and Miguel Sofer should be recognized for their contribution.