Author: Brian Griffin <[email protected]>
Author: Ashok P. Nadkarni <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 1-Sep-2022
Tcl-Version: 9.1
Tcl-Branch: tip-636-tcl9
- Abstract
- Specification
- TIP-644
- Requirements and Caveats
- Examples
- Implementation
- Alternatives
- Acknowledgment
Abstract
The most frustrating part of implementing custom Tcl_Obj types is that they often shimmer to a built-in Tcl type when performing typical Tcl operations on the value. This removes the advantages of having a custom type. The reason for having a custom type is usually due to either computational advantage, space advantage, or both. Maintaining this advantage is a worthy goal. Having access to built-in Tcl list operations makes the custom type even more worthwhile.
This TIP proposes to extend the Tcl_Obj type by adding functions that
implement List operations on the given value. The functions mimic the
List command behaviors without changing the Obj Type
representation. By providing these functions, some or all of the
script level List operations, such as lindex
, llength
, and
foreach
, would be supported for a custom type without causing the
obj value to shimmer to the List type.
Here is a simple example of how this works. A simple scalar value,
like a double numeric has an internal representation of
"double". However, if the value is passed to [llength]
command, the
type will change to a List type:
% set n [expr {15.3}]
15.3
% representation $n
value is a double with a refcount of 2, object pointer at
0x7fa3b512adc0, internal representation 0x402e99999999999a:0x0,
string representation "15.3"
% llength $n
1
% representation $n
value is a list with a refcount of 2, object pointer at
0x7fa3b512adc0, internal representation 0x7fa3b5864a50:0x0, string
representation "15.3"
If the "double" type adds a lengthProc function to the type definition, then the
[llength]
command will correctly return "1" without having to
shimmer the value to a "list".
Specification
This Tcl_ObjType is extended with slots for the following function pointers:
/* Abstract List functions */
Tcl_Size (*lengthProc)(obj)
int (*indexProc)(interp, obj, index, elemObjPtr)
int (*sliceProc)(interp, obj, from, to, newObjPtr)
int (*reverseProc)(interp, obj, newObjPtr)
int (*getElementsProc)(interp, obj, start, count, objcPtr, objvPtr)
int (*replaceProc)(interp, obj, first, numToDelete, numToInsert, insertObjs)
Tcl_Obj* (*setElementProc)(interp, listPtr, indicies, valueObj)
int (*getDblProc)(interp, obj, doublePtr)
These functions are optional and when absent, the List operation will
revert to the base List operation behavior. For example, if the
reverseProc is NULL, the [lreverse]
operation will first shimmer
the value to a List, then perform the operation.
The lengthProc is necessary if any other List function is also provided. The indexProc function is also used in some fallback operations.
The getDblProc function is not a List operation; it is used by
Tcl_GetDoubleFromObj()
. This optional function is intended only for
cases where an alternate numeric representation is used for the custom
type, and shimmering of this scalar value is undesired. (This is the
case in VecTcl, for example.)
Hopefully the definition of each of these functions is clear by there respecive names.
TIP-644
Tip-644 added a version field to the Tcl_ObjType structure. In that proposal it was suggested that additional fields could be added individually, or as a struct. I propose that any elements added be done so individually, and that the version be incremented when changes are made, in the same paradigm used by the Tcl_ChannelType. This would mean that for a given Tcl release version, there is only one allowed version of the Tcl_ObjType.
Using structs adds complexity that I am not sure is helpful. Using struct size for the version, as has been done in some branch implementations of Abstract Lists, is problematic when future changes may have the same size as older versions.
Requirements and Caveats
In all cases, the string representation for an Abstract List value
must always be a valid List as defined by the Tcl language syntax and
the [list]
command. An Abstract List will always successfully
"SetListFromAny" to a List type, memory limits aside. It is not a
requirement that a string representation of a list be recognized by a
particular Abstract List Type; there are some cases where this is not
reasonably possible. For example, the ArithSeries, used in the lseq
command, has only start
, step
, and count
stored in the internal
representation and uses math to compute indexed values. Taking a list
of number and computing the start, step, and count and then validate,
although possible, may not be worth the effort if vary large lists are
involved.
Examples
An existing example of this concept is the new lseq command (TIP-629), which implements a custom ArithSeries type that contains 4 values that represent a list. The length and index values are computed on demand based on 3 of the 4 numeric values by simple arithmetic. It does not compute and fill a complete list up front. Shimmering is thus prevented in many cases. See branch tip-636-tcl9 which (re)implements the ArithSeries using AbstractLists. Branch tip-636-tcl9-644 implements the in-core approach.
Other examples of possible Abstract Lists
It turns out that [dict] already implements special treatment within the foreach command in order to avoid shimmering of a dict to a list. It would be possible to avoid many, if not all, dict to list shimmering by using the Abstract List approach.
Other suggested examples:
- matrix values (there are several existing implementations)
- VecTcl has been ported to an AbstractList. See VecTcl9 on GitHub.
- bit-field arrays
- ByteArray
- strings -- each character is treaded as a list element.
- polynomial equations:
[lpoly 5 2 -3] -> y=c0*x^0+c1*x^1+c2*x^2
..., wherec0=5
,c1=2
,c3=-3
, ... - database:
foreach row [lselect $db {FROM * WHERE name LIKE "Griffin"}]
...
Implementation
See branch tip-636-tcl9, and tip-636-tcl9-644
Alternatives
There is an "internal only" implementation on branch internal-abstract-list
Impact
On entry to the various list operations, both in the command implementation and in the bytecode engine, an initial test is made to see if the Obj is an AbstractList. A test is already made to see if the Obj is a List before calling SetListFromAny. The impact should be minimal. Note also that a similar test already exists in some limited cases for the Dict type.
It is conceivable that the traditional List implementation be (re)implemented as an AbstractList as well, then the additional overhead is simplified and minimized.
Acknowledgment
Note: this is work that germinated from TIP-225. Salvatore Sanfilippo and Miguel Sofer should be recognized for their contribution.