TIP 720: Updated Tcl Bytecode Opcodes

Author:         Donal Fellows <[email protected]>
State:          Voting
Type:           Project
Created:        09-May-2025
Tcl-Version:    9.1
Tcl-Branch:     no-variable-width-instruction-issue

Abstract

This TIP proposes to change the set of bytecodes used in the Tcl bytecode engine. The primary goal is simplification, so that the bytecode compiler becomes easier to maintain.

Rationale

Tcl bytecode is complex to issue, and quirky in quite a few places. Chief among those are:

Additionally, the instruction sequences for some commands (especially try) can be very complex. We should simplify.

Specification

Except where noted below, the tcl::unsupported::assemble command is already transparently aware of these changes.

Deprecations of old opcodes

This TIP proposes to deprecate these opcodes:

Where the C compiler is known to support it, these elements of the TclInstruction enumeration will be marked with the deprecated attribute so that uses of them will result in warnings. If REMOVE_DEPRECATED_OPCODES is defined during compilation, they will be entirely elided, including their bytecode engine implementations. The resulting bytecode engine cannot have bytecodes for Tcl 9.0 loaded into it, which is a non-issue without the use of tbcload.

Renaming of opcodes

The following opcodes are renamed (with no other change to them):

New replacement opcodes

These replace existing, similarly-named, opcodes with versions with wider operands.

Completely new opcodes

Note: INST_JUMP_TABLE_NUM introduces a new aux data type, where the internal model is a hash table with TCL_ONE_WORD_KEYS that maps Tcl_Size to Tcl_Size.
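
As an illustration of the integer-keyed lookup described in the note above, here is a minimal sketch. It is not the TIP's implementation: inside Tcl the table would be a Tcl_HashTable initialised with TCL_ONE_WORD_KEYS, whereas this stand-in uses a tiny fixed-size linear-probing map so that it compiles without tcl.h. The names Size, NumJumpTable, jt_put and jt_lookup are hypothetical, and the "fall through when the key is absent" behaviour is an assumption modelled on how a jump table is normally used.

```c
#include <stddef.h>

/* Stand-in for Tcl_Size (ptrdiff_t in Tcl 9); hypothetical name. */
typedef ptrdiff_t Size;

#define JT_SLOTS 16   /* toy capacity, power of two, never resized */

typedef struct {
    int  used[JT_SLOTS];
    Size key[JT_SLOTS];     /* integer value being matched */
    Size offset[JT_SLOTS];  /* jump offset recorded for that value */
} NumJumpTable;

/* Map a one-word key to a starting slot. */
static size_t jt_slot(Size key) {
    return ((size_t)key * 2654435761u) & (JT_SLOTS - 1);
}

/* Record key -> offset. Returns 1 on success, 0 if the toy table is full. */
static int jt_put(NumJumpTable *t, Size key, Size offset) {
    size_t i = jt_slot(key);
    for (size_t n = 0; n < JT_SLOTS; n++, i = (i + 1) & (JT_SLOTS - 1)) {
        if (!t->used[i] || t->key[i] == key) {
            t->used[i] = 1;
            t->key[i] = key;
            t->offset[i] = offset;
            return 1;
        }
    }
    return 0;
}

/* Return the offset stored for key, or fallthrough when key is absent. */
static Size jt_lookup(const NumJumpTable *t, Size key, Size fallthrough) {
    size_t i = jt_slot(key);
    for (size_t n = 0; n < JT_SLOTS; n++, i = (i + 1) & (JT_SLOTS - 1)) {
        if (!t->used[i]) {
            break;              /* empty slot: key was never stored */
        }
        if (t->key[i] == key) {
            return t->offset[i];
        }
    }
    return fallthrough;
}
```

The point of the sketch is only that both keys and values are plain integers of the same width, so no string comparison or object conversion is needed at lookup time.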

New internal types

There are a number of new internal types in tclCompile.h. The main ones of interest are:

New instruction issuing macros

Note that some of these were previously used in just one file. For their usage and exact definitions, see the code!

// Issue an instruction without an argument.
#define OP(name)
// Issue an instruction with a single-byte argument.
#define OP1(name,val)
// Issue an instruction with a four-byte argument.
#define OP4(name,val)
// Issue an instruction with a single-byte argument and a four-byte argument.
#define OP14(name,val1,val2)
// Issue an instruction with two four-byte arguments.
#define OP44(name,val1,val2)
// Issue an instruction with a four-byte argument and a single-byte argument.
#define OP41(name,val1,val2)
// Issue a potentially break/continue generating instruction without an argument.
#define INVOKE(name)
// Issue a potentially break/continue generating instruction with a single argument.
#define INVOKE4(name,arg1)
// Issue a potentially break/continue generating instruction with two arguments.
#define INVOKE41(name,arg1,arg2)
// Push a string literal.
#define PUSH(string)
// Push a string whose length is computed with strlen().
#define PUSH_STRING(strVar)
// Push a string from a TCL_TOKEN_SIMPLE_WORD token.
#define PUSH_SIMPLE_TOKEN(tokenPtr)
// Take a reference to a Tcl_Obj and arrange for it to be pushed.
#define PUSH_OBJ(objPtr)
// Take a reference to a Tcl_Obj and arrange for it to be pushed.
// Handles extra flags, typically used for command names.
#define PUSH_OBJ_FLAGS(objPtr, flags)
// Push a general token. Needs which index of its command it is.
#define PUSH_TOKEN(tokenPtr, index)
// Push a token that is an expression.
#define PUSH_EXPR_TOKEN(tokenPtr, index)
// Compile the body of a command (e.g., [if], [while])
#define BODY(tokenPtr, index)
// Set the label to the current address. Typically paired with BACKJUMP.
#define BACKLABEL(var)
// Jump (of given type) backwards to the label defined by BACKLABEL.
#define BACKJUMP(name, var)
// Jump (of given type) forwards to the label defined by FWDLABEL.
#define FWDJUMP(name, var)
// Set the label to the current address. MUST be paired with FWDJUMP.
#define FWDLABEL(var)
// Create an unplaced CATCH exception range.
#define MAKE_CATCH_RANGE()
// Create an unplaced LOOP exception range.
#define MAKE_LOOP_RANGE()
// Wrap the given range around a body of code, placing its start and end.
#define CATCH_RANGE(range)
// Define where caught exceptions in the CATCH range branch to.
#define CATCH_TARGET(range)
// Define where caught BREAKs in the LOOP range branch to.
#define BREAK_TARGET(range)
// Define where caught CONTINUEs in the LOOP range branch to.
#define CONTINUE_TARGET(range)
// Finalize the LOOP exception range, setting the destinations for jumps.
#define FINALIZE_LOOP(range)
// Apply a correction to the stack depth.
#define STKDELTA(delta)
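
To give a feel for how macros of this shape are used, the following is a toy sketch and not the definitions from the Tcl sources. It assumes, as an illustration only, that the macros implicitly use a variable called envPtr, that four-byte operands are stored most significant byte first, and that a forward jump's offset is measured from the start of the jump instruction. ToyEnv, emit1, emit4, patch4, the TOY_ opcodes and compile_toy_if are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the compilation environment. */
typedef struct {
    unsigned char code[64];
    size_t next;            /* index of the next free byte */
} ToyEnv;

enum { TOY_NOP, TOY_PUSH, TOY_JUMP_FALSE, TOY_POP };  /* hypothetical opcodes */

static void emit1(ToyEnv *e, unsigned v) {
    e->code[e->next++] = (unsigned char)v;
}

/* Emit a four-byte operand, most significant byte first (assumed). */
static void emit4(ToyEnv *e, uint32_t v) {
    emit1(e, v >> 24);
    emit1(e, v >> 16);
    emit1(e, v >> 8);
    emit1(e, v);
}

/* Overwrite a previously emitted four-byte operand. */
static void patch4(ToyEnv *e, size_t at, uint32_t v) {
    e->code[at]     = (unsigned char)(v >> 24);
    e->code[at + 1] = (unsigned char)(v >> 16);
    e->code[at + 2] = (unsigned char)(v >> 8);
    e->code[at + 3] = (unsigned char)v;
}

/* Toy versions of some issuing macros; all use envPtr implicitly. */
#define OP(name)            emit1(envPtr, TOY_##name)
#define OP4(name, val)      (emit1(envPtr, TOY_##name), \
                             emit4(envPtr, (uint32_t)(val)))
/* Remember where the jump starts and emit a placeholder offset. */
#define FWDJUMP(name, var)  ((var) = envPtr->next, OP4(name, 0))
/* Fix up the placeholder now that the destination is known. */
#define FWDLABEL(var)       patch4(envPtr, (var) + 1, \
                                   (uint32_t)(envPtr->next - (var)))

/* Emit: push 7; jumpFalse skip; nop; skip: pop. Returns bytes emitted. */
static size_t compile_toy_if(ToyEnv *envPtr) {
    size_t skip;

    OP4(PUSH, 7);
    FWDJUMP(JUMP_FALSE, skip);
    OP(NOP);
    FWDLABEL(skip);
    OP(POP);
    return envPtr->next;
}
```

With every operand the same width, the forward jump can be patched in place; there is no need to guess a narrow encoding and grow it later, which is the kind of simplification the branch name suggests.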

New macros in tclCompile.c

To keep things clearer and less prone to errors, the following macros are used for building the entries in the tclInstructionTable global constant:

#define TCL_INSTRUCTION_ENTRY(name,stack) \
    {name,1,stack,0,{OPERAND_NONE,OPERAND_NONE}}
#define TCL_INSTRUCTION_ENTRY1(name,size,stack,type1) \
    {name,size,stack,1,{type1,OPERAND_NONE}}
#define TCL_INSTRUCTION_ENTRY2(name,size,stack,type1,type2) \
    {name,size,stack,2,{type1,type2}}

These have no effect other than to make building the entries a bit less error-prone. (There are equivalent DEPRECATED_... macros for the deprecated opcodes, but they are currently defined identically; they are just visual markers when reading the source code.)
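
A small sketch of how such entry macros might be used follows. The three macros are copied from the text above; everything else is a toy. ToyOperandType and ToyInstructionDesc are hypothetical stand-ins for the real types in tclCompile.h, the three table rows are invented for illustration, and the operand sizes assumed by toy_entry_is_consistent (one byte and four bytes) are assumptions of this sketch.

```c
/* Toy operand kinds; the real enumeration has more members. */
typedef enum {
    OPERAND_NONE,
    OPERAND_UINT1,
    OPERAND_UINT4
} ToyOperandType;

/* Toy instruction descriptor with the field order the macros expect. */
typedef struct {
    const char *name;
    int numBytes;        /* total size: opcode byte plus operands */
    int stackEffect;
    int numOperands;
    ToyOperandType opTypes[2];
} ToyInstructionDesc;

#define TCL_INSTRUCTION_ENTRY(name,stack) \
    {name,1,stack,0,{OPERAND_NONE,OPERAND_NONE}}
#define TCL_INSTRUCTION_ENTRY1(name,size,stack,type1) \
    {name,size,stack,1,{type1,OPERAND_NONE}}
#define TCL_INSTRUCTION_ENTRY2(name,size,stack,type1,type2) \
    {name,size,stack,2,{type1,type2}}

/* Invented rows, purely for illustration. */
static const ToyInstructionDesc toyTable[] = {
    TCL_INSTRUCTION_ENTRY("pop", -1),
    TCL_INSTRUCTION_ENTRY1("push", 5, +1, OPERAND_UINT4),
    TCL_INSTRUCTION_ENTRY2("toyTwoOp", 6, 0, OPERAND_UINT4, OPERAND_UINT1),
};

/* Check that the declared size and operand count match the operand types. */
static int toy_entry_is_consistent(const ToyInstructionDesc *d) {
    int size = 1, count = 0;

    for (int i = 0; i < 2; i++) {
        switch (d->opTypes[i]) {
        case OPERAND_NONE:
            break;
        case OPERAND_UINT1:
            size += 1;
            count++;
            break;
        case OPERAND_UINT4:
            size += 4;
            count++;
            break;
        }
    }
    return size == d->numBytes && count == d->numOperands;
}
```

Because the operand count is fixed by which macro is chosen, a row cannot claim two operands while listing only one type, which is the class of mistake the macros are meant to prevent.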

New bytecode engine macros

Mostly the changes here are small, but there is one new general macro:

That's a cut-down version of NEXT_INST_F() for the case when there's no result to handle, which is really quite common and means we can omit quite a bit of code that the compiler would otherwise have to work at to remove. If we're lucky, it just makes the bytecode engine faster to compile. If we're unlucky, it shrinks the size of the built code (due to removal of code that should have been unreachable).

Compatibility

There are no changes to the public Tcl C API. All API changes are strictly internal only.

If REMOVE_DEPRECATED_OPCODES is not defined, full compatibility with Tcl 9.0 is maintained, though possibly with warnings. Code that must handle the old opcodes, such as the bytecode engine, does:

#define ALLOW_DEPRECATED_OPCODES

prior to #include "tclCompile.h" to disable the warnings.
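
The general pattern can be sketched as below. This is an illustration under assumptions, not the contents of tclCompile.h: TOY_DEPRECATED_MARK, the ToyInstruction enumeration and toy_operand_width are hypothetical, and the sketch relies on the GCC/Clang extension that allows an attribute on an individual enumerator.

```c
/* A consumer that must still handle old opcodes opts in first... */
#define ALLOW_DEPRECATED_OPCODES

/* ...and the "header" part then decides whether to mark anything. */
#if defined(ALLOW_DEPRECATED_OPCODES) || !defined(__GNUC__)
#define TOY_DEPRECATED_MARK
#else
#define TOY_DEPRECATED_MARK __attribute__((deprecated))
#endif

enum ToyInstruction {
    TOY_INST_DONE,
    TOY_INST_PUSH1 TOY_DEPRECATED_MARK,  /* old form, narrow operand */
    TOY_INST_PUSH                        /* replacement, wide operand */
};

/* Code like the bytecode engine has to know about both forms. */
static int toy_operand_width(enum ToyInstruction op) {
    switch (op) {
    case TOY_INST_PUSH1:
        return 1;
    case TOY_INST_PUSH:
        return 4;
    default:
        return 0;
    }
}
```

Without the first #define, a compiler that supports the attribute would warn at the case label that mentions the old opcode, which is the intended nudge for code that issues instructions.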

Code that saves and loads bytecodes is not expected to be able to handle these new opcodes without changes, because of the new auxiliary record type.

To Tcl scripts, there should be no visible changes, other than the lifting of some limits and new opcodes in tcl::unsupported::assemble.

Performance

The purpose of this change was to improve my sanity when reading the bytecode compilation code! However, a simple evaluation of the performance seems to indicate no substantive performance difference, and some increase in size of bytecode (to be expected as many common operations are now always issued with 4-byte operands). This is in line with expectations.

Implementation

See the no-variable-width-instruction-issue branch.

Future Directions

NB: These all lie outside the scope of this TIP.

This TIP lays the groundwork for making more commands be bytecode compiled with expansion present, though more opcodes are likely to be required for much of that project.

There are several proposed routes for removing the deprecated bytecodes:

  1. Do not remove the existing bytecode implementations for now.
  2. Branch remove-deprecated-opcodes-level1 removes the implementations but leaves the other bytecodes as they are. This is compatible with existing code so long as the deprecated bytecodes are not used.
  3. Branch remove-deprecated-opcodes-level2 compacts the remaining bytecodes. This is definitely not compatible with existing bytecodes... but that only matters for code that uses the TDK compiler and tbcload.

Other things examined during the development of this TIP:

Copyright

This document has been placed in the public domain.