TIP 537: Enable 64-bit indexes in regexp matching

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Jan Nijtmans <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        7-April-2019
Post-History:   
Keywords:       Tcl
Tcl-Version:    9.0
Tcl-Branch:     regexp-api-64bit

Abstract

This TIP proposes to modify struct Tcl_RegExpInfo and struct Tcl_RegExpIndices, such that the fields indicating indexes change from type int to type size_t.

Rationale

This TIP should have been part of TIP #502 (Index Value Reform) and/or TIP #494 (More use of size_t in Tcl 9), but it was overlooked. Without changing this public API, regular expression indexes never can exceed 2GiB in value.

Specification and Documentation

Here are the new struct definitions:

typedef struct Tcl_RegExpInfo {
    size_t nsubs;
    Tcl_RegExpIndices *matches;
    size_t extendStart;
} Tcl_RegExpInfo;

typedef struct Tcl_RegExpIndices {
    size_t start;
    size_t end;
} Tcl_RegExpIndices;

Also a new macro TCL_INDEX_NONE will be provided, which is the value of the start and end fields when there is no match. This macro will be provided to 8.7 as well, but in Tcl 8.7 it will have the value (-1).

Implementation

An implementation of this TIP is present in the regexp-api-64bit branch.

Copyright

This document has been placed in the public domain.