Author: Don Porter <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 21-Mar-2018
Obsoletes: 475
Post-History:
Keywords: Tcl,string,insert
Tcl-Version: 8.7
Tcl-Branch: dgp-string-insert
Votes-For: DKF, JN, DGP, FV, SL, AK
Votes-Against: none
Votes-Present: none
Abstract
This TIP proposes a [string insert] subcommand for inserting a substring at a given index. This new [string insert] command is to be the string analogue of [linsert].
History
[TIP 475] proposed the same subcommand, and in addition proposed a public C routine to access the same functionality. It was rejected, but only for reasons relating to the C routine. This TIP is a duplication of the rejected TIP, modified only to remove the C routine from the proposal, and with appropriate revisions to the sections describing the Reference Implementation and Future Work.
Rationale
Substring insertion is a basic string operation not directly available in current Tcl. Substring insertion can be synthesized from existing string commands, but the numerous legal forms of indexing yield significant difficulty. A novice user cannot be expected to know, much less implement, all possible index formats. Thus it is reasonable to provide a standard substring insertion command.
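As an illustration of that difficulty, the following is a minimal sketch of the obvious synthesis from [string range]. The name naiveInsert is hypothetical and not part of Tcl. The sketch assumes a plain integer index, which is exactly the assumption that breaks for the other legal index forms.

```tcl
# Hypothetical helper, not part of Tcl: insert by splitting at an integer index.
proc naiveInsert {string index insertString} {
    # [expr] only understands numbers, so this arithmetic is where
    # end-relative and TIP 176 style indexes break down.
    set before [string range $string 0 [expr {$index - 1}]]
    set after  [string range $string $index end]
    return $before$insertString$after
}

puts [naiveInsert abc 1 X]                      ;# plain integer index: aXbc
puts [catch {naiveInsert abc end-1 X} message]  ;# end-relative index: 1 (error)
puts [catch {naiveInsert abc 0+1 X} message]    ;# TIP 176 index: 1 (error)
```

Handling the two failing calls means parsing the index by hand, which is the work a built-in command would take off the user.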
The current design of [string replace] expressly (albeit inexplicably) prevents its use for performing string insertion. TIP 323 originally proposed to extend [string replace] to allow string insertion, but this aspect of TIP 323 was withdrawn for the sake of compatibility.
Clarification: TIP 504 proposes no changes to the semantics of [string replace].
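The following short sketch shows how the documented behavior of [string replace] rules out insertion. It uses only the built-in command; the sample string is arbitrary.

```tcl
set s "abc"

# An empty range (last < first) returns the string untouched: nothing is inserted.
puts [string replace $s 1 0 X]   ;# abc

# A one-character range replaces that character rather than inserting.
puts [string replace $s 1 1 X]   ;# aXc

# A first index at or past the end of the string also returns the original.
puts [string replace $s 3 3 X]   ;# abc
```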
To mirror the behavior of [linsert], [string insert] at end should append to the string. This is in conflict with [string replace], were it to be extended to permit replacing an empty substring.
[lreplace] of an empty range at end inserts elements immediately before the final element, whereas appending requires the start index to be end+1. The hypothetical extended [string replace] would mirror [lreplace] and therefore would not have the end-relative indexing semantics needed to implement [string insert].
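The difference between the two list commands can be seen in a small sketch, assuming Tcl 8.6 or later and an arbitrary three-element list:

```tcl
set list {a b c}

# [linsert] at end appends after the final element.
puts [linsert $list end X]           ;# a b c X

# [lreplace] of an empty range starting at end inserts before the final element.
puts [lreplace $list end end-1 X]    ;# a b X c

# To append with [lreplace], the empty range has to start at end+1.
puts [lreplace $list end+1 end X]    ;# a b c X
```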
In conclusion, it is most straightforward to simply provide a [string insert] command with the same semantics as [linsert].
Current Behavior
Currently available methods for string insertion are awkward and do not handle end-relative and TIP 176-style indexing without extraordinary effort. To demonstrate the surprising degree of complexity, the following is a pure Tcl script reference implementation intended to handle all possible index formats and corner cases:
    # Pure Tcl implementation of [string insert] command.
    proc ::tcl::string::insert {string index insertString} {
        # Convert end-relative and TIP 176 indexes to simple integers.
        if {[regexp -expanded {
            ^(end(?![\t\n\v\f\r ])      # "end" is never followed by whitespace
            |[\t\n\v\f\r ]*[+-]?\d+)    # m, with optional leading whitespace
            (?:([+-])                   # op, omitted when index is "end"
            ([+-]?\d+))?                # n, omitted when index is "end"
            [\t\n\v\f\r ]*$             # optional whitespace (unless "end")
        } $index _ m op n]} {
            # Convert first index to an integer.
            switch $m {
                end     {set index [string length $string]}
                default {scan $m %d index}
            }

            # Add or subtract second index, if provided.
            switch $op {
                + {set index [expr {$index + $n}]}
                - {set index [expr {$index - $n}]}
            }
        } elseif {![string is integer -strict $index]} {
            # Reject invalid indexes.
            return -code error "bad index \"$index\": must be\
                    integer?\[+-\]integer? or end?\[+-\]integer?"
        }

        # Concatenate the pre-insert, insertion, and post-insert strings.
        string cat [string range $string 0 [expr {$index - 1}]] $insertString\
                   [string range $string $index end]
    }

    # Bind [string insert] to [::tcl::string::insert].
    namespace ensemble configure string -map [dict replace\
            [namespace ensemble configure string -map]\
            insert ::tcl::string::insert]
More sample implementations can be found on the Additional String Functions page of the Tcler's Wiki, but at the time of writing, they do not handle end-relative indexing and cannot be used to append to a string. Since they are implemented in terms of [string replace] and do not perform any index arithmetic of their own, they actually do support TIP 176 indexes.
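The limitations just described can be reproduced with a minimal sketch of an insertion built on [string replace]. The helper name replaceBasedInsert is hypothetical, and the body is an assumption about the general shape of such implementations rather than a copy of any Wiki code.

```tcl
# Hypothetical helper in the style described above; not copied from the Wiki.
proc replaceBasedInsert {string index insertString} {
    # Replace the character at index with insertString followed by that character.
    string replace $string $index $index \
        $insertString[string index $string $index]
}

puts [replaceBasedInsert abc 1 X]     ;# aXbc
puts [replaceBasedInsert abc 0+1 X]   ;# aXbc: the TIP 176 index passes straight through
puts [replaceBasedInsert abc end X]   ;# abXc: lands before the final character
puts [replaceBasedInsert abc 3 X]     ;# abc: an index at the end cannot append
```

Because the index is only ever handed to built-in commands, every index form is accepted, but the end-relative and append cases do not match the [linsert] semantics this TIP asks for.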
Compatibility Considerations
The existence of a command named [string insert] breaks any existing code that assumes [string in] is an unambiguous abbreviation for [string index]. Two options exist:
- Special-case [string in] to mean [string index].
- Take no special action, in which case [string in] becomes an error.
This TIP proposes option #2 because abbreviations are not guaranteed to be stable in the long term. This TIP targets Tcl 8.7, the first alpha version of which was recently released. Thus there are two reasons why compatibility is deemphasized in this situation.
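A script can check which behavior its interpreter has with a small probe such as the following sketch. It assumes only that [string in] either resolves to [string index] or raises an ambiguity error, as described above.

```tcl
# Before [string insert] exists, "in" is a unique prefix of "index".
# Once it exists, the prefix is ambiguous and the call raises an error.
if {[catch {string in abc 1} result]} {
    puts "abbreviation rejected: $result"
} else {
    puts "abbreviation resolved to \[string index\]: $result"
}

# Spelling the subcommand out behaves the same either way.
puts [string index abc 1]   ;# b
```

Code that writes subcommand names in full is unaffected by the choice of option #2.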
Specification
Add a new [string insert] command:
string insert string index insertString
Returns a copy of string with insertString inserted at the index'th character. index may be specified as described in the STRING INDICES section.
If index is start-relative, the first character inserted in the returned string will be at the specified index. If index is end-relative, the last character inserted in the returned string will be at the specified index.
If index is at or before the start of string (e.g., index is 0), insertString is prepended to string. If index is at or after the end of string (e.g., index is end), insertString is appended to string.
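The rules above can be checked with a short sketch. It assumes an interpreter that already provides [string insert] (Tcl 8.7 or later) and prints a notice otherwise; the sample strings are arbitrary.

```tcl
if {[catch {string insert abc 0 X}]} {
    puts "this interpreter has no \[string insert\]; Tcl 8.7 or later is assumed"
} else {
    set s "abc"
    puts [string insert $s 1 XY]      ;# aXYbc: "X" sits at index 1 of the result
    puts [string insert $s end-1 XY]  ;# abXYc: "Y" sits at index end-1 of the result
    puts [string insert $s 0 XY]      ;# XYabc: prepended
    puts [string insert $s -5 XY]     ;# XYabc: before the start also prepends
    puts [string insert $s end XY]    ;# abcXY: appended
    puts [string insert $s 10 XY]     ;# abcXY: past the end also appends
}
```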
Reference Implementation
A pure Tcl reference implementation is given above.
The dgp-string-insert branch in the Tcl Fossil repository provides an implementation of the proposed subcommand, complete with documentation, bytecode compilation, and a set of test cases.
Future Work
The direct evaluation of [string insert] is routed through a new internal routine TclStringReplace. It is a conventional substring replacer routine that serves as the inner core of both the [string insert] and [string replace] commands with suitable screening of corner cases by the callers. There is one routine to perform this function, so that there is one place to get the functionality right (debugging), one place to work on performance and representation efficiency (optimization), and one place where we can experiment with transformation to different data structures.
This is one of a family of TclStringFoo routines that are engines of functionality for other [string foo] subcommands.
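The division of labor between one core replacer and its two callers can be pictured with the following conceptual sketch. It is written in Tcl for illustration only: the real routine is internal C code, and the names coreReplace, sketchInsert, and sketchReplace are hypothetical. The sketch also assumes that indexes have already been resolved to plain integers.

```tcl
# Hypothetical core: replace count characters of string, starting at first,
# with newString. It trusts its callers to pass sane arguments.
proc coreReplace {string first count newString} {
    set before [string range $string 0 [expr {$first - 1}]]
    set after  [string range $string [expr {$first + $count}] end]
    return $before$newString$after
}

# Caller 1: insertion clamps the index into 0..length and replaces nothing.
proc sketchInsert {string index insertString} {
    set length [string length $string]
    set index [expr {max(0, min($index, $length))}]
    coreReplace $string $index 0 $insertString
}

# Caller 2: replacement screens out the cases that must return the original.
proc sketchReplace {string first last {newString ""}} {
    set length [string length $string]
    if {$first < 0} {set first 0}
    if {$last >= $length} {set last [expr {$length - 1}]}
    if {$first >= $length || $last < $first} {return $string}
    coreReplace $string $first [expr {$last - $first + 1}] $newString
}

puts [sketchInsert abc 3 X]      ;# abcX
puts [sketchReplace abc 1 1 X]   ;# aXc
puts [sketchReplace abc 1 0 X]   ;# abc
```

Keeping the corner-case screening in the callers is what lets the two commands disagree about empty ranges while still sharing one replacement engine.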
The existing internal routine TclStringReplace does not include the full collection of optimizations that the prior routine Tcl_ReplaceObj did. Since this routine remains internal, it can continue to gain these revisions without further TIP examination. Likewise, alternative bytecode compiler and execution strategies may also be pursued internally.
These routines may be good candidates to become available to applications and extensions in the public C API. The current TIP does not propose that. It is out of scope. A set of questions will need to be addressed when considering converting these routines into public ones. First, it will need to be determined what level of robustness to present in a public interface. Should such a function be permitted to fail or even abort if given improper arguments? Or should all required argument validation be built into the routines to detect such things and raise catch-able errors instead? This general question includes the specific question about whether such routines are permitted to panic when memory allocation fails. Second, several of these routines require an argument specifying an index into a string. These arguments have type int. It is expected that string indices limited to int will no longer be desirable in Tcl 9, so it must be decided whether it makes sense to create new routines for Tcl 8.7 that are destined to be discarded by Tcl 9, or whether a migration path needs to be put in place from the beginning. Since these questions are non-trivial, addressing them is saved for a later TIP.
Copyright
This document has been placed in the public domain.