Author: Jan Nijtmans <[email protected]>
State: Draft
Type: Project
Tcl-Version: 8.7
Tcl-Branch: tip-575
In the original TIP #389 implementation,
Tcl_UtfNext()/Tcl_UtfPrev() were able to jump 4 bytes forth and back. This
is contrary to common expectation and to history, when
Tcl_UtfNext()/Tcl_UtfPrev() were only able to jump
TCL_UTF_MAX bytes. This limit was never documented but, given how other functions behave, it is
a logical expectation. However, it is problematic to allow
these functions to jump to a byte
within a valid 4-byte UTF-8 byte sequence: the pointer then points to a continuation byte, which
the caller could interpret as an invalid byte sequence.
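
The problem can be illustrated with a minimal, self-contained sketch. It is not the Tcl implementation: `sketch_utf_prev()`, `sketch_is_continuation()`, `sketch_lead_length()` and the `max_back` parameter are hypothetical names introduced here, and validation is deliberately simplified (no overlong or surrogate checks).

```c
#include <stddef.h>

/* Hypothetical helper: is b a UTF-8 continuation byte (bit pattern 10xxxxxx)? */
static int sketch_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: sequence length announced by a lead byte (simplified). */
static int sketch_lead_length(unsigned char b)
{
    if (b < 0x80) return 1;               /* ASCII */
    if (b >= 0xC0 && b < 0xE0) return 2;
    if (b >= 0xE0 && b < 0xF0) return 3;
    if (b >= 0xF0 && b < 0xF5) return 4;
    return 1;                             /* continuation or invalid lead byte */
}

/*
 * Simplified backward step (assumption, not Tcl's algorithm): look back at
 * most max_back bytes from src for a lead byte whose announced length ends
 * exactly at src. If none is found, step back a single byte.
 * Precondition: src > start.
 */
static const char *sketch_utf_prev(const char *src, const char *start, int max_back)
{
    int i;

    for (i = 1; i <= max_back && i <= (src - start); i++) {
        unsigned char b = (unsigned char) src[-i];

        if (!sketch_is_continuation(b)) {
            return (sketch_lead_length(b) == i) ? src - i : src - 1;
        }
    }
    return src - 1;
}
```

In this sketch, stepping back from the end of the bytes F0 9F 98 80 with `max_back` set to 3 never reaches the lead byte, so the result points at the final 0x80, a continuation byte. With `max_back` set to 4 the result points at 0xF0, the start of the character.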
Implement new versions of the functions
Tcl_UtfCharComplete(), Tcl_UtfNext() and Tcl_UtfPrev(). The latter two can jump 4 bytes forward and back, respectively,
so it is possible to jump over characters > U+FFFF in one step instead of two.
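
A forward step that covers a character > U+FFFF in one move could look roughly like the sketch below. `sketch_utf_next()` and its `end` parameter are hypothetical and chosen for illustration only. The function checks only that enough continuation bytes follow the lead byte, which is an assumption and weaker than full UTF-8 validation.

```c
#include <stddef.h>

/*
 * Simplified forward step (assumption, not Tcl's algorithm): advance by the
 * length announced by the lead byte, up to 4 bytes, when that many bytes are
 * available before end and all trailing bytes are continuation bytes.
 * Otherwise advance a single byte.
 * Precondition: src < end.
 */
static const char *sketch_utf_next(const char *src, const char *end)
{
    unsigned char b = (unsigned char) *src;
    int len = 1;
    int i;

    if (b >= 0xC0 && b < 0xE0) {
        len = 2;
    } else if (b >= 0xE0 && b < 0xF0) {
        len = 3;
    } else if (b >= 0xF0 && b < 0xF5) {
        len = 4;
    }
    if (len > end - src) {
        return src + 1;        /* truncated sequence */
    }
    for (i = 1; i < len; i++) {
        if (((unsigned char) src[i] & 0xC0) != 0x80) {
            return src + 1;    /* not a continuation byte */
        }
    }
    return src + len;
}
```

For the byte string 'a', F0 9F 98 80, 'b', three calls walk the whole string: one byte, then four bytes, then one byte.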
These three functions will get their own new entries in the stub table. As a result, extensions (however rare) that use
Tcl_UtfPrev() but were compiled against Tcl 8.6 headers will keep their original behavior.
The new Tcl_UtfCharComplete() will behave almost identically to the old one. The only difference arises when it encounters
a starting byte between 0xF0 and 0xF5: it will return true only when at least 4 bytes are available.
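
The rule can be sketched as follows. `sketch_char_complete()` is a hypothetical stand-in, not the Tcl function. Treating the range 0xF0 to 0xF5 as inclusive, and returning false for an empty buffer, are assumptions made for this example.

```c
/*
 * Hypothetical completeness check: returns 1 when the first length bytes at
 * src are enough to hold the character announced by the starting byte.
 */
static int sketch_char_complete(const char *src, int length)
{
    unsigned char b;
    int needed;

    if (length < 1) {
        return 0;                      /* assumption: empty input is incomplete */
    }
    b = (unsigned char) src[0];
    if (b >= 0xF0 && b <= 0xF5) {
        needed = 4;                    /* the case described in this TIP */
    } else if (b >= 0xE0 && b < 0xF0) {
        needed = 3;
    } else if (b >= 0xC0 && b < 0xE0) {
        needed = 2;
    } else {
        needed = 1;                    /* ASCII, continuation or other byte */
    }
    return length >= needed;
}
```

A caller reading from a stream could use such a check to decide whether to wait for more bytes before decoding: with a starting byte of 0xF0, three available bytes are not enough, four are.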
If an extension is compiled with
-DTCL_UTF_MAX=4 or with
start behaving as described in this TIP; if not, it will behave exactly as in Tcl 8.6.
TODO: Describe whatever results from this experiment.
The implementation is in development (highly experimental) in branch tip-575.
This document has been placed in the public domain.