Author: Jan Nijtmans <[email protected]>
State: Draft
Type: Project
Tcl-Version: 8.7
Tcl-Branch: tip-575
This TIP is a successor to TIP #542, resolving a corner case that was not recognized at the time.
TIP #542 allows stub-enabled extensions to be compiled with -DTCL_UTF_MAX=4 while still
working with Tcl compiled with -DTCL_UTF_MAX=3. All functions (e.g. Tcl_UtfToUniChar())
that have a parameter of type Tcl_UniChar change behavior with the value of TCL_UTF_MAX.
If TCL_UTF_MAX=3, then supplying a 4-byte UTF-8 character to this function will return 1,
and *chPtr will hold a high surrogate. If TCL_UTF_MAX=4, then this function will return 4,
and *chPtr will hold the full Unicode character. This works by supplying two different
stub entries and switching between them based on the value of TCL_UTF_MAX.
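The two behaviors described above can be sketched in plain C. This is a minimal illustration, not Tcl's implementation: the names sketch_unichar and sketch_utf_to_unichar are hypothetical, the run-time argument utf_max stands in for the compile-time TCL_UTF_MAX, and the input is assumed to be well-formed UTF-8 positioned on a lead byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Tcl_UniChar; wide enough for either mode. */
typedef unsigned int sketch_unichar;

/*
 * Decode one character from well-formed UTF-8 (assumption: src points at a
 * lead byte). utf_max plays the role of TCL_UTF_MAX (3 or 4).
 * Returns the number of bytes consumed.
 */
static int sketch_utf_to_unichar(const unsigned char *src,
                                 sketch_unichar *chPtr, int utf_max)
{
    assert(src != NULL && chPtr != NULL);
    if (src[0] < 0x80) {                      /* 1 byte: ASCII */
        *chPtr = src[0];
        return 1;
    }
    if (src[0] < 0xE0) {                      /* 2-byte sequence */
        *chPtr = ((src[0] & 0x1Fu) << 6) | (src[1] & 0x3Fu);
        return 2;
    }
    if (src[0] < 0xF0) {                      /* 3-byte sequence */
        *chPtr = ((src[0] & 0x0Fu) << 12) | ((src[1] & 0x3Fu) << 6)
               | (src[2] & 0x3Fu);
        return 3;
    }
    /* 4-byte sequence: a code point above U+FFFF */
    sketch_unichar cp = ((src[0] & 0x07u) << 18) | ((src[1] & 0x3Fu) << 12)
                      | ((src[2] & 0x3Fu) << 6) | (src[3] & 0x3Fu);
    if (utf_max >= 4) {
        *chPtr = cp;                          /* full Unicode character */
        return 4;
    }
    /* utf_max == 3: store only the high surrogate and consume 1 byte */
    *chPtr = 0xD800u + ((cp - 0x10000u) >> 10);
    return 1;
}
```

For U+1F600 (bytes F0 9F 98 80), this sketch returns 4 and stores 0x1F600 when utf_max is 4, and returns 1 and stores the high surrogate 0xD83D when utf_max is 3.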
Tcl_UtfNext() and Tcl_UtfPrev() don't have a Tcl_UniChar parameter, but there is still
an expected coupling with the function Tcl_UtfToUniChar(): if TCL_UTF_MAX=4, we would
expect Tcl_UtfNext() to be able to jump forward 4 bytes, while with TCL_UTF_MAX=3
Tcl_UtfNext() can jump forward at most 3 bytes. The same holds for Tcl_UtfPrev().

Tcl_UtfCharComplete(), which is coupled with the function Tcl_UtfToUniChar()
(indicating whether enough bytes are available for Tcl_UtfToUniChar() to be called),
has the same problem. Making this function switchable has the advantage that it
can now be used to protect calls to Tcl_UtfNext() too, for extensions compiled with
whatever value of TCL_UTF_MAX.
Implement new functions Tcl_UtfNext() and Tcl_UtfPrev(), which can
jump 4 bytes forward and back respectively, so it is possible to jump over UTF-8
characters > U+FFFF in one step instead of two.
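The difference in jump size can be sketched as follows. This is a rough model under stated assumptions, not the code in the tip-575 branch: sketch_lead_len, sketch_utf_next and sketch_utf_prev are hypothetical names, utf_max stands in for TCL_UTF_MAX, and the sketch only models the size of a single jump. How the remaining bytes of a 4-byte sequence are consumed when utf_max is 3 is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes a UTF-8 lead byte announces (1 for anything else). */
static int sketch_lead_len(unsigned char b)
{
    if (b >= 0xF0 && b <= 0xF4) return 4;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    return 1;
}

/* Step forward over one character, jumping at most utf_max bytes. */
static const char *sketch_utf_next(const char *src, int utf_max)
{
    assert(src != NULL);
    int len = sketch_lead_len((unsigned char)src[0]);
    if (len > utf_max) {
        return src + 1;               /* cannot take 4 bytes in one jump */
    }
    for (int i = 1; i < len; i++) {   /* every trail byte must be 10xxxxxx */
        if (((unsigned char)src[i] & 0xC0) != 0x80) {
            return src + 1;
        }
    }
    return src + len;
}

/* Step back over one character, looking back at most utf_max bytes. */
static const char *sketch_utf_prev(const char *src, const char *start,
                                   int utf_max)
{
    assert(src != NULL && start != NULL && src >= start);
    for (int back = 1; back <= utf_max && back <= src - start; back++) {
        const char *cand = src - back;
        if (((unsigned char)*cand & 0xC0) != 0x80) {
            /* Found a non-trail byte: accept it only if its sequence
             * ends exactly at src. */
            if (sketch_utf_next(cand, utf_max) == src) {
                return cand;
            }
            break;
        }
    }
    return (src > start) ? src - 1 : start;
}
```

With utf_max set to 4, both helpers cross the four bytes F0 9F 98 80 in a single call; with utf_max set to 3, each call moves by one byte at that position.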
The new Tcl_UtfNext()/Tcl_UtfPrev() functions will get their own new entries in the
stub table. So, extensions (however rare) using Tcl_UtfNext()/Tcl_UtfPrev() that are
compiled against Tcl 8.6 headers will keep their original behavior.
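The idea of separate stub entries selected at compile time can be sketched like this. Everything here is hypothetical and simplified for illustration: SKETCH_UTF_MAX stands in for TCL_UTF_MAX, SketchStubs is not Tcl's real stub table layout, and the two toy functions handle only ASCII and 4-byte lead bytes.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SKETCH_UTF_MAX
#define SKETCH_UTF_MAX 3   /* stands in for TCL_UTF_MAX at extension build time */
#endif

/* Toy "old" behavior: never jumps over a 4-byte sequence in one step. */
static const char *sketch_next_old(const char *src)
{
    assert(src != NULL);
    return src + 1;
}

/* Toy "new" behavior: jumps 4 bytes when it sees a 4-byte lead byte. */
static const char *sketch_next_new(const char *src)
{
    assert(src != NULL);
    return ((unsigned char)src[0] >= 0xF0) ? src + 4 : src + 1;
}

/* Hypothetical stub table: the old slot stays in place, a new slot is added. */
typedef struct SketchStubs {
    const char *(*utfNextOld)(const char *src);  /* existing entry, unchanged */
    const char *(*utfNextNew)(const char *src);  /* additional entry */
} SketchStubs;

static const SketchStubs sketchStubs = { sketch_next_old, sketch_next_new };
static const SketchStubs *sketchStubsPtr = &sketchStubs;

/* The header picks a slot based on the value the extension was built with. */
#if SKETCH_UTF_MAX > 3
#define Sketch_UtfNext(src) (sketchStubsPtr->utfNextNew(src))
#else
#define Sketch_UtfNext(src) (sketchStubsPtr->utfNextOld(src))
#endif
```

Because the old slot is left untouched, an extension built against older headers keeps calling it and sees no change; only code compiled with the larger value is routed to the new entry.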
The new Tcl_UtfCharComplete() will behave almost identically to the old one. The only
difference is when it encounters a starting byte between 0xF0 and 0xF5: then it will return
true only when at least 4 bytes are available.
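A minimal sketch of that rule follows. The name sketch_char_complete is hypothetical and utf_max stands in for TCL_UTF_MAX. Two points are assumptions rather than statements from this TIP: the range 0xF0 to 0xF5 is treated as inclusive, and the old variant is assumed to require only 1 byte for such a starting byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if the first `length` bytes at src hold enough bytes for the
 * character announced by the starting byte, 0 otherwise.
 */
static int sketch_char_complete(const char *src, int length, int utf_max)
{
    assert(src != NULL);
    if (length <= 0) {
        return 0;
    }
    unsigned char b = (unsigned char)src[0];
    int need = 1;
    if (b >= 0xC2 && b <= 0xDF) {
        need = 2;
    } else if (b >= 0xE0 && b <= 0xEF) {
        need = 3;
    } else if (b >= 0xF0 && b <= 0xF5) {
        /* New behavior: 4 bytes required. Old behavior (assumed): 1 byte. */
        need = (utf_max >= 4) ? 4 : 1;
    }
    return length >= need;
}
```

A caller reading from a buffer that may end in the middle of a character could check this before decoding or stepping forward, and wait for more input when it returns 0.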
If an extension is compiled with
-DTCL_UTF_MAX=4 or with
Tcl_UtfCharComplete() will start behaving as described in this TIP; otherwise it
will behave exactly as in Tcl 8.6.
The documentation of Tcl_UtfCharComplete() is adapted, stating that this function
can now be used to protect Tcl_UtfNext() calls too.
The implementation is in branch tip-575.
As long as Tcl and its extensions are both compiled with -DTCL_UTF_MAX=3 (which is
the default in Tcl 8.x) or both with -DTCL_UTF_MAX=4 (as in Tcl 9.x), nothing changes.
The difference can only be noticed in extensions that are compiled using a different
TCL_UTF_MAX value than Tcl.
This document has been placed in the public domain.