Author: Jan Nijtmans <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 10-May-2019
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 8.7
Tcl-Branch: utf-max
This TIP proposes being able to switch Tcl between Full Unicode mode
(TCL_UTF_MAX>3, almost compatible with Androwish) and current partial
Unicode mode (as far as TIP #389 goes, using
Tcl can currently be compiled in three different modes: with TCL_UTF_MAX=3, TCL_UTF_MAX=4, or TCL_UTF_MAX=6. The first two are equivalent in Tcl 8.7 (since TIP #389). TCL_UTF_MAX=6 is overkill, since no UTF-8 character is longer than 4 bytes.
Therefore it makes sense to reduce this to only two modes: TCL_UTF_MAX=3 means full compatibility with Tcl 8.6, while TCL_UTF_MAX=4 means compatibility with the Androwish version of Tcl. Defining TCL_UTF_MAX=6 still results in a valid compilation (functioning the same as TCL_UTF_MAX=4); the only difference is that some buffers will be 2 bytes larger than necessary.
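The 4-byte upper bound can be checked with a small sketch. The helper below is hypothetical (`utf8_encoded_length` is not part of the Tcl API) and only counts bytes using the standard UTF-8 ranges. It treats surrogate values like any other 3-byte code point, which strict UTF-8 would reject.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Tcl API: number of bytes needed
 * to encode one code point in UTF-8, or 0 if the value lies above
 * U+10FFFF. Surrogate values (U+D800..U+DFFF) are counted as 3 bytes
 * here for simplicity; strict UTF-8 does not allow them. */
size_t utf8_encoded_length(unsigned long cp)
{
    if (cp < 0x80UL) {
        return 1;               /* ASCII */
    }
    if (cp < 0x800UL) {
        return 2;
    }
    if (cp < 0x10000UL) {
        return 3;               /* rest of the Basic Multilingual Plane */
    }
    if (cp <= 0x10FFFFUL) {
        return 4;               /* supplementary planes */
    }
    return 0;                   /* not a Unicode code point */
}
```

Since the largest code point, U+10FFFF, needs 4 bytes, a per-character buffer sized for 6 bytes leaves 2 bytes unused, which matches the remark about buffer sizes above.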
Androwish chose to use a Tcl mode that was unsupported at the time: changing the size of the Tcl_UniChar type by using TCL_UTF_MAX=6. This causes a binary incompatibility, which means that all extensions must be recompiled with TCL_UTF_MAX=6 as well. This TIP proposes adding a supported TCL_UTF_MAX=4 compilation mode to Tcl, which has the same effect as the earlier unsupported TCL_UTF_MAX=6 but does not require recompiling all extensions. Recompilation is avoided by putting the 32-bit versions of the Tcl_UniChar-related functions in different stub entries than the 16-bit versions. This way, 99% of all extensions compiled with TCL_UTF_MAX=3 keep functioning as before without recompilation.
The default compilation mode for Tcl will continue to be TCL_UTF_MAX=3, which is 100% upwards compatible with Tcl 8.6.
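To illustrate why changing the character width is a binary incompatibility, here is a minimal sketch of a width switch driven by TCL_UTF_MAX. `Demo_UniChar` and `demo_unichar_buffer_bytes` are hypothetical names used only for illustration, not Tcl's own declarations, and the sketch assumes a platform where `unsigned short` is 16 bits and `int` is 32 bits.

```c
#include <stddef.h>

/* Stand-in for the build flag; a real build would pass -DTCL_UTF_MAX=3
 * or -DTCL_UTF_MAX=4 on the compiler command line. */
#ifndef TCL_UTF_MAX
#define TCL_UTF_MAX 3
#endif

/* Hypothetical typedef (Demo_UniChar is not a Tcl name) mirroring the
 * width rule described in the text: 16 bits by default, 32 bits when
 * TCL_UTF_MAX is greater than 3. */
#if TCL_UTF_MAX > 3
typedef int Demo_UniChar;
#else
typedef unsigned short Demo_UniChar;
#endif

/* Bytes occupied by a buffer of n characters in the selected mode. */
size_t demo_unichar_buffer_bytes(size_t n)
{
    return n * sizeof(Demo_UniChar);
}
```

Two binaries built with different settings disagree on the size of every such buffer, so a function exchanging these buffers cannot safely be shared between them. Separate entry points for each width avoid that disagreement.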
This document proposes:
Allow Tcl to be compiled with either -DTCL_UTF_MAX=3 (default), or with -DTCL_UTF_MAX=4. In the latter mode, the Tcl_UniChar type becomes a 32-bit type, but the stub entries for the 16-bit Tcl_UniChar type are present as well. So, most extensions compiled with -DTCL_UTF_MAX=3 will continue to work in either Tcl mode (for caveats, see below).
Allow Tcl extensions to be compiled with either -DTCL_UTF_MAX=3 (default), or with -DTCL_UTF_MAX=4, when Tcl is compiled with -DTCL_UTF_MAX=4.
Deprecate the following functions:
If Tcl is compiled with either -DTCL_UTF_MAX=4 or -DTCL_NO_DEPRECATED, those functions will no longer be available to extensions. They will not be available in Tcl 9.0 either.
This function does almost the same thing as Tcl_UtfToUniChar(), but it writes 16-bit Unicode characters ("unsigned short") independent of the value of -DTCL_UTF_MAX.
This function can be used if you want your extension to compile with either -DTCL_UTF_MAX=3 or -DTCL_UTF_MAX=4, but still want to use the 16-bit conversions independent of the TCL_UTF_MAX setting or the Tcl_UniChar type.
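For background on what a 16-bit representation implies, the sketch below shows the standard UTF-16 rule for storing a code point in 16-bit units: code points above U+FFFF need two units (a surrogate pair). `demo_to_utf16` is a hypothetical helper, not the function proposed here, and the text does not say whether the proposed function returns its results in exactly this way.

```c
#include <stddef.h>

/* Hypothetical helper, not a Tcl function: store code point cp as
 * UTF-16 code units in out[] (which must have room for 2 units) and
 * return the number of units written, or 0 if cp is above U+10FFFF.
 * Assumes unsigned short is at least 16 bits wide. */
size_t demo_to_utf16(unsigned long cp, unsigned short out[2])
{
    if (cp < 0x10000UL) {
        out[0] = (unsigned short) cp;   /* fits in one 16-bit unit */
        return 1;
    }
    if (cp > 0x10FFFFUL) {
        return 0;                       /* not a Unicode code point */
    }
    cp -= 0x10000UL;                    /* 20 bits remain */
    out[0] = (unsigned short) (0xD800UL + (cp >> 10));      /* high surrogate */
    out[1] = (unsigned short) (0xDC00UL + (cp & 0x3FFUL));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D, 0xDE00, while U+20AC stays a single unit, 0x20AC.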
As long as Tcl is compiled with -DTCL_UTF_MAX=3, this is fully upwards compatible.
When Tcl is compiled with -DTCL_UTF_MAX=4, this is, at the Tcl level, compatible with the Androwish version of Tcl. At the C API level, it is upwards compatible with Tcl 8.6 in TCL_UTF_MAX=6 mode, except for the functions marked above as deprecated, which will be gone.
Extensions compiled with -DTCL_UTF_MAX=4 cannot use any of the deprecated functions mentioned in this TIP. Using any of them results in a link error.
If Tcl is compiled with -DTCL_UTF_MAX=4, the deprecated functions will be gone. Any extension using them, even if the extension is compiled with -DTCL_UTF_MAX=3, will no longer work.
A reference implementation is available in the utf-max branch. https://core.tcl-lang.org/tcl/timeline?r=utf-max
This document has been placed in the public domain.