Author:         Jan Nijtmans <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        31-May-2019
Post-History:
Discussions-To: Tcl Core list
Keywords:       Tcl
Tcl-Version:    8.7
Tcl-Branch:     tip-547
This TIP proposes adding more encodings for handling UTF-16 and UCS-2.
Currently, Tcl has only one multi-byte UTF encoding, named "unicode". Depending on how Tcl is compiled, it is either 16-bit or 32-bit. In the 16-bit case, it is not clear whether surrogates are handled. The encoding also always uses the platform's native byte order; there is no way to force little- or big-endianness.
Therefore this TIP proposes to clear up the ambiguity: make clear that these encodings are always 16-bit, and provide separate encodings for little- and big-endian byte order. The "utf-16" variants handle surrogates while the "ucs-2" variants do not.
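The effect of choosing an explicit byte order can be illustrated outside Tcl. The following is a minimal sketch that uses Python's standard "utf-16-le" and "utf-16-be" codecs only to show the byte layouts involved; it is not Tcl's implementation, and the variable names are chosen for illustration.

```python
import sys

# Illustration only: Python's standard codecs, not Tcl's implementation.
text = "A"  # U+0041, a character inside the Basic Multilingual Plane

# Explicit byte order: each call always produces the same bytes,
# regardless of the machine it runs on (no byte-order mark is added).
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

print(little.hex())  # 4100
print(big.hex())     # 0041

# A "platform-endian" encoding corresponds to whichever of the two
# matches the native byte order of the host.
native = little if sys.byteorder == "little" else big
print(sys.byteorder, native.hex())
```

The same 16-bit code unit 0x0041 is stored as the bytes 41 00 or 00 41, which is why data exchanged between machines needs an encoding name that fixes the byte order.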
This document proposes:
Add new encodings "utf-16", "utf-16le", "utf-16be", "ucs-2", "ucs-2le", "ucs-2be".
Deprecate the "unicode" encoding; "utf-16" should be used instead. The "unicode" encoding will NOT be removed in Tcl 9.0, since it is too widely used.
Tk defines its own "ucs-2be" encoding when compiled on little-endian machines. With this TIP, Tk no longer needs to provide that encoding.
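The difference between the "utf-16" and "ucs-2" families listed above comes down to surrogate pairs. The sketch below shows the standard UTF-16 arithmetic for a code point above U+FFFF. The helper names `utf16_units` and `fits_in_ucs2` are hypothetical, and the sketch says nothing about what Tcl's "ucs-2" encodings do with such a character, which this TIP does not specify.

```python
def utf16_units(code_point: int) -> list:
    """Return the 16-bit code units UTF-16 uses for one code point.

    Assumes a valid Unicode scalar value (not itself a surrogate).
    """
    if code_point < 0x10000:
        # Basic Multilingual Plane: one code unit, identical in UCS-2.
        return [code_point]
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits -> low surrogate
    return [high, low]


def fits_in_ucs2(code_point: int) -> bool:
    """UCS-2 has exactly one 16-bit unit per character, so no surrogates."""
    return code_point <= 0xFFFF


emoji = 0x1F600  # a code point outside the Basic Multilingual Plane
units = utf16_units(emoji)
print([hex(u) for u in units])   # ['0xd83d', '0xde00']
print(fits_in_ucs2(emoji))       # False
print(fits_in_ucs2(0x20AC))      # True (U+20AC, the euro sign)
```

For characters up to U+FFFF the two families produce the same code units; they differ only for characters that need a surrogate pair.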
This is fully upwards compatible, except when Tcl is compiled with -DTCL_UTF_MAX=6 (which is not actually supported).
A reference implementation is available in the tip-547 branch. https://core.tcl-lang.org/tcl/timeline?r=tip-547
This document has been placed in the public domain.