Author: Jan Nijtmans <[email protected]>
Author: Jan Nijtmans <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 31-May-2019
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 8.7
Tcl-Branch: tip-547
Abstract
This TIP proposes to add more encodings for handling utf-16 and ucs-2.
Rationale
Currently, Tcl only has one multi-byte Utf encoding named "unicode". Depending on how Tcl is compiled, this could be 16-bit or 32-bit. If 16-bit, then it's currently not clear whether surrogates are handled or not. Also, those encodings always use the platform-endian mode. There is no way to force little- or big-endianess.
Therefore this TIP proposes to clear up the ambiguity: Make clear that those encodings are always 16-bit, and provide different encodings for little- and big-endian. The "utf-16" variant handles surrogates while the "ucs-2" variant does not.
Specification
This document proposes:
Add new encodings "utf-16", "utf-16le", "utf-16be", "ucs-2", "ucs-2le", "ucs-2be".
Deprecate the "unicode" encoding. "utf-16" is supposed to be used in stead. The "unicode" encoding will NOT be removed in Tcl 9.0, since it's too common.
Use case
Tk defines it's own "ucs-2be" encoding when compiled on little-endian machines. So, this TIP means that Tk no longer needs to provide this encoding any more.
Compatibility
This is fully upwards compatible, except when Tcl is compiled with -DTCL_UTF_MAX=6
(which is - actually - not supported).
Reference Implementation
A reference implementation is available in the tip-547 branch. https://core.tcl-lang.org/tcl/timeline?r=tip-547
Copyright
This document has been placed in the public domain.