Author: Jan Nijtmans <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 3-June-2019
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 8.7
Tcl-Branch: tip-548
This TIP proposes to add `wchar_t` conversion functions and to deprecate `Tcl_WinUtfToTChar()` and `Tcl_WinTCharToUtf()` in favour of those new functions.
`Tcl_WinUtfToTChar()` and `Tcl_WinTCharToUtf()` were originally able to perform two different conversions, depending on the runtime platform: on Windows 95/98/ME they converted between UTF-8 and the Windows default encoding (usually CP1252); on later Windows versions they convert between UTF-8 and UTF-16.
The length parameter of `Tcl_WinTCharToUtf()` has always been in bytes, whereas most other Unicode-related Tcl functions expect their length in Unicode characters.
Since Windows 95/98/ME are not supported any more, it's time to fix this inconsistency.
Modern systems have a `wchar_t` type, a Unicode-like type that is either 16 bits (`unsigned short`) or 32 bits (`int`) wide. This TIP proposes 3 additional functions which convert between `wchar_t`-related types and UTF-8, and 3 more which convert between `unsigned short`-related types (UTF-16) and UTF-8. The new functions work identically on all platforms, not only Windows.
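The practical consequence of the two possible `wchar_t` widths can be sketched as follows. `WCharUnitsFor()` is a hypothetical helper introduced only for illustration; it is not part of Tcl and not one of the functions proposed here. It assumes that a 16-bit `wchar_t` holds UTF-16, so characters outside the Basic Multilingual Plane need a surrogate pair.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a Tcl function): number of wchar_t units
 * needed to store one Unicode code point on the current platform.
 */
static int
WCharUnitsFor(unsigned int codePoint)
{
    if (sizeof(wchar_t) == 2 && codePoint > 0xFFFF) {
        return 2;       /* 16-bit wchar_t: UTF-16 surrogate pair */
    }
    return 1;           /* BMP character, or 32-bit wchar_t */
}
```

On a platform with a 32-bit `wchar_t` the helper always returns 1, which is why the `wchar_t` functions have to select between a 16-bit and a 32-bit variant.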
This document proposes:
- Enhance the `Tcl_UniCharToUtfDString()` function such that the uniLength parameter is allowed to have the value -1. In that case, the UniChar string is read up to the terminating \u0000 character.
- Enhance the `Tcl_UniCharToUtfDString()` and `Tcl_UtfToUniCharDString()` functions such that the src/uniStr parameters are allowed to have the value NULL. In that case, the functions return NULL without doing anything.
- Add 3 new functions which convert between `wchar_t`-related types and UTF-8:
  - `Tcl_UtfToWCharDString()`, similar to `Tcl_UtfToUniCharDString()`, but with a `wchar_t` pointer type in its signature.
  - `Tcl_WCharToUtfDString()`, similar to `Tcl_UniCharToUtfDString()`, but with a `wchar_t` pointer type in its signature.
  - `Tcl_UtfToWChar()`, similar to `Tcl_UtfToUniChar()`, but with a `wchar_t` pointer in its signature.
  On Windows, `wchar_t` is the same type as `unsigned short`, but on other platforms `wchar_t` might be a 32-bit type (usually `int`). These functions map to either the 32-bit or the 16-bit versions, depending on the size of `wchar_t`.
- Add 3 new functions which convert between `unsigned short`-related types (UTF-16) and UTF-8:
  - `Tcl_UtfToChar16DString()`, similar to `Tcl_UtfToUniCharDString()`, but with an `unsigned short` pointer type in its signature.
  - `Tcl_Char16ToUtfDString()`, similar to `Tcl_UniCharToUtfDString()`, but with an `unsigned short` pointer type in its signature.
  - `Tcl_UtfToChar16()`, similar to `Tcl_UtfToUniChar()`, but with an `unsigned short` pointer in its signature.
- Deprecate the following functions:
  - `Tcl_WinUtfToTChar()`, in favour of `Tcl_UtfToWCharDString()`
  - `Tcl_WinTCharToUtf()`, in favour of `Tcl_WCharToUtfDString()`
  If Tcl is compiled with either -DTCL_UTF_MAX=6 (which is not officially supported) or -DTCL_NO_DEPRECATED, those functions become macros which do exactly the same thing. In Tcl 9.0, the 2 deprecated functions will be removed from the stub tables, but the replacement macros will remain. The functions can therefore still be used in extensions; calls to them are replaced with the new functions automatically.
- The Windows dde and registry extensions (tclWinDde.c and tclWinReg.c) are updated to use the new functions `Tcl_UtfToWCharDString()` and `Tcl_WCharToUtfDString()`, serving as proof/demonstration that the functions in this TIP actually work.
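The length and NULL conventions described in the list above (a length counted in characters rather than bytes, -1 meaning "read up to the terminating zero", and a NULL source returning NULL) can be illustrated with a standalone sketch. `Char16ToUtf8()` is a hypothetical stand-in written for this example; it is not Tcl's implementation, it writes into a caller-supplied buffer instead of a `Tcl_DString`, and it encodes unpaired surrogates as ordinary 3-byte sequences, which is an assumption rather than documented Tcl behaviour.

```c
#include <stddef.h>

/*
 * Hypothetical UTF-16 -> UTF-8 converter (not Tcl code).
 * len counts 16-bit units, not bytes; len == -1 means "read up to the
 * terminating 0 unit". A NULL src returns NULL without touching dst.
 * dst must have room for at least 3 * (number of units) + 1 bytes.
 */
static char *
Char16ToUtf8(const unsigned short *src, int len, char *dst)
{
    char *p = dst;
    int i;

    if (src == NULL) {
        return NULL;
    }
    if (len < 0) {
        for (len = 0; src[len] != 0; len++) {
            /* count units up to the terminator */
        }
    }
    for (i = 0; i < len; i++) {
        unsigned int cp = src[i];

        /* Combine a high/low surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len
                && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        }
        if (cp < 0x80) {
            *p++ = (char) cp;
        } else if (cp < 0x800) {
            *p++ = (char) (0xC0 | (cp >> 6));
            *p++ = (char) (0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = (char) (0xE0 | (cp >> 12));
            *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char) (0x80 | (cp & 0x3F));
        } else {
            *p++ = (char) (0xF0 | (cp >> 18));
            *p++ = (char) (0x80 | ((cp >> 12) & 0x3F));
            *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return dst;
}
```

Passing an explicit length of 2 for a three-character string converts only the first two characters, which is the "length in characters" convention the new functions share with the other Unicode-related Tcl functions.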
How to upgrade.
No need to do anything. In Tcl 9.0, the two deprecated functions are replaced by macros which do the same thing. But if you want to prevent a (future) deprecation warning, you can do the following:
In your extension, replace the call:
`Tcl_WinUtfToTChar(....., bufPtr);`
with the following two lines:
`Tcl_DStringInit(bufPtr);` `Tcl_UtfToWCharDString(....., bufPtr);`
And also, replace:
`Tcl_WinTCharToUtf(....., bufPtr);`
with the following two lines:
`Tcl_DStringInit(bufPtr);` `Tcl_WCharToUtfDString(....., bufPtr);`
If the `Tcl_WinTCharToUtf()` call originally had a "length" parameter not equal to -1, divide it by 2 (or ... don't multiply it by 2 any more).
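The length adjustment can be captured in a small helper when many call sites need converting. `ByteLenToUnitLen()` is a hypothetical name used only for this sketch and is not part of Tcl; it assumes the old length was a byte count of 16-bit units, as is the case on Windows where the deprecated function is used.

```c
/*
 * Hypothetical helper (not a Tcl function): convert the byte length
 * formerly passed to Tcl_WinTCharToUtf() into the character length
 * expected by Tcl_WCharToUtfDString().
 */
static int
ByteLenToUnitLen(int byteLen)
{
    /* -1 means "NUL-terminated" under both conventions, so keep it. */
    if (byteLen < 0) {
        return -1;
    }
    return byteLen / 2;     /* 2 bytes per 16-bit unit */
}
```

A converted call site would then pass `ByteLenToUnitLen(nbytes)` as the length argument, where `nbytes` is a placeholder for whatever byte count the extension used before.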
This is fully upwards compatible.
A reference implementation is available in the tip-548 branch. https://core.tcl-lang.org/tcl/timeline?r=tip-548
This document has been placed in the public domain.