Author: Jan Nijtmans <[email protected]>
State: Draft
Type: Project
Tcl-Version: 9.0.2
Tcl-Branch: encoding-user
Abstract
Not all applications on Windows are able to make the move to utf-8. For example, using a GDI function in Tcl 9.0 doesn't work as expected:

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(NULL, str, len, &ds);
    TextOutA(pdlg.hDC, X0, Y0, Tcl_DStringValue(&ds), Tcl_DStringLength(&ds));

The problem is that TextOutA() doesn't respect the UTF-8 setting in the manifest.
The TclWinGetUserEncoding() function is proposed to bridge that gap. At script level, the encoding user command (which is read-only) can be used to determine the Windows ACP encoding, independent of the UTF-8 setting in the manifest. In other words, encoding user gives the same answer as encoding system gives in Tcl 8.6.
TIP #716 gives a solution for the DB2 failure, where the DB2 DLL cannot handle the utf-8 encoding: remove the UTF-8 setting from the manifest. But that comes at a cost (described below).
This TIP proposes to provide two executables, "tclsh90.exe" and "tclsh90c.exe" (where 'c' stands for 'compat'). "tclsh90.exe" is 100% compatible with Tcl 9.0.0/9.0.1 and therefore allows utf-8 everywhere (except for GDI and a few other APIs, but that's what TclWinGetUserEncoding() is meant for). "tclsh90c.exe" has all the limitations described in TIP #716, but allows custom DLLs which cannot handle the utf-8 encoding. The cost is that extensions which use the ANSI interface cannot depend on utf-8 any more: characters outside the cp1252 set will not be handled as expected when used in an ANSI API.
Rationale
There are 3 ways an application can translate UTF-8 to an external encoding when interfacing with a Windows (or other) API:

1. Use the Unicode API. Example:

        str = Tcl_GetStringFromObj(obj, &len);
        Tcl_WinUtfToTChar(str, len, &ds);
        FILE *f = _wfopen((WCHAR *)Tcl_DStringValue(&ds), L"r");

2. Use the ANSI API. Example:

        str = Tcl_GetStringFromObj(obj, &len);
        Tcl_UtfToExternalDString(NULL, str, len, &ds);
        FILE *f = fopen(Tcl_DStringValue(&ds), "r");

3. Don't care about characters outside the ASCII set. Example:

        str = Tcl_GetStringFromObj(obj, &len);
        FILE *f = fopen(str, "r");

All 3 approaches work fine in Tcl 9.0.0/9.0.1, for any Unicode character. In Tcl 8.6 they all work fine as long as all characters are within the expected range (cp1252 for (2) and ASCII for (3)).
However, in Tcl 9, method (2) no longer works as expected for the GDI API when using characters >= 0x80, since GDI, contrary to other APIs, doesn't respect the utf-8 setting in the Tcl 9.0 manifest. The TclWinGetUserEncoding() function can help here, making it work as in Tcl 8.6:

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(TclWinGetUserEncoding(interp), str, len, &ds);
    TextOutA(pdlg.hDC, X0, Y0, Tcl_DStringValue(&ds), Tcl_DStringLength(&ds));

Remark: The GDI interface also has wide-character functions, which are preferred, so this is actually a bad example. But I wouldn't be surprised if there were ANSI APIs available which don't have a corresponding Unicode variant (possibly from providers other than Microsoft).
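
To make the conversion step in method (2) concrete, here is a minimal, portable sketch of what converting UTF-8 to a cp1252 user encoding amounts to for the most common characters. The helper utf8_to_cp1252_subset() is hypothetical and not part of Tcl. It only covers the part of cp1252 that coincides with ISO-8859-1 (ASCII plus U+00A0..U+00FF) and replaces everything else with '?', so characters such as the euro sign, which real cp1252 does contain, are not handled. In Tcl itself this work would be done by Tcl_UtfToExternalDString() with the encoding returned by TclWinGetUserEncoding().

```c
#include <stddef.h>

/* Hypothetical helper, not part of Tcl: converts UTF-8 to the subset of
 * cp1252 that coincides with ISO-8859-1 (ASCII plus U+00A0..U+00FF).
 * Anything else is replaced by '?'. Returns the number of bytes written,
 * excluding the terminating NUL. */
size_t utf8_to_cp1252_subset(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    if (dstsize == 0) {
        return 0;
    }
    while (*p != '\0' && n + 1 < dstsize) {
        if (*p < 0x80) {
            /* ASCII is identical in UTF-8 and cp1252. */
            dst[n++] = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            unsigned cp = ((unsigned)(*p & 0x1F) << 6) | (unsigned)(p[1] & 0x3F);
            /* U+0080..U+009F differ between cp1252 and ISO-8859-1: give up. */
            dst[n++] = (cp >= 0xA0) ? (char)(unsigned char)cp : '?';
            p += 2;
        } else {
            /* Longer sequence or invalid byte: emit '?' and skip the rest. */
            dst[n++] = '?';
            p++;
            while ((*p & 0xC0) == 0x80) {
                p++;
            }
        }
    }
    dst[n] = '\0';
    return n;
}
```

For instance, the UTF-8 bytes C3 A9 (U+00E9) come out as the single byte E9, which is what an ANSI function on a cp1252 system expects.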
Specification
Declare and implement a new function:
- Tcl_Encoding TclWinGetUserEncoding(Tcl_Interp *interp)
Example (derived from Harald's example):

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(TclWinGetUserEncoding(interp), str, len, &ds);
    TextOutA(pdlg.hDC, X0, Y0, Tcl_DStringValue(&ds), Tcl_DStringLength(&ds));

Also, a new "tclsh90c.exe" will be built/installed, which is the same as "tclsh90.exe" except that the UTF-8 setting is missing from the manifest. This executable can be used in case one of the DLLs loaded by Tcl cannot handle the "utf-8" encoding. The cost is that encoding errors (coming from handling utf-8 as if it were cp1252) could arise which never happen when using "tclsh90.exe".
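
The kind of encoding error meant here can be illustrated with a small, portable sketch. Both function names are hypothetical and only serve the illustration. A single-byte code page such as cp1252 treats every byte as one character, while UTF-8 uses several bytes for each non-ASCII character. The letter U+00E9, for example, is the two bytes C3 A9 in UTF-8, which cp1252 reads as two separate characters (0xC3 and 0xA9).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the number of characters a single-byte code page
 * such as cp1252 sees in a buffer, which is simply one per byte. */
size_t chars_as_single_byte(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: the number of code points when the same bytes are
 * read as UTF-8. Continuation bytes (10xxxxxx) do not start a new
 * character, so only the other bytes are counted. Assumes valid UTF-8. */
size_t chars_as_utf8(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

Whenever the two counts differ for a string, passing its UTF-8 bytes unconverted to an API that assumes cp1252 will display or store something other than the intended text.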
Implementation
See the encoding-user branch.
Copyright
This document has been placed in the public domain.