TIP 714: encoding compatibility for GDI/HammerDB/TPC/DB2

Author:         Jan Nijtmans <[email protected]>
State:          Draft
Type:           Project
Tcl-Version:    9.0.2
Tcl-Branch:     encoding-user

Abstract

Not all applications on Windows are able to make the move to utf-8. For example, using a GDI function in Tcl 9.0 doesn't work as expected:

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(NULL, str, len, &ds);
    TextOutA(pdlg.hDC, X0, Y0, Tcl_DStringValue(&ds), Tcl_DStringLength(&ds));

The problem is that TextOutA() doesn't respect the UTF-8 setting in the manifest: it still interprets the text in the Windows ANSI code page (ACP).

The TclWinGetUserEncoding() function is proposed to bridge that gap. At script level, encoding user (which is read-only) can be used to determine the Windows ACP encoding, independent of the UTF-8 setting in the manifest. In other words, encoding user gives the same answer as encoding system gives in Tcl 8.6.

TIP #716 gives a solution for the DB2 failure, where the DB2 dll cannot handle the utf-8 encoding: Remove the UTF-8 setting from the manifest. But it comes at a cost (described below).

This TIP proposes to provide two executables, "tclsh90.exe" and "tclsh90c.exe" (where 'c' stands for 'compat'). "tclsh90.exe" is 100% compatible with Tcl 9.0.0/9.0.1, and therefore allows utf-8 everywhere (except for GDI and a few other APIs, but that is what TclWinGetUserEncoding() is meant for). "tclsh90c.exe" has all the limitations described in TIP #716, but allows custom DLLs which cannot handle the utf-8 encoding. The cost is that extensions which use the ANSI interface can no longer depend on utf-8: characters outside the cp1252 set will not be handled as expected when used in an ANSI API.

Rationale

There are 3 ways an application can translate UTF-8 to an external encoding when interfacing with a Windows (or other) API.

  1. Use the Unicode API. Example:

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_WinUtfToTChar(str, len, &ds);
    FILE *f = _wfopen((WCHAR *)Tcl_DStringValue(&ds), L"r");
    

  2. Use the ANSI API. Example:

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(NULL, str, len, &ds);
    FILE *f = fopen(Tcl_DStringValue(&ds), "r");
    

  3. Don't care about characters outside the ASCII set. Example:

    str = Tcl_GetStringFromObj(obj, &len);
    FILE *f = fopen(str, "r");
    

Those 3 approaches all work fine in 9.0.0/9.0.1, for any Unicode character. In Tcl 8.6 they all work fine, as long as all characters are within the expected range (cp1252 for (2) and ASCII for (3)).

However, in Tcl 9, method (2) no longer works as expected for the GDI API when characters >= 0x80 are used, since GDI, contrary to other APIs, doesn't respect the utf-8 setting in the Tcl 9.0 manifest.

The TclWinGetUserEncoding() function can help here, making it work as in Tcl 8.6:

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(TclWinGetUserEncoding(interp), str, len, &ds);
    TextOutA(pdlg.hDC, X0, Y0, Tcl_DStringValue(&ds), Tcl_DStringLength(&ds));
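
To illustrate what the conversion to the user encoding has to achieve, here is a minimal standalone sketch. It is not Tcl's implementation, and SketchUtf8ToSingleByte is a hypothetical name. It assumes the target is a single-byte encoding and, for simplicity, maps only U+0000..U+00FF (ISO 8859-1, which cp1252 matches outside the 0x80..0x9F range); every other character becomes '?'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Tcl: convert UTF-8 to a single-byte
 * encoding. Only U+0000..U+00FF are mapped (ISO 8859-1); real cp1252
 * differs in the 0x80..0x9F range. Unmappable or malformed input
 * becomes '?'. Returns the number of bytes written to dst.
 */
static size_t
SketchUtf8ToSingleByte(const char *src, size_t len,
	unsigned char *dst, size_t cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < len && n < cap) {
	unsigned char b = (unsigned char)src[i];
	unsigned long cp;
	size_t extra;

	/* Decode the lead byte: payload bits and number of trail bytes. */
	if (b < 0x80) {
	    cp = b; extra = 0;
	} else if ((b & 0xE0) == 0xC0) {
	    cp = b & 0x1F; extra = 1;
	} else if ((b & 0xF0) == 0xE0) {
	    cp = b & 0x0F; extra = 2;
	} else if ((b & 0xF8) == 0xF0) {
	    cp = b & 0x07; extra = 3;
	} else {
	    cp = 0xFFFD; extra = 0;	/* invalid lead byte */
	}
	i++;
	/* Accumulate six payload bits from each continuation byte. */
	while (extra > 0 && i < len
		&& ((unsigned char)src[i] & 0xC0) == 0x80) {
	    cp = (cp << 6) | ((unsigned char)src[i] & 0x3F);
	    i++;
	    extra--;
	}
	if (extra > 0) {
	    cp = 0xFFFD;		/* truncated sequence */
	}
	dst[n++] = (cp <= 0xFF) ? (unsigned char)cp : (unsigned char)'?';
    }
    return n;
}
```

With this sketch, the five UTF-8 bytes of "café" become four single-byte characters, which is the form an ANSI function such as TextOutA() expects when the ACP is cp1252.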

Remark: The GDI interface also has Wide functions, which are preferred.
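
The Wide functions take UTF-16, so no code page is involved and every Unicode character can be represented. As a rough standalone sketch of that conversion (SketchUtf8ToUtf16 is a hypothetical name, the input is assumed to be well-formed UTF-8, and real code would use the conversion shown in method (1) instead):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Tcl: convert well-formed UTF-8 to
 * UTF-16 code units. Returns the number of code units written to dst;
 * conversion stops early when dst is full.
 */
static size_t
SketchUtf8ToUtf16(const char *src, size_t len, uint16_t *dst, size_t cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
	unsigned char b = (unsigned char)src[i++];
	uint32_t cp;
	int extra;

	/* Lead byte: payload bits and number of continuation bytes. */
	if (b < 0x80) {
	    cp = b; extra = 0;
	} else if ((b & 0xE0) == 0xC0) {
	    cp = b & 0x1F; extra = 1;
	} else if ((b & 0xF0) == 0xE0) {
	    cp = b & 0x0F; extra = 2;
	} else {
	    cp = b & 0x07; extra = 3;
	}
	while (extra-- > 0 && i < len) {
	    cp = (cp << 6) | ((unsigned char)src[i++] & 0x3F);
	}
	if (cp >= 0x10000) {
	    /* Outside the BMP: emit a surrogate pair. */
	    if (n + 2 > cap) {
		break;
	    }
	    cp -= 0x10000;
	    dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
	    dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
	} else {
	    if (n + 1 > cap) {
		break;
	    }
	    dst[n++] = (uint16_t)cp;
	}
    }
    return n;
}
```

On Windows, where WCHAR is 16 bits wide, a buffer of this shape is what a function such as TextOutW() expects, together with the length in code units.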

Specification

Declare and implement a new function:

Example (derived from Harald's example):

    str = Tcl_GetStringFromObj(obj, &len);
    Tcl_UtfToExternalDString(TclWinGetUserEncoding(interp), str, len, &ds);
    TextOutA(pdlg.hDC, X0, Y0, Tcl_DStringValue(&ds), Tcl_DStringLength(&ds));

Also, a new "tclsh90c.exe" will be built/installed, which is the same as "tclsh90.exe" except that the UTF-8 setting is missing from the manifest. This executable can be used in case one of the DLLs loaded by Tcl cannot handle the "utf-8" encoding. The cost is that encoding errors (coming from handling utf-8 as if it is cp1252) could arise which never happen when using "tclsh90.exe".
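
The kind of encoding error meant here can be reproduced with a small standalone sketch. SketchMisreadAsSingleByte is a hypothetical name, not part of Tcl. It treats each input byte as one single-byte character and re-encodes the result as UTF-8. ISO 8859-1 is assumed as a stand-in for cp1252; the two differ only for bytes 0x80..0x9F.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Tcl: misread a byte string as a
 * single-byte encoding (ISO 8859-1 here) and write the characters it
 * appears to contain as UTF-8. Returns the number of bytes written.
 */
static size_t
SketchMisreadAsSingleByte(const char *src, size_t len,
	char *dst, size_t cap)
{
    size_t i, n = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
	unsigned char b = (unsigned char)src[i];

	if (b < 0x80) {
	    /* ASCII is identical in both encodings. */
	    if (n + 1 > cap) {
		break;
	    }
	    dst[n++] = (char)b;
	} else {
	    /* Byte b is read as U+0080..U+00FF: two UTF-8 bytes. */
	    if (n + 2 > cap) {
		break;
	    }
	    dst[n++] = (char)(0xC0 | (b >> 6));
	    dst[n++] = (char)(0x80 | (b & 0x3F));
	}
    }
    return n;
}
```

Feeding it the UTF-8 bytes of "é" (0xC3 0xA9) yields the UTF-8 encoding of the two characters "Ã©", which is the familiar symptom of utf-8 data being handled as cp1252.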

Implementation

See the encoding-user branch.

Copyright

This document has been placed in the public domain.