TIP 425: Correct use of UTF-8 in Panic Callback (Windows only)

Login
Author:         Jan Nijtmans <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        17-Jul-2013
Post-History:   
Keywords:       Tcl,platform integration,i18n
Tcl-Version:    8.7
Tcl-Branch:     win-console-panic

Abstract

The default panic proc on Windows console applications writes the message in UTF-8 to stderr. Unfortunately, the Windows console normally does not have UTF-8 as code page but some single-byte code page like CP1252. When using characters outside the ASCII range, that does not give the expected output in the console. This TIP proposes to add a new Console panic proc to the stub library, and modify the Tcl_Main() macro to use it.

In addition, this TIP proposes to modify the default panic proc on Windows, starting with Tcl 9.0. Since we now have a way to select a stderr-based console panic proc for Windows, we can change the default panic proc to be UI-based, as it is for Tk. This means that - starting with Tcl 9.0, Tk doesn't have to do anything special any more to have a UI-based panic implementation.

Rationale

Many parts of Tcl use Unicode since Tcl 8.6: The command line handling, and the communication with all Win32 API functions. But the Panic proc has - so far - not been modified accordingly for Windows console applications, even though win32 has a suitable API to do so.

On Windows, there actually are two different panic procs, one for GUI applications and one for console applications, but external embedders don't have an API for deciding which one should be used other than provide their own. This TIP can finally do that: The call Tcl_SetPanicProc(Tcl_ConsolePanic) will initialize the Tcl subsystem for Console applications, while Tcl_SetPanicProc(NULL) will continue to use the default.

Making things worse, stderr is implemented by the C runtime, (msvcrt??.dll) but if a application is embedding or dynamically loading tcl.dll then the runtime of the embedder might be different from tcl.dll/tclsh.exe's runtime. The embedder providing the panic proc gives the highest chance that panic messages arrive in the same runtime as the embedder. For tclsh.exe this makes no difference.

Starting with VS2015, Microsoft introduced a new Universal CRT (UCRT), which defaults to the standard channels stdin/stdout/stderr to be linked in statically. This makes it possible to have different dll's (e.g. zlib.dll / tcl87.dll) occupy a different C runtime (such as msvcrt.dll), but still prevent conflicts in the standard channels. That's why it is best to put Tcl_ConsolePanic() in Tcl's stub library, which assures the function to be linked in statically into the tclsh executable, in stead of dynamically in tcl87.dll.

Proposed Change

A new function Tcl_ConsolePanic is added to the stub library on Windows, which can be installed by embedding application as panic proc. The full signature is:

EXTERN void Tcl_ConsolePanic( const char *format ...);

On other platforms than Windows, Tcl_ConsolePanic is a macro equivalent to NULL, on those platforms Tcl_SetPanicProc(Tcl_ConsolePanic) has the effect of resetting the panic proc to the platform's default.

This function is meant to be used for Win32 console applications, and can deliver the message in 3 possible ways

The function Tcl_ConsolePanic does not assume any locale, does not allocate memory, neither does it make any assumptions on the initialized state of Tcl. This makes Tcl_Panic work fine even in the final stage of a Tcl_Finalize() call. If a Win32 Unicode API is available for the desired output, Tcl_ConsolePanic will do at most an UTF-8 to Unicode conversion using the Win32 function MultiByteToWideChar().

The maximum number of (unicode) characters that is written is 26000, as that is the maximum that WriteConsoleW() can handle in a single call. See: https://connect.microsoft.com/VisualStudio/feedback/details/635230 If the message is longer than that, the string is truncated and three dots appended to it. If the message is not sent to a character device, the UTF-8 BOM is prepended.

The function is available from the stub library, in order to bring the responsibility for correct linking to the embedding application, in stead of Tcl. In case of tclsh.exe, this makes no difference.

In addition, starting with Tcl 9.0, the default panic proc is changed to direct the output to the UI in stead of stderr. Since the new Tcl_ConsolePanic is meant for stderr-related output, this means that Tk (or other applications embedding Tcl) don't have to do anything special to have a UI-based panic proc.

Another (minor) modification: the Cygwin panic proc is adapted to output the UTF-8 BOM to stderr first, as does the Windows panic proc, if stderr is not sent to a character device. This improves the interoperability between Cygwin and Windows.

Reference Implementation

A reference implementation is available in the win-console-panic branch. https://core.tcl-lang.org/tcl/timeline?r=win-console-panic

Copyright

This document has been placed in the public domain.