# TIP 425: Correct use of UTF-8 in Panic Callback (Windows only)
Author: Jan Nijtmans <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 17-Jul-2013
Post-History:
Keywords: Tcl,platform integration,i18n
Tcl-Version: 8.7
Tcl-Branch: win-console-panic
-----
# Abstract
The default panic proc on Windows console applications writes the
message in UTF-8 to stderr. Unfortunately, the Windows console
normally does not have UTF-8 as code page but some single-byte
code page like CP1252. When using characters outside the ASCII
range, that does not give the expected output in the console.
This TIP proposes to add a new Console panic proc to the stub
library, and modify the Tcl\_Main\(\) macro to use it.
In addition, this TIP proposes to modify the default panic
proc on Windows, starting with Tcl 9.0. Since we now have
a way to select a stderr-based console panic proc for
Windows, we can change the default panic proc to be
UI-based, as it is for Tk. This means that - starting with
Tcl 9.0, Tk doesn't have to do anything special any more
to have a UI-based panic implementation.
# Rationale
Many parts of Tcl use Unicode since Tcl 8.6: The command
line handling, and the communication with all Win32 API functions.
But the Panic proc has - so far - not been modified accordingly
for Windows console applications, even though win32 has a
suitable API to do so.
On Windows, there actually are two different panic procs,
one for GUI applications and one for console applications, but
external embedders don't have an API for deciding which one
should be used other than provide their own. This TIP can
finally do that: The call
_Tcl\_SetPanicProc\(Tcl\_ConsolePanic\)_ will initialize the
Tcl subsystem for Console applications, while
_Tcl\_SetPanicProc\(NULL\)_ will continue to use the default.
Making things worse, stderr is implemented by the C runtime,
\(msvcrt??.dll\) but if a application is embedding or dynamically
loading tcl.dll then the runtime of the embedder might be
different from tcl.dll/tclsh.exe's runtime. The embedder
providing the panic proc gives the highest chance that panic
messages arrive in the same runtime as the embedder.
For tclsh.exe this makes no difference.
Starting with VS2015, Microsoft introduced a new [Universal CRT (UCRT)](https://msdn.microsoft.com/en-us/library/abx4dbyh.aspx),
which defaults to the standard channels stdin/stdout/stderr
to be linked in statically. This makes it possible to
have different dll's (e.g. zlib.dll / tcl87.dll) occupy
a different C runtime (such as msvcrt.dll), but still
prevent conflicts in the standard channels. That's
why it is best to put Tcl\_ConsolePanic\(\) in Tcl's stub
library, which assures the function to be linked in statically
into the tclsh executable, in stead of dynamically in tcl87.dll.
# Proposed Change
A new function _Tcl\_ConsolePanic_ is added to the stub library
on Windows, which can be installed by embedding
application as panic proc. The full signature is:
> EXTERN void
**Tcl\_ConsolePanic**\(
const char \*_format_
...\);
On other platforms than Windows, _Tcl\_ConsolePanic_
is a macro equivalent to NULL, on those platforms
Tcl\_SetPanicProc\(Tcl\_ConsolePanic\) has the effect of resetting
the panic proc to the platform's default.
This function is meant to be used for Win32 console
applications, and can deliver the message in 3 possible ways
* If a \(Windows\) debugger is running, the message is sent there.
* If stderr is connected to a \(Windows\) console, the message is
sent there.
* Otherwise, the UTF-8 BOM \(3 bytes\) is written followed by
the unmodified message \(assumed to be in UTF-8\).
The function _Tcl\_ConsolePanic_ does not assume any locale,
does not allocate memory, neither does it make any assumptions
on the initialized state of Tcl. This makes Tcl\_Panic work fine
even in the final stage of a Tcl\_Finalize\(\) call.
If a Win32 Unicode API is available for the desired output,
_Tcl\_ConsolePanic_ will do at most an UTF-8 to Unicode
conversion using the Win32 function MultiByteToWideChar\(\).
The maximum number of \(unicode\) characters that is
written is 26000, as that is the maximum that
WriteConsoleW\(\) can handle in a single call. See:
<https://connect.microsoft.com/VisualStudio/feedback/details/635230>
If the message is longer than that, the string is
truncated and three dots appended to it. If
the message is not sent to a character device, the
UTF-8 BOM is prepended.
The function is available from the stub library, in
order to bring the responsibility for correct linking
to the embedding application, in stead of Tcl. In
case of tclsh.exe, this makes no difference.
In addition, starting with Tcl 9.0, the default
panic proc is changed to direct the output to the
UI in stead of stderr. Since the new _Tcl\_ConsolePanic_
is meant for stderr-related output, this means
that Tk (or other applications embedding Tcl) don't
have to do anything special to have a UI-based
panic proc.
Another \(minor\) modification: the Cygwin panic proc is
adapted to output the UTF-8 BOM to stderr first,
as does the Windows panic proc, if stderr is not sent to a
character device. This improves the interoperability
between Cygwin and Windows.
# Reference Implementation
A reference implementation is available in the **win-console-panic** branch.
<https://core.tcl.tk/tcl/timeline?r=win-console-panic>
# Copyright
This document has been placed in the public domain.