Author: Jan Nijtmans <[email protected]>
State: Final
Type: Project
Vote: Done
Vote-Summary: Accepted 9/0/0
Votes-For: BG, DGP, DKF, FV, JN, KBK, KW, MC, SL
Votes-Against: none
Votes-Present: none
Created: 20-Sept-2020
Post-History:
Keywords: Tcl source
Tcl-Version: 9.0
Tcl-Branch: tip-587
This TIP proposes to make the default encoding for the "source" command utf-8.
The "utf-8" encoding is the most universal encoding available, and more and more systems use it as their system encoding. With this change, Tcl 9 (or even 8.7) scripts can use any Unicode character directly, without resorting to the escaped \u???? or \U?????? forms.
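As a minimal sketch of the effect, the snippet below writes a small script whose string literal contains the raw UTF-8 bytes of the euro sign, then sources it with an explicit -encoding utf-8, which is the behaviour this TIP makes the default. The variable names and the temporary file are assumptions made for illustration only.

```tcl
# Create a temporary script file (its path is chosen by Tcl).
set chan [file tempfile path]
fconfigure $chan -translation binary

# The script text contains a literal euro sign; write it as UTF-8 bytes.
set script "set euro \"\u20ac\"\n"
puts -nonewline $chan [encoding convertto utf-8 $script]
close $chan

# Explicit -encoding utf-8 gives the same result on Tcl 8.6, 8.7 and 9.0.
source -encoding utf-8 $path
file delete $path

# The literal character and the escaped form denote the same string.
puts [expr {$euro eq "\u20ac"}]   ;# prints 1
```

With utf-8 as the default, the -encoding option in the source call becomes unnecessary for files saved as UTF-8.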
Starting with TIP #389 (Tcl 8.7), invalid bytes 0x80 up to 0x9F are interpreted as the cp1252 characters € up to Ÿ. This means that any script written in the "cp1252" encoding (most common on Windows) will get the expected outcome using Tcl's "utf-8" decoder. Since "cp1252" is a superset of "iso8859-1" (most common on older UNIX systems), the same holds for "iso8859-1".
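To make the byte range concrete, the following sketch shows which characters cp1252 assigns to the two ends of that range. It only uses the "cp1252" encoding directly; it does not exercise the fallback inside the "utf-8" decoder, whose exact behaviour may depend on the Tcl version and settings in use.

```tcl
# Decode the single bytes 0x80 and 0x9F as cp1252.
set first [encoding convertfrom cp1252 [binary format H2 80]]
set last  [encoding convertfrom cp1252 [binary format H2 9f]]

# Show the resulting code points.
puts [format "0x80 -> U+%04X" [scan $first %c]]   ;# 0x80 -> U+20AC (euro sign)
puts [format "0x9F -> U+%04X" [scan $last %c]]    ;# 0x9F -> U+0178 (Y with diaeresis)
```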
Tcl's current "utf-8" decoder strips the first BOM (Byte Order Mark) from the stream. This means that even scripts saved as BOM-prefixed "utf-8" files (not recommended, but Notepad on Windows can still produce them) will work fine.
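A small sketch of the BOM case, assuming a writable temporary location: it writes the three BOM bytes EF BB BF followed by an ASCII-only script and sources the result as "utf-8". The variable name bomDemo is hypothetical.

```tcl
# Create a temporary script file that starts with a UTF-8 BOM.
set chan [file tempfile path]
fconfigure $chan -translation binary
puts -nonewline $chan [binary format H* efbbbf]
puts -nonewline $chan "set bomDemo ok\n"
close $chan

# If the BOM were not skipped, the first word would not be a valid command.
source -encoding utf-8 $path
file delete $path

puts $bomDemo
```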
If Tcl is not fully initialized (so just
it doesn't get a chance to determine the system encoding from the environment.
On Windows, it falls back to "cp1252" in that case, on macOS to "utf-8", and
on other systems to "iso8859-1". The difference is unlikely to be noticed in
practice, but this TIP proposes to change the fallback to "utf-8" in all
cases. Since this is a potential incompatibility for external applications
embedding Tcl, this change should be made for Tcl 9.0 only.
- The "source" command without -encoding will assume "utf-8" as default instead of the system encoding.
- Scripts started from the "tclsh" command line without -encoding will read the file using Tcl's "utf-8" decoder.
- The fallback encoding for embedded build information (TIP #59) will change from "cp1252" (Windows) / "iso8859-1" (UNIX) to "utf-8".
- The "open" command and other channel functionality are kept as they are now.
- The fallback for the system encoding will change to "utf-8" on all platforms. Tcl 9.0 only.
This TIP primarily targets Tcl 9.0. However, all changes except the last one can also be made in 8.7 with almost no risk. Therefore, there will be separate votes for bringing this to 9.0 and for bringing it (except for the last point) to 8.7 as well.
For scripts using ASCII characters only (which was the only portable way to write scripts in Tcl 8.x), this is fully upward compatible. Scripts assuming a system encoding other than "ascii", "utf-8", "cp1252" or "iso8859-1" will break. On macOS and most modern UNIX-like systems (which already have "utf-8" as system encoding), it is 100% compatible.
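For a script saved in some other encoding, naming that encoding explicitly keeps it working regardless of the default. The sketch below assumes the "iso8859-2" encoding is available in the Tcl installation; the variable name city and the temporary file are hypothetical.

```tcl
# Write a legacy script whose literal is stored as iso8859-2 bytes.
set chan [file tempfile path]
fconfigure $chan -translation binary
set script "set city \"\u0141\u00f3d\u017a\"\n"
puts -nonewline $chan [encoding convertto iso8859-2 $script]
close $chan

# Name the file's real encoding instead of relying on any default.
source -encoding iso8859-2 $path
file delete $path

puts [expr {$city eq "\u0141\u00f3d\u017a"}]   ;# prints 1
```

The same option is accepted on the command line, for example tclsh -encoding iso8859-2 script.tcl, where script.tcl stands for the legacy file.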
A modified Tk demo showing how this eliminates the need to use Unicode
escape sequences is here:
(This demo works with Tk 8.6 too, even without this TIP, since the "widget"
demo has been adapted to source all other demos using
-encoding utf-8 explicitly.)
This document has been placed in the public domain.