Tcl Source Code

Migrating scripts to Tcl 9

Work in progress. Currently focused on incompatibilities. Hints on improving scripts by making use of new facilities may be added at some point.

Strict handling of ill-formed encoded data

Conceptually, Tcl strings are a sequence of Unicode code points. These strings are converted to and from external encodings via channel I/O or explicit use of the encoding command. Tcl 9 changes the behavior when an input stream contains ill-formed encoded data or when the encoding of an output stream does not support the Unicode code point being written.

In the case of input channels, Tcl 8 would silently map invalid bytes in the stream to the numerically equivalent Unicode code points, in essence presuming iso8859-1 encoding. On output streams, characters not supported by the output encoding were substituted with an encoding-specific replacement character.

In contrast, channels in Tcl 9 are by default configured with the strict encoding profile and will raise an error in both of the above cases. If Tcl 8 behavior is desired, applications can use the -profile option to fconfigure to apply the tcl8 encoding profile to the channel. This is not recommended as it is not standards-compliant and can be seen as corrupting data. A standards-compliant alternative is the replace profile which, like the tcl8 profile, does not raise errors in the presence of invalidly encoded data.

The same behavioural changes also apply to explicit transforms with the encoding command. Ill-formed data or unsupported characters will result in an error. The -profile option can be used in this case as well to change this behavior.
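
As a rough sketch of how a script might opt into the replace profile while still running under Tcl 8, the helpers below check the interpreter version first. The names isTcl9, decodeLenient and openLenient are hypothetical and not part of Tcl; the -profile option itself only exists in Tcl 9.

```tcl
# True when running under Tcl 9 or later.
proc isTcl9 {} {
    return [package vsatisfies [package provide Tcl] 9.0-]
}

# Hypothetical helper: decode bytes without raising an error on ill-formed data.
proc decodeLenient {bytes {enc utf-8}} {
    if {[isTcl9]} {
        # Tcl 9: invalid sequences become U+FFFD instead of raising an error.
        return [encoding convertfrom -profile replace $enc $bytes]
    }
    # Tcl 8 has no profiles and does not raise an error here.
    return [encoding convertfrom $enc $bytes]
}

# Hypothetical helper: open a file for reading with the same lenient policy.
proc openLenient {path {enc utf-8}} {
    set chan [open $path r]
    fconfigure $chan -encoding $enc
    if {[isTcl9]} {
        fconfigure $chan -profile replace
    }
    return $chan
}

# 0xFF can never appear in well-formed UTF-8.
set text [decodeLenient "caf\xFF"]
```

Scripts that prefer to detect bad data rather than mask it can keep the default strict profile and wrap the conversion in catch or try.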

For more on encoding profiles, refer to the encoding manpage.

See TIP 657.

Tilde substitution

File path arguments in Tcl 8 would be subject to tilde substitution where ~ and ~USER prefixes in the path would be replaced with the home directory of the current or named user respectively. Although convenient for interactive use, this behavior led to both security and robustness issues as documented in TIP 602.

Tcl 9 therefore no longer does this substitution. Scripts that want it must do it explicitly with the new commands file home and file tildeexpand. Conversely, commands that protected against this treatment of ~ by prefixing results with ./ (glob, file split etc.) no longer do so, and scripts that had their own workarounds to protect against tilde substitution no longer need them.
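
A minimal sketch of explicit expansion that also runs under Tcl 8 might look like the following. The procedure name expandTilde and the file name .myapprc are hypothetical. Note that the Tcl 8 branch relies on file normalize, which expands the tilde but also makes the path absolute, so the two branches are not exactly equivalent for paths without a tilde.

```tcl
# Hypothetical helper: expand a leading ~ or ~user explicitly.
proc expandTilde {path} {
    if {[package vsatisfies [package provide Tcl] 9.0-]} {
        # Tcl 9: tilde expansion must be requested explicitly.
        return [file tildeexpand $path]
    }
    # Tcl 8: file commands expand ~ implicitly; file normalize makes the
    # expansion visible (and also returns an absolute path).
    return [file normalize $path]
}

set rcFile [expandTilde ~/.myapprc]
```

In Tcl 9 only code, file join [file home] .myapprc is an alternative that avoids tilde syntax altogether.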

As an exception for convenience for end users, tilde substitution is still done at start up time on environment values containing library paths (TCLLIBPATH and analogous tm path variants).

See TIP 602.

Octals

The Tcl 8 interpretation of numeric strings beginning with 0 as octal representation has been done away with. These are now treated as decimal. All such uses within a script should instead use the explicit 0o prefix for octal notation. Particular care must be taken on input and output when interacting with programs or users based on the old interpretation.
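
The following sketch shows one way to keep octal handling explicit so that it behaves the same in Tcl 8.5 and later and in Tcl 9. The variable names and sample values are illustrative only.

```tcl
# Explicit octal literal: same value in Tcl 8.5+ and Tcl 9.
set mode 0o755
puts [expr {$mode}]            ;# 493

# Legacy octal text such as "0755" arriving from a file or another program:
# parse it explicitly as octal rather than relying on the leading zero.
scan 0755 %o legacyMode
puts $legacyMode               ;# 493

# Decimal text with leading zeros, e.g. a month field "08":
# parse it explicitly as decimal.
scan 08 %d month
puts $month                    ;# 8

# Writing octal for a consumer that expects the 0o prefix.
puts [format 0o%o $legacyMode] ;# 0o755
```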

See TIP 114.

Underscores in integers

Tcl 9 allows the use of the _ underscore character as a separator in numeric strings, e.g. 1_000_000, 1_0.0_1. It is important to note that this applies not just to literals in source code as in some other languages but run-time values as well. For example, if the string is integer command is used to validate user input as integers, it will allow underscores to be entered. If this is not desired, such validation needs to be changed to use scan or regexps.
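
As one possible form of such validation, the hypothetical procedure below accepts only an optional sign followed by decimal digits. It also rejects hexadecimal, octal prefixes and surrounding whitespace, which may or may not be what a given application wants.

```tcl
# Hypothetical validator: optional sign followed by decimal digits only.
proc isPlainInteger {s} {
    return [regexp {^[-+]?[0-9]+$} $s]
}

puts [isPlainInteger 1000000]    ;# 1
puts [isPlainInteger 1_000_000]  ;# 0
puts [isPlainInteger 0x10]       ;# 0
```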

See TIP 551.

Changes in integer classification and handling

The command string is integer now accepts integers of any size and is not limited to 32 bit values. Analogously, the int() function will no longer truncate integer values to a 32-bit range. The following illustrates the difference.

In Tcl 8,

% string is integer 0x100000001
0
% expr {int(0x100000001)}
1

In Tcl 9,

% string is integer 0x100000001
1
% expr {int(0x100000001)}
4294967297

Thus string is integer should not be used for range validation; use explicit range checks. Likewise, int() cannot be used for truncation; use explicit masking of high order bits.
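
A sketch of explicit range checking and masking is shown below. The procedure names are hypothetical. Masking alone yields an unsigned result, so a second helper shows how a signed 32-bit value can be recovered if the script depended on wrap-around.

```tcl
# Hypothetical helper: is s an integer that fits in a signed 32-bit range?
proc isInt32 {s} {
    expr {[string is integer -strict $s]
          && $s >= -2147483648 && $s <= 2147483647}
}

# Keep only the low 32 bits (unsigned result).
proc lowBits32 {n} {
    expr {$n & 0xFFFFFFFF}
}

# Reinterpret the low 32 bits as a signed value.
proc toSigned32 {n} {
    expr {(($n & 0xFFFFFFFF) ^ 0x80000000) - 0x80000000}
}

puts [isInt32 0x100000001]     ;# 0
puts [lowBits32 0x100000001]   ;# 1
puts [toSigned32 0xFFFFFFFF]   ;# -1
```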

Further, the commands string is entier and string is wide and functions entier() and wide() are deprecated. Again, use explicit range checks and masking instead.

See TIP 514.

Change in variable name resolution

In Tcl 8, variable names that are not absolute are resolved by looking first in the current namespace, and then in the global namespace. In Tcl 9 such variables are always interpreted as relative to the current namespace.

This avoids the problem that setting a variable inside a namespace scope will overwrite a global variable, if that global variable exists and the variable does not exist relative to the namespace. But it also has some consequences that may not immediately be obvious:

Q - has the behavior of global changed within namespaces?

The behavior of the global command within namespaces has not changed. The command has no effect when used outside the context of a proc or apply body. Using global inside a namespace eval block does nothing. Variable names passed to the global command are still resolved relative to the global namespace, as they have always been. So, using global ns::var (inside a proc or apply body) will create a local variable var that is linked to the namespace variable ::ns::var.
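
The following sketch illustrates a style that should behave identically in Tcl 8 and Tcl 9: declare namespace variables with variable before setting them, and use global or a fully qualified name when the global variable is meant. The namespace ns and the variable counter are illustrative names.

```tcl
set ::counter 10

namespace eval ns {
    # Declaring the variable first makes the target unambiguous in Tcl 8,
    # where a bare "set counter 1" would otherwise overwrite ::counter.
    variable counter 0
    set counter 1
}

proc ns::bumpGlobal {} {
    # global always resolves relative to the global namespace.
    global counter
    incr counter
}

ns::bumpGlobal
puts "$::counter $::ns::counter"   ;# 11 1
```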

See TIP 278.

Arguments to load are Unicode-aware and case sensitive

In Tcl 8, the second argument to the load command which specifies the initialization function was case-insensitive. This is inconsistent with the rules for package names which are case-sensitive. In Tcl 9, initialization function names are now case sensitive. For example, if the initialization function was called Sample_Init, either of the following commands would work in Tcl 8

load sample.dll sample
load sample.dll Sample

In Tcl 9, only the latter would be successful.

If compatibility with both Tcl 8 and 9 is desired, the following idiom may be used

load sample.dll [string totitle sample]

Note that instances of the load command that do not have the second argument specified need not be changed.

In addition, Tcl 9 does not restrict the names to ASCII. In practice this should not have any compatibility issues.

In addition, libraries compiled against the Tcl 9 headers get the prefix "tcl9" in their file names. The aim is to allow builds for Tcl 8.7 and Tcl 9 to coexist in the same folder.

The names are as follows:

A typical multi-version pkgIndex.tcl file may look like this:

if {![package vsatisfies [package provide Tcl] 8.6-]} {
    return
}
if {[package vsatisfies [package provide Tcl] 9.0-]} {
    package ifneeded tdbc::odbc 1.1.6 \
	    "[list load [file join $dir tcl9tdbcodbc116.dll] [string totitle tdbcodbc]]"
} else {
    package ifneeded tdbc::odbc 1.1.6 \
	    "[list load [file join $dir tdbcodbc116.dll] [string totitle tdbcodbc]]"
}

See TIP 595.

Writing version and build-independent pkgIndex.tcl files

When writing pkgIndex.tcl files that are compatible with both Tcl 8 and Tcl 9 as well as the autoconf and nmake build systems, the following differences must be accounted for:

The simplest way to achieve the above via the TEA build system is to write a pkgIndex.tcl.in template for autoconf similar to the following:

package ifneeded @PACKAGE_NAME@ @PACKAGE_VERSION@ \
    [list apply [list {dir} {
        set packageInitName [string totitle @PACKAGE_NAME@]
        set path [file join $dir "@PKG_LIB_FILE@"]
        uplevel #0 [list load $path $packageInitName]
    }] $dir]

The configure.ac in the autotools build should contain the line

AC_CONFIG_FILES([Makefile pkgIndex.tcl])

The makefile.vc should contain the line

pkgindex: default-pkgindex-tea

This will result in both build systems generating a pkgIndex.tcl that works for Tcl 8 as well as Tcl 9.

Default encoding for scripts is UTF-8

The default encoding used when reading scripts, either with the source command without an explicit -encoding option, or passed as a command line argument to tclsh is utf-8 in Tcl 9 instead of the system encoding as in Tcl 8.

No changes are required for scripts that were pure ASCII and used the \u or \U escapes for non-ASCII characters. Scripts that were saved using a (non-utf8) system encoding will have to be modified or sourced using an explicit -encoding option.

Even in Tcl 8.6, it is good practice to pass the -encoding option to the source command for portability. A pkgIndex.tcl file may look like this:

package ifneeded mypackage 1.0 [list source -encoding cp1252 [file join $dir mypackage.tcl]]

This will still work in 9.0 for any supported codepage.
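
Where modifying the script is preferred over passing -encoding, a one-off conversion to UTF-8 could be sketched as follows. The procedure name convertToUtf8 and its arguments are hypothetical, and the source encoding (cp1252 here) has to be known in advance. Line endings follow the default channel translation and may therefore change on Windows.

```tcl
# Hypothetical helper: re-encode a script file from a legacy encoding to UTF-8.
proc convertToUtf8 {src dst {fromEncoding cp1252}} {
    # Read the whole file using the encoding it was saved in.
    set in [open $src r]
    fconfigure $in -encoding $fromEncoding
    set data [read $in]
    close $in

    # Write the same text back out as UTF-8.
    set out [open $dst w]
    fconfigure $out -encoding utf-8
    puts -nonewline $out $data
    close $out
}
```

A call might look like convertToUtf8 mypackage.tcl mypackage-utf8.tcl cp1252, after which the new file can be sourced without an -encoding option in Tcl 9.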

See TIP 587.

Encoding profile strict for scripts

The strict encoding profile applies to the source command. The source command may throw encoding errors.

Scripts with non-ASCII characters, even in comments, may now fail if the encoding is wrong. A typical example is a copyright glyph "©" in a script file saved in a non-UTF-8 code page.

In Tcl 8.6, those errors did not arise, as the active tcl8 profile silently mapped the invalid bytes to the corresponding iso8859-1 characters. If the affected bytes were part of real code (such as text in a string) rather than a comment, the behavior of the script changed.

System encoding on Windows may change to UTF-8

The Windows executable manifests of tclsh.exe and wish.exe were changed to declare the UTF-8 character set. On my Windows 10 system with German locale, the command encoding system returns utf-8 with Tcl/Tk 9.0 and cp1252 with Tcl/Tk 8.6. This is due to the manifest change, not any change within Tcl/Tk itself.

Because of this, the change of the source command's default encoding to UTF-8 cannot reliably be reverted by:

source -encoding [encoding system] file.tcl

In my view, this matches the direction Windows is moving in. If I create a file nowadays using the Windows editor application, it is encoded in UTF-8. With older Windows versions, this was cp1252.

Full Unicode support

In Tcl 8, characters outside the BMP were stored as a surrogate pair and effectively treated as a string of length 2. This is no longer the case in Tcl 9 which has full support for the entire Unicode range and will (correctly) treat these characters as strings of length 1.

Any workarounds to deal with these shortcomings in Tcl 8 need to be removed.

See TIP 497.

Traces on upvars linked to arrays

In Tcl 8, traces set on an array would not fire when an element of the array was modified through an upvar link. This documented but incongruent behavior is fixed in Tcl 9. Array traces will fire even when the element is accessed through an upvar link.
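
The scenario can be sketched as below, using illustrative names (settings, recordWrite, setColorViaUpvar). Going by the description above, the reported count should be zero under Tcl 8 and nonzero under Tcl 9; the exact arguments passed to the trace callback are not asserted here.

```tcl
set ::traceLog {}

# Trace callback: remember every write it is told about.
proc recordWrite {name1 name2 op} {
    lappend ::traceLog [list $name1 $name2 $op]
}

array set ::settings {color red}
trace add variable ::settings write recordWrite

# Modify an element through an upvar link instead of by array name.
proc setColorViaUpvar {value} {
    upvar #0 settings(color) elem
    set elem $value
}

setColorViaUpvar blue
puts "array trace fired [llength $::traceLog] time(s)"
```

Scripts that worked around the Tcl 8 behavior by also tracing individual elements may see callbacks fire twice in Tcl 9 and should be reviewed.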

See TIP 634.

glob no longer raises an error if no files match

In Tcl 9, the glob command returns an empty list if no files match the specified pattern instead of raising an error. This is true irrespective of the presence of the -nocomplain option which is still accepted but has no significance. Any scripts that expect an error to be raised in such cases will need to be modified to check for an empty list instead.
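
A form that behaves the same in Tcl 8 and Tcl 9 is to pass -nocomplain and test the length of the result, as in this sketch. The procedure name findFiles is hypothetical.

```tcl
# Tcl 8 code that relied on the error typically looked like:
#   if {[catch {glob -directory $dir $pattern} files]} { set files {} }

# Hypothetical helper: always returns a (possibly empty) list.
proc findFiles {dir pattern} {
    return [glob -nocomplain -directory $dir -- $pattern]
}

set scripts [findFiles [pwd] *.tcl]
if {[llength $scripts] == 0} {
    puts "no Tcl scripts in [pwd]"
}
```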

See TIP 637.

Removal of deprecated commands or arguments

These commands or arguments have been deprecated and have been removed from Tcl 9. Scripts making use of these should be modified as below:

See TIP 485.

Deprecated trace subcommands variable, vdelete, vinfo have been removed

These commands have been deprecated since Tcl 8.4 and have been removed from Tcl 9. Scripts making use of these should be modified as below:
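
As a sketch of such a modification, the trace add variable, trace remove variable and trace info variable subcommands, available since Tcl 8.4, cover the same ground. The variable and procedure names below are illustrative only; note that the operation is spelled out (write) rather than abbreviated (w).

```tcl
proc logWrite {name1 name2 op} {
    puts "$name1 was written"
}

set ::level 0

# was: trace variable ::level w logWrite
trace add variable ::level write logWrite

# was: trace vinfo ::level
set active [trace info variable ::level]

# was: trace vdelete ::level w logWrite
trace remove variable ::level write logWrite
```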

See TIP 673.

Removal of tcl_precision

The long deprecated variable tcl_precision that controlled conversion of floating point numbers to strings has been removed. Use the format command instead to control number of digits generated.
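
For instance, a script that used to set tcl_precision to limit the digits shown could format values explicitly, as in this small sketch with illustrative values.

```tcl
set x [expr {1.0 / 3}]

# Six significant digits, similar in effect to the old "set tcl_precision 6".
puts [format %.6g $x]    ;# 0.333333

# Fixed number of decimal places.
puts [format %.2f $x]    ;# 0.33
```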

See TIP 488.

Zipfs file system

Tcl 9 introduces the ability to embed scripts as a zip archive bound to an executable or shared library. The default build is configured to make use of this capability and binds the Tcl initialization and support scripts into the shared library (for shared builds) or executable (for static builds). From a scripting perspective, this has several ramifications in terms of compatibility:

        % file volumes
        //zipfs:/ /

Channel option -eofchar change

The -eofchar channel option is no longer supported for writing. The default for channels on MS-Windows is now the empty string, as on all other platforms.

% set t [open test.txt w+]
file46a7048
% fconfigure $t -eofchar
% fconfigure $t -eofchar {a b}
bad value for -eofchar: must be non-NUL ASCII character

See TIP 646.