Work in progress. This page currently focuses on incompatibilities; hints on improving scripts by making use of new facilities may be added at some point.
The migration tool may help identify some of these issues.
Strict handling of ill-formed encoded data
Conceptually, Tcl strings are a sequence of Unicode code points. These strings are converted to and from external encodings via channel I/O or explicit use of the encoding command. The behavior of Tcl has changed in Tcl 9 when input streams contain ill-formed encoded data or when the encoding of an output stream does not support the Unicode code point being written.
In the case of input channels, Tcl 8 would silently map invalid bytes in the stream to the numerically equivalent Unicode code points, in essence presuming iso8859-1 encoding. On output streams, characters not supported by the output encoding were substituted with an encoding-specific replacement character.
In contrast, channels in Tcl 9 are by default configured with the strict encoding profile and will raise an error exception in both of the above cases. If Tcl 8 behavior is desired, applications can use the -profile option to fconfigure to apply the tcl8 encoding profile to the channel. This is not recommended as it is not standards-compliant and can be seen as corrupting data. Instead, a standards-compliant alternative is to use the replace profile which, like the tcl8 profile, will not raise errors in the presence of invalidly encoded data.
The same behavioural changes also apply to explicit transforms with the encoding command. Ill-formed data or unsupported characters will result in an error exception. The -profile option can be used in this case as well to change this behavior.
For more on encoding profiles, refer to the encoding manpage.
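As a rough illustration of the profiles described above, the following sketch converts a byte sequence that is not valid UTF-8. It assumes Tcl 9; the sample bytes and the helper name readLenient are made up for this example.

```tcl
# Requires Tcl 9: encoding profiles do not exist in Tcl 8.6.
# The byte 0xFF can never appear in well-formed UTF-8.
set bytes [binary format H* 48656cff6f]   ;# "Hel", one invalid byte, "o"

# Default (strict) profile: the conversion raises an error.
if {[catch {encoding convertfrom utf-8 $bytes} message]} {
    puts "strict: $message"
}

# replace profile: the invalid byte becomes U+FFFD (replacement character).
set text [encoding convertfrom -profile replace utf-8 $bytes]
puts [string length $text]   ;# 5

# Hypothetical helper: read a whole file without raising encoding errors.
proc readLenient {path} {
    set chan [open $path r]
    try {
        fconfigure $chan -encoding utf-8 -profile replace
        return [read $chan]
    } finally {
        close $chan
    }
}
```

Whether silently replacing bad input is acceptable depends on the application; where data integrity matters, catching the strict error and reporting it is usually the safer choice.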
See TIP 657.
Tilde substitution
File path arguments in Tcl 8 were subject to tilde substitution, where ~ and ~USER prefixes in the path would be replaced with the home directory of the current or named user respectively. Although convenient for interactive use, this behavior led to both security and robustness issues as documented in TIP 602.
Tcl 9 therefore no longer does this substitution, and scripts that want it must do it explicitly with the new commands file home and file tildeexpand. Conversely, commands that had to protect against this treatment of ~ by prefixing with ./ (glob, file split etc.) will no longer do so. Scripts that had their own workarounds to protect against tilde substitution no longer need them.
As an exception for the convenience of end users, tilde substitution is still done at startup on environment values containing library paths (TCLLIBPATH and analogous tm path variants).
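A minimal sketch of doing the expansion explicitly, assuming Tcl 9. The file names .myapprc and notes.txt are placeholders chosen for this example.

```tcl
# Requires Tcl 9: file home and file tildeexpand are new commands.

# Build a path under the current user's home directory.
set configFile [file join [file home] .myapprc]

# Expand a user-supplied path that may start with ~ or ~USER.
set notesFile [file tildeexpand ~/notes.txt]

puts $configFile
puts $notesFile
```

Calling file tildeexpand only on paths that come from the user (command line, configuration files) keeps the expansion at the edges of the program, so internally generated names that happen to begin with ~ are left alone.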
See TIP 602.
Octals
The Tcl 8 interpretation of numeric strings beginning with 0 as octal representation has been done away with. These are now treated as decimal. All such uses within a script should instead use the explicit 0o prefix for octal notation. Particular care must be taken on input and output when interacting with programs or users that rely on the old interpretation.
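The following sketch shows one way to handle octal values explicitly. The helper parseLegacyOctal is a hypothetical name, and the use of scan with %o is one possible approach, not the only one.

```tcl
# In Tcl 9, 010 is decimal ten; 0o10 is octal (eight) in both Tcl 8.6 and 9.
puts [expr {0o10}]   ;# 8

# Hypothetical helper: interpret external input such as "0755" as octal,
# whatever the Tcl version. Raises an error on anything else.
proc parseLegacyOctal {text} {
    # %o reads octal digits; %c only succeeds if trailing junk is present.
    if {[scan $text %o%c value extra] != 1} {
        error "not an octal number: \"$text\""
    }
    return $value
}

puts [parseLegacyOctal 0755]   ;# 493

# When writing values for programs that expect octal, format explicitly.
puts [format %o 493]           ;# 755
```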
See TIP 114.
Underscores in integers
Tcl 9 allows the use of the _ underscore character as a separator in numeric strings, e.g. 1_000_000, 1_0.0_1. It is important to note that this applies not just to literals in source code, as in some other languages, but to run-time values as well. For example, if the string is integer command is used to validate user input as integers, it will allow underscores to be entered. If this is not desired, such validation needs to be changed to use scan or regexps.
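A minimal sketch of regexp-based validation that rejects underscores. The helper name isPlainDecimal and the decision to accept only optionally signed decimal digits are assumptions for this example; adjust the pattern if hexadecimal or other forms must be accepted.

```tcl
# Hypothetical helper: accept only an optional sign followed by decimal digits.
proc isPlainDecimal {text} {
    return [regexp {^[+-]?[0-9]+$} $text]
}

puts [isPlainDecimal 1000000]     ;# 1
puts [isPlainDecimal 1_000_000]   ;# 0
puts [isPlainDecimal -42]         ;# 1
```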
See TIP 551.
Changes in integer classification and handling
The command string is integer now accepts integers of any size and is not limited to 32-bit values. Analogously, the int() function will no longer truncate integer values to a 32-bit range. The following illustrates the difference.
In Tcl 8,
% string is integer 0x100000001
0
% expr {int(0x100000001)}
1
In Tcl 9,
% string is integer 0x100000001
1
% expr {int(0x100000001)}
4294967297
Thus string is integer should not be used for range validation; use explicit range checks. Likewise, int() cannot be used for truncation; use explicit masking of high order bits.
Further, the commands string is entier and string is wide and the functions entier() and wide() are deprecated. Again, use explicit range checks and masking instead.
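The explicit range check and masking mentioned above might look like the following sketch. The helper names isInt32 and truncate32 are hypothetical, and the choice of a signed 32-bit range is an assumption made for the example.

```tcl
# Hypothetical helper: true only for integers within the signed 32-bit range.
proc isInt32 {value} {
    expr {[string is integer -strict $value]
          && $value >= -2147483648 && $value <= 2147483647}
}

# Hypothetical helper: keep the low 32 bits and reinterpret them as signed,
# which is what int() used to do in Tcl 8.
proc truncate32 {value} {
    set low [expr {$value & 0xFFFFFFFF}]
    expr {$low >= 0x80000000 ? $low - 0x100000000 : $low}
}

puts [isInt32 0x100000001]      ;# 0
puts [truncate32 0x100000001]   ;# 1
puts [truncate32 0xFFFFFFFF]    ;# -1
```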
See TIP 514.
Change in variable name resolution
In Tcl 8, variable names that are not absolute are resolved by looking first in the current namespace and then in the global namespace. In Tcl 9, such variables are always interpreted as relative to the current namespace.
This avoids the problem that setting a variable inside a namespace scope will overwrite a global variable, if that global variable exists and the variable does not exist relative to the namespace. But it also has some consequences that may not immediately be obvious:
- Access to well-known variables such as env and tcl_platform inside a namespace eval block will need to be either fully qualified, or the variable must be brought into scope using the namespace upvar command (namespace upvar :: env env).
- Inside procs in a namespace ns, it used to be possible to access namespace variables within a global namespace ns by referring to them as ns::var. That no longer works. Again, the solution is to either fully qualify the variable, or bring it into scope. In addition to the namespace upvar command, in this case that can also be done using the global command (global ns::var), as well as the variable command (variable var, or variable ::ns::var).

The following egrep may be useful to detect such cases:
egrep '(\S+\s+\$|(set|unset|append|lset|lappend|dict\s+(set|incr|append|lappend|with|update|unset)|upvar|namespace var|parray|info exists|gets\s+\S+|array\s+\S+|vwait)\s+)[a-zA-Z]+::' *.tcl

This is not a complete regexp (oo, expr, variable and maybe other commands are not covered). Each occurrence has to be checked, since it may be a legitimate reference relative to the current namespace, i.e. a child namespace (to be left alone), or a no-longer-valid reference relative to the global namespace (which must be fully qualified). Doing this check is strongly recommended: tclprodebug had over 500 such references (perfectly legitimate in Tcl 8, not bugs), leading to error exceptions if lucky and to rather mysterious phase-of-the-moon misbehavior otherwise.
Q - has the behavior of global changed within namespaces?
The behavior of the global command within namespaces has not changed. The command has no effect when used outside the context of a proc or apply body. Using global inside a namespace eval block does nothing. Variable names passed to the global command are still resolved relative to the global namespace, as they have always been. So, using global ns::var (inside a proc or apply body) will create a local variable var that is linked to the namespace variable ::ns::var.
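The options described in this section can be sketched as follows. The namespaces ::app and ::other and their variables are invented for illustration; the code is written to behave the same in Tcl 8.6 and Tcl 9.

```tcl
namespace eval ::app {
    variable counter 0

    # At namespace eval level, fully qualify well-known globals...
    variable os $::tcl_platform(platform)

    # ...or link them into the namespace first.
    namespace upvar :: tcl_platform plat
    variable osAgain $plat(platform)
}

namespace eval ::other {
    proc bump {} {
        # A relative name such as app::counter would refer to
        # ::other::app::counter in Tcl 9, so qualify it fully.
        incr ::app::counter
    }

    proc bumpViaGlobal {} {
        # global resolves relative to the global namespace and creates
        # a local variable "counter" linked to ::app::counter.
        global app::counter
        incr counter
    }
}

puts [::other::bump]            ;# 1
puts [::other::bumpViaGlobal]   ;# 2
```

Fully qualified names are the most explicit option and are easy to search for later; the linking commands are mainly useful when a variable is referenced many times in one body.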
See TIP 278.
Changes in parsing of variable names
When parsing braced variable references of the form ${..}, in 8.6 the first } always terminated the variable name. In 9.0, any nested { are counted and the variable name is terminated by the } matching the opening {.
A similar change also applies to parsing of array element names enclosed in ().
See TIP 465.
Arguments to load are Unicode-aware and case sensitive
In Tcl 8, the second argument to the load command, which specifies the initialization function, was case-insensitive. This is inconsistent with the rules for package names, which are case-sensitive. In Tcl 9, initialization function names are now case sensitive. For example, if the initialization function was called Sample_Init, either of the following commands would work in Tcl 8:
load sample.dll sample
load sample.dll Sample
In Tcl 9, only the latter would be successful.
If compatibility with both Tcl 8 and 9 is desired, the following idiom may be used
load sample.dll [string totitle sample]
Note that instances of the load command that do not have the second argument specified need not be changed.
In addition, Tcl 9 does not restrict the names to ASCII. In practice this should not have any compatibility issues.
Furthermore, extension libraries compiled with Tcl 9 headers get the prefix "tcl9" in their names. The aim is to allow versions for Tcl 8.7 and Tcl 9 to coexist in the same folder.
The names are as follows:
- tk87.dll -> tcl9tk87.dll (on Windows)
- libtk8.7.so -> libtcl9tk8.7.so (on UNIX)
A typical multi-version pkgIndex.tcl file may look like this:
if {![package vsatisfies [package provide Tcl] 8.6-]} {
    return
}
if {[package vsatisfies [package provide Tcl] 9.0-]} {
    package ifneeded tdbc::odbc 1.1.6 \
        "[list load [file join $dir tcl9tdbcodbc116.dll] [string totitle tdbcodbc]]"
} else {
    package ifneeded tdbc::odbc 1.1.6 \
        "[list load [file join $dir tdbcodbc116.dll] [string totitle tdbcodbc]]"
}
See TIP 595.
Writing version and build-independent pkgIndex.tcl files
When writing pkgIndex.tcl files that are compatible with both Tcl 8 and Tcl 9 as well as the autoconf and nmake build systems, the following differences must be accounted for:
- Irrespective of the build system, Tcl 9 extension binaries are named differently (prefixed with tcl9) from Tcl 8.
- The initialization function name in Tcl 9 is case sensitive as noted previously.
- In Tcl 8, extension binaries built with the nmake system have a suffix attached. For example, threaded builds have a t suffix attached while non-threaded builds do not. The autotools based build system does not distinguish in this manner. This inconsistency is not present in Tcl 9 but needs to be accounted for if the package supports both Tcl versions and build systems.
The simplest way to achieve the above via the TEA build system is to write a pkgIndex.tcl.in template for autoconf similar to the following:
package ifneeded @PACKAGE_NAME@ @PACKAGE_VERSION@ \
    [list apply [list {dir} {
        set packageInitName [string totitle @PACKAGE_NAME@]
        set path [file join $dir "@PKG_LIB_FILE@"]
        uplevel #0 [list load $path $packageInitName]
    }] $dir]
The configure.ac in the autotools build should contain the line
AC_CONFIG_FILES([Makefile pkgIndex.tcl])
The makefile.vc should contain the line
pkgindex: default-pkgindex-tea
This will result in both build systems generating a pkgIndex.tcl that works for Tcl 8 as well as Tcl 9.
Default encoding for scripts is UTF-8
The default encoding used when reading scripts, either with the source command without an explicit -encoding option, or passed as a command line argument to tclsh, is utf-8 in Tcl 9 instead of the system encoding as in Tcl 8.
No changes are required for scripts that were pure ASCII and used the \u or \U escapes for non-ASCII characters. Scripts that were saved using a (non-utf8) system encoding will have to be modified or sourced using an explicit -encoding option.
Even in Tcl 8.6, it is good practice to pass the -encoding option to the source command for portability.
A pkgIndex.tcl file may look like this:
package ifneeded mypackage 1.0 [list source -encoding cp1252 [file join $dir mypackage.tcl]]
This will still work in 9.0 for any supported codepage.
See TIP 587.
Encoding profile strict for scripts
The strict encoding profile applies to the source command. The source command may throw encoding errors.
Scripts with non-ASCII characters, even in comments, may now fail if the encoding is wrong. A typical example is a copyright sign "©" in a script file saved in a non-UTF-8 codepage.
In Tcl 8.6, those errors did not arise, as the active tcl8 profile silently converted such bytes to the corresponding iso8859-1 characters. If the affected bytes were in real code (such as text) rather than in comments, the script changed in functionality.
System encoding on Windows may change to UTF-8
The Windows executable manifests of tclsh.exe and wish.exe were changed to declare the UTF-8 character set. On my Windows 10 system with German locale, the command encoding system returns utf-8 with Tcl/Tk 9.0 and cp1252 with Tcl/Tk 8.6. This is due to the manifest change, not any change within Tcl/Tk.
Because of that, the change of the source command to UTF-8 encoding cannot be reverted by:
source -encoding [encoding system] file.tcl
For me, this is in line with where Windows is heading. If I create a file nowadays using the Windows editor application, it is encoded in UTF-8. With older Windows versions, this was cp1252.
One effect of this is that a file written by an 8.x application using the default encoding will not be readable by a 9.x application unless the channel is explicitly configured to use the original encoding. Unfortunately, there is no way to know the original default encoding. A guess may be made by checking the Windows registry; something along the lines of (error checking ignored)
package require registry
set winCodePage [registry get {HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage} ACP]
set tclName cp[format %u $winCodePage]
if {$tclName in [encoding names]} {
    fconfigure $chan -encoding $tclName
} else {
    # ...try some other method...
}
Removal of binary and the empty string as valid values for the -encoding setting for channels
In Tcl 8.6, binary or the empty string could be passed as a valid value for the -encoding option of the chan configure and fconfigure commands. These are no longer valid in Tcl 9. Pass iso8859-1 as the encoding value for the same effect. See the TIP for the rationale.
To configure a channel for binary data, either open must be passed the b modifier (e.g. open foo.txt rb) or the channel must be configured with -translation binary. This was the case in Tcl 8.6 as well and most uses of -encoding binary for the purpose were either redundant or incorrect.
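A short sketch of both ways of getting a binary channel, written to work unchanged in Tcl 8.6 and Tcl 9. The helper names readBinaryFile and writeBinaryFile are hypothetical.

```tcl
# Hypothetical helper: read a file as raw bytes using the b open modifier.
proc readBinaryFile {path} {
    set chan [open $path rb]
    try {
        return [read $chan]
    } finally {
        close $chan
    }
}

# Hypothetical helper: write raw bytes, configuring the channel after opening.
proc writeBinaryFile {path data} {
    set chan [open $path w]
    try {
        fconfigure $chan -translation binary
        puts -nonewline $chan $data
    } finally {
        close $chan
    }
}
```

Neither helper touches -encoding at all, which is the point: -translation binary (or the b modifier) already implies byte-for-byte I/O.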
See TIP 699.
Full Unicode support
In Tcl 8, characters outside the BMP were stored as a surrogate pair and effectively treated as a string of length 2. This is no longer the case in Tcl 9 which has full support for the entire Unicode range and will (correctly) treat these characters as strings of length 1.
Any workarounds to deal with these shortcomings in Tcl 8 need to be removed.
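As a small illustration, assuming Tcl 9, a code point outside the BMP now behaves like any other single character:

```tcl
# Requires Tcl 9 for the results shown.
set emoji \U0001F600

puts [string length $emoji]          ;# 1
puts [format %X [scan $emoji %c]]    ;# 1F600
puts [string length "a${emoji}b"]    ;# 3
```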
See TIP 497.
Traces on upvars linked to arrays
In Tcl 8, traces set on an array would not fire when an element of the array was modified through an upvar link. This documented but incongruent behavior is fixed in Tcl 9. Array traces will fire even when the element is accessed through an upvar link.
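The following sketch can be used to observe the difference between versions. The names settings, countWrite and changeColor are invented for this example; according to the description above, the counter should stay at 0 in Tcl 8 and be incremented in Tcl 9.

```tcl
set ::fired 0

# Trace callback: just count how often the array-level write trace fires.
proc countWrite {args} {
    incr ::fired
}

array set ::settings {color red}
trace add variable ::settings write countWrite

# Modify an array element through an upvar link to that element.
proc changeColor {value} {
    upvar #0 settings(color) color
    set color $value
}

changeColor blue
puts "array trace fired $::fired time(s)"
```

Scripts that installed extra element-level traces to work around the Tcl 8 behavior may see callbacks fire twice in Tcl 9 and may need adjusting.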
See TIP 634.
glob no longer raises an error if no files match
In Tcl 9, the glob command returns an empty list if no files match the specified pattern instead of raising an error. This is true irrespective of the presence of the -nocomplain option, which is still accepted but has no significance. Any scripts that expect an error to be raised in such cases will need to be modified to check for an empty list instead.
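A minimal sketch of a check that behaves the same in Tcl 8.6 and Tcl 9. The helper name matchingFiles is hypothetical; passing -nocomplain is what makes it version-independent.

```tcl
# Hypothetical helper: return matching files, or an empty list if none match.
# -nocomplain is a no-op in Tcl 9 but suppresses the error in Tcl 8.
proc matchingFiles {dir pattern} {
    return [glob -nocomplain -directory $dir -- $pattern]
}

set scripts [matchingFiles [pwd] *.tcl]
if {[llength $scripts] == 0} {
    puts "no scripts found"
} else {
    puts "found [llength $scripts] script(s)"
}
```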
See TIP 637.
Removal of deprecated commands or arguments
These commands or arguments have been deprecated and have been removed from Tcl 9. Scripts making use of these should be modified as below:
- Replace case with switch
- Replace read <chan> nonewline with read -nonewline <chan>
- Replace puts <chan> <line> nonewline with puts -nonewline <chan> <line>
See TIP 485.
Deprecated trace subcommands variable, vdelete, vinfo have been removed
These commands have been deprecated since Tcl 8.4 and have been removed from Tcl 9. Scripts making use of these should be modified as below:
- Replace trace variable with trace add variable
- Replace trace vdelete with trace remove variable
- Replace trace vinfo with trace info variable
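For reference, a short sketch of the replacement subcommands, which work in both Tcl 8.6 and Tcl 9. Note that the new forms take a list of operation names (read, write, unset) rather than the single letters (r, w, u) used by the old ones. The variable ::level and the callback logWrite are made up for this example.

```tcl
proc logWrite {name1 name2 op} {
    puts "$name1 was written"
}

set ::level 1

# Was: trace variable ::level w logWrite
trace add variable ::level write logWrite

# Was: trace vinfo ::level
set active [trace info variable ::level]

# Was: trace vdelete ::level w logWrite
trace remove variable ::level write logWrite
```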
See TIP 673.
Removal of tcl_precision
The long deprecated variable tcl_precision that controlled conversion of floating point numbers to strings has been removed. Use the format command instead to control the number of digits generated.
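A brief sketch of controlling the number of digits with format; the particular precisions are arbitrary choices for illustration.

```tcl
set value [expr {1.0 / 3.0}]

# Fixed number of digits after the decimal point.
puts [format %.3f $value]   ;# 0.333

# Fixed number of significant digits.
puts [format %.6g $value]   ;# 0.333333
```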
See TIP 488.
Zipfs file system
Tcl 9 introduces the ability to embed scripts as a zip archive bound to an executable or shared library. The default build is configured to make use of this capability and binds the Tcl initialization and support scripts into the shared library (for shared builds) or executable (for static builds). From a scripting perspective, this has several ramifications in terms of compatibility:
- There is a new file system type zipfs with its own rules for file names and directories. In particular, names are case sensitive even on Windows. Other differences include disallowing creation of new files and directories. Existing files may be modified but the changes are not persisted to disk and are lost after dismounting the archive. See the zipfs manpage for details.
- The feature has the side effect of introducing the concept of volumes on Unix (Windows programmers already had to deal with drives as volumes).
  % file volumes
  //zipfs:/ /
  One implication of the above is that searches for files can no longer simply begin at / but need to iterate across all volumes.
- The paths within auto_path may point to locations within the zipfs volume. These are not writable so any installers that copy files into that location will fail. Installers should check for writable locations in any case but this change will result in errors in installers that write to directories in auto_path without these checks.
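The two points about volumes and writable locations could be handled along these lines. Both helper names are hypothetical, and a real installer would likely need further checks (permissions, user confirmation) beyond this sketch.

```tcl
# Hypothetical helper: apply a glob pattern at the top level of every volume
# instead of assuming that everything lives under /.
proc findInAllVolumes {pattern} {
    set matches {}
    foreach volume [file volumes] {
        lappend matches {*}[glob -nocomplain -directory $volume -- $pattern]
    }
    return $matches
}

# Hypothetical helper: directories in auto_path that an installer could
# actually write to. Entries inside the zipfs volume are filtered out
# because they are not writable.
proc writableAutoPathDirs {} {
    set dirs {}
    foreach dir $::auto_path {
        if {[file isdirectory $dir] && [file writable $dir]} {
            lappend dirs $dir
        }
    }
    return $dirs
}
```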
Channel option -eofchar change
Channel option -eofchar on write is not supported any more. The default on MS-Windows for channels is now the empty string (like on all other platforms).
% set t [open test.txt w+]
file46a7048
% fconfigure $t -eofchar
% fconfigure $t -eofchar {a b}
bad value for -eofchar: must be non-NUL ASCII character
See TIP 646.
Threaded builds
Tcl builds are now threaded by default. The tcl_platform(threaded) variable is no longer defined.
To check for threaded builds in an 8.6/9 compatible way, use
tcl::pkgconfig get threaded
See TIP 491.
Removal of unsupported commands.
The ::tcl::unsupported::inject command has been removed. The documented coroinject and coroprobe commands may be used in its place.