Tcl Source Code

Wiki page [Migrating scripts to Tcl 9] by apnadkarni 2024-06-19 09:38:55.

Work in progress...
Currently focused on incompatibilities. Hints on improving scripts by
making use of new facilities may be added at some point.

## Strict handling of ill-formed encoded data

Conceptually, Tcl strings are a sequence of Unicode code points. These strings
are converted to and from external encodings via channel I/O or explicit use of
the `encoding` command. Tcl 9 changes how ill-formed encoded data in an input
stream is handled, and what happens when the encoding of an output stream
cannot represent the Unicode code point being written.

In the case of input channels, Tcl 8 would silently map invalid bytes in the
stream to the numerically equivalent Unicode code points, in essence presuming
iso8859-1 encoding. On output streams, characters not supported by the output
encoding were substituted with an encoding-specific replacement character.

In contrast, channels in Tcl 9 are by default configured with the `strict`
encoding profile and will raise an error exception for both the above cases. If
Tcl 8 behavior is desired, applications can use the `-profile` option to
`fconfigure` to apply the `tcl8` encoding profile to the channel. This is not
recommended as it is not standards-compliant and can be seen as corrupting data.
Instead, a standards-compliant alternative is to use the `replace` profile which
like the `tcl8` profile will also not raise errors in the presence of invalidly
encoded data.

The same behavioural changes also apply to explicit transforms with the
`encoding` command. Ill-formed data or unsupported characters will result in an
error exception. The `-profile` option can be used in this case as well to
change this behavior.

For more on encoding profiles, refer to the `encoding` manpage.
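
As a rough illustration of the profiles, the sketch below decodes a byte sequence strictly and falls back to the `replace` profile when that fails. The proc name `decodeLenient` and the sample bytes are made up for this example, and the `-profile` option assumes Tcl 9.

```tcl
# Hypothetical helper: decode strictly, fall back to the replace profile.
# Assumes Tcl 9, where the encoding command accepts -profile.
proc decodeLenient {bytes {enc utf-8}} {
    if {[catch {encoding convertfrom -profile strict $enc $bytes} text]} {
        # Ill-formed input: invalid bytes become U+FFFD instead of an error
        return [encoding convertfrom -profile replace $enc $bytes]
    }
    return $text
}

# The byte 0xFF can never appear in well-formed UTF-8
set raw [binary format H* 41ff42]
set text [decodeLenient $raw]
puts [string length $text]   ;# 3 characters: A, U+FFFD, B
```

For a channel, the analogous setting would be `fconfigure $chan -profile replace`.
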

See [TIP 657](https://core.tcl-lang.org/tips/doc/trunk/tip/657.md).



## Tilde substitution

File path arguments in Tcl 8 would be subject to tilde substitution
where `~` and `~USER` prefixes in the path would be replaced with the
home directory of the current or named user respectively. Although
convenient for interactive use, this behavior led to both security and
robustness issues as documented in TIP 602.

Tcl 9 therefore no longer does this substitution and scripts must
explicitly do the same if desired with the new commands `file home` and
`file tildeexpand`. Conversely, commands that had to protect against
this treatment of `~` by prefixing results with `./` (`glob`, `file split` etc.)
no longer do so, and scripts that had their own workarounds to
protect against tilde substitution no longer need them.

As an exception for convenience for end users, tilde substitution is
still done at start up time on environment values containing library
paths (`TCLLIBPATH` and analogous tm path variants).
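
A script that still wants tilde expansion on both versions could wrap it in a small helper. This is only a sketch; the proc name `expandTilde` and the sample path are invented for illustration.

```tcl
# Hypothetical helper giving Tcl 8 style tilde handling on both versions.
proc expandTilde {path} {
    if {[package vcompare [info tclversion] 9.0] >= 0} {
        # Tcl 9: expansion has to be requested explicitly
        set path [file tildeexpand $path]
    }
    # Tcl 8 expands a leading ~ implicitly while normalizing
    return [file normalize $path]
}

puts [expandTilde ~/notes.txt]
```
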

See [TIP 602](https://core.tcl-lang.org/tips/doc/trunk/tip/602.md).

## Octals

The Tcl 8 interpretation of numeric strings beginning with `0` as octal
representation has been done away with. These are now treated as decimal.
All such uses within a script should instead use the explicit `0o` prefix
for octal notation. Particular care must be taken on input and output
when interacting with programs or users based on the old interpretation.
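
Where legacy octal strings (for example file modes) still arrive from outside the script, one option is to parse them explicitly rather than rely on `expr`. The helper name `parseOctal` below is hypothetical.

```tcl
# Hypothetical helper: interpret a string as octal on any Tcl version.
proc parseOctal {s} {
    # Drop an optional 0o/0O prefix; scan's %o then reads the octal digits.
    regsub {^0[oO]} $s {} digits
    # The trailing %c only succeeds if unparsed characters remain.
    if {[scan $digits %o%c value extra] != 1} {
        error "not an octal number: \"$s\""
    }
    return $value
}

puts [parseOctal 0755]   ;# 493 on both Tcl 8 and Tcl 9
puts [expr {0o755}]      ;# 493, the explicit prefix works on both
```
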

See [TIP 114](https://core.tcl-lang.org/tips/doc/trunk/tip/114.md).

## Underscores in integers

Tcl 9 allows the use of the `_` underscore character as a separator in
numeric strings, e.g. `1_000_000`, `1_0.0_1`. It is important to note
that this applies not just to literals in source code as in some other
languages but run-time values as well. For example, if the `string is
integer` command is used to validate user input as integers, it will
allow underscores to be entered. If this is not desired, such validation
needs to be changed to use `scan` or regexps.
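
A minimal sketch of the regexp approach, assuming only plain decimal integers with an optional sign should be accepted (the proc name `isPlainInteger` is made up):

```tcl
# Hypothetical validator: decimal digits with an optional sign, no underscores.
proc isPlainInteger {s} {
    return [regexp {^[+-]?[0-9]+$} $s]
}

puts [isPlainInteger 1000]    ;# 1
puts [isPlainInteger 1_000]   ;# 0
```
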

See [TIP 551](https://core.tcl-lang.org/tips/doc/trunk/tip/551.md).

## Changes in integer classification and handling

The command `string is integer` now accepts integers of any size and is not limited to 32 bit values. Analogously, the `int()` function will no longer truncate integer values to a 32-bit range. The following illustrates the difference.

In Tcl 8,

```
% string is integer 0x100000001
0
% expr {int(0x100000001)}
1
```

In Tcl 9,

```
% string is integer 0x100000001
1
% expr {int(0x100000001)}
4294967297
```

Thus `string is integer` should not be used for range validation; use explicit range checks. Likewise, `int()` cannot be used for truncation; use explicit masking of high order bits.

Further, the commands `string is entier` and `string is wide` and functions `entier()` and `wide()` are deprecated. Again, use explicit range checks and masking instead.
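
The explicit range check and masking could look roughly like this. The proc names `isInt32` and `truncate32` are hypothetical, and the signed 32-bit range is an assumption about what the script needs.

```tcl
# Hypothetical range check: is the value an integer in the signed 32-bit range?
proc isInt32 {s} {
    return [expr {[string is integer -strict $s]
                  && $s >= -2147483648 && $s <= 2147483647}]
}

# Hypothetical replacement for 32-bit truncation via int()
proc truncate32 {n} {
    # Keep the low 32 bits, then reinterpret them as a signed value
    set n [expr {$n & 0xFFFFFFFF}]
    return [expr {$n >= 0x80000000 ? $n - 0x100000000 : $n}]
}

puts [isInt32 0x100000001]      ;# 0
puts [truncate32 0x100000001]   ;# 1
puts [truncate32 0xFFFFFFFF]    ;# -1
```
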

See [TIP 514](https://core.tcl-lang.org/tips/doc/trunk/tip/514.md).

## Change in variable name resolution

In Tcl 8, variable names that are not absolute are resolved by looking first in the current namespace, and then in the global namespace. In Tcl 9 such variables are always interpreted as relative to the current namespace.

This avoids the problem that setting a variable inside a namespace scope will overwrite a global variable, if that global variable exists and the variable does not exist relative to the namespace. But it also has some consequences that may not immediately be obvious:

* Access to well-known variables such as `env` and `tcl_platform` inside a `namespace eval` block needs to be either fully qualified, or the variable must be brought into scope using the `namespace upvar` command (`namespace upvar :: env env`).
* Inside procs in a namespace `ns`, it used to be possible to access the variables of that (top-level) namespace `::ns` by referring to them as `ns::var`. That no longer works. Again, the solution is to either fully qualify the variable, or bring it into scope. In addition to the `namespace upvar` command, in this case that can also be done using the `global` command (`global ns::var`), as well as the `variable` command (`variable var`, or `variable ::ns::var`). The following egrep may be useful to detect such cases:
```
egrep '(\S+\s+\$|(set|unset|append|lset|lappend|dict\s+(set|incr|append|lappend|with|update|unset)|upvar|namespace var|parray|info exists|gets\s+\S+|array\s+\S+|vwait)\s+)[a-zA-Z]+::' *.tcl
```
This is not a complete regexp (`oo`, `expr`, `variable` and maybe other commands are not covered)! Note that each occurrence has to be checked, since it may be either a legitimate reference relative to the current namespace, i.e. to a child namespace (to be left alone), or a no-longer-valid reference relative to the global namespace (which must be fully qualified). It is strongly recommended that this check be done. tclprodebug had over 500 such references (perfectly legitimate in Tcl 8, not bugs), leading to error exceptions if lucky and to rather mysterious phase-of-the-moon misbehavior otherwise.

Q - has the behavior of `global` changed within namespaces?

The behavior of the `global` command within namespaces has not changed. The command has no effect when used outside the context of a proc or apply body. Using `global` inside a namespace eval block does nothing. Variable names passed to the global command are still resolved relative to the global namespace, as they have always been. So, using `global ns::var` (inside a proc or apply body) will create a local variable var that is linked to the namespace variable ::ns::var.
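
The options discussed in this section can be sketched as follows. The namespace `::app` and its procs are invented for illustration; each proc shows one way of making the reference explicit so that it behaves the same in Tcl 8 and Tcl 9.

```tcl
namespace eval ::app {
    variable hits 0

    proc record {} {
        # Bring the namespace variable into scope by name
        variable hits
        incr hits
    }

    proc platform {} {
        # Fully qualify well-known globals instead of relying on fallback
        return $::tcl_platform(platform)
    }

    proc hasPath {} {
        # Link the global env array to a local name
        namespace upvar :: env env
        return [info exists env(PATH)]
    }
}

::app::record
puts [::app::record]   ;# 2
```
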

See [TIP 278](https://core.tcl-lang.org/tips/doc/trunk/tip/278.md).

## Changes in parsing of variable names

When parsing braced variable references of the form `${..}`, in 8.6 the first `}` always terminated the variable name. In 9.0, any nested `{` are counted and the variable name is terminated by the `}` matching the opening `{`.
A similar change also applies to parsing of array element names enclosed in `()`.

See [TIP 465](https://core.tcl-lang.org/tips/doc/trunk/tip/465.md).

## Arguments to `load` are Unicode-aware and case sensitive

In Tcl 8, the second argument to the `load` command which specifies
the initialization function was case-insensitive. This is inconsistent
with the rules for package names which are case-sensitive. In Tcl 9,
initialization function names are now case sensitive. For example,
if the initialization function was called `Sample_Init`, either of the
following commands would work in Tcl 8

```
load sample.dll sample
load sample.dll Sample
```

In Tcl 9, only the latter would be successful.

If compatibility with both Tcl 8 and 9 is desired, the following
idiom may be used

```
load sample.dll [string totitle sample]
```

Note that instances of the `load` command that do not have the second
argument specified need not be changed.

In addition, Tcl 9 does not restrict the names to ASCII. In practice
this should not have any compatibility issues.

In addition, the names of libraries compiled with Tcl 9 headers get the prefix "tcl9".
The aim is to allow versions for Tcl 8.7 and Tcl 9 to coexist in the same folder.

The names are as follows:

   *   tk87.dll -> tcl9tk87.dll (on Windows)
   *   libtk8.7.so -> libtcl9tk8.7.so (on UNIX)

A typical multi-version pkgIndex.tcl file may look like this:

```
if {![package vsatisfies [package provide Tcl] 8.6-]} {
    return
}
if {[package vsatisfies [package provide Tcl] 9.0-]} {
    package ifneeded tdbc::odbc 1.1.6 \
	    "[list load [file join $dir tcl9tdbcodbc116.dll] [string totitle tdbcodbc]]"
} else {
    package ifneeded tdbc::odbc 1.1.6 \
	    "[list load [file join $dir tdbcodbc116.dll] [string totitle tdbcodbc]]"
}
```

See [TIP 595](https://core.tcl-lang.org/tips/doc/trunk/tip/595.md).

## Writing version and build-independent pkgIndex.tcl files

When writing `pkgIndex.tcl` files that are compatible with both
Tcl 8 and Tcl 9 as well as the autoconf and nmake build systems, the
following differences must be accounted for:

* Irrespective of the build system, Tcl 9 extension binaries are named differently
(prefixed with `tcl9`) from Tcl 8

* The initialization function name in Tcl 9 is case sensitive as noted previously.

* In Tcl 8, extension binaries built with the nmake system have a suffix attached.
For example, threaded builds have a `t` suffix attached while non-threaded builds
do not. The autotools based build system does not distinguish in this manner. This
inconsistency is not present in Tcl 9 but needs to be accounted for if the package
supports both Tcl versions and build systems.

The simplest way to achieve the above via the TEA build system is to write a
`pkgIndex.tcl.in` template for autoconf similar to the following:

```
package ifneeded @PACKAGE_NAME@ @PACKAGE_VERSION@ \
    [list apply [list {dir} {
        set packageInitName [string totitle @PACKAGE_NAME@]
        set path [file join $dir "@PKG_LIB_FILE@"]
        uplevel #0 [list load $path $packageInitName]
    }] $dir]
```

The `configure.ac` in the autotools build should contain the line

```
AC_CONFIG_FILES([Makefile pkgIndex.tcl])
```

The `makefile.vc` should contain the line

```
pkgindex: default-pkgindex-tea
```

This will result in both build systems generating a `pkgIndex.tcl`
that works for Tcl 8 as well as Tcl 9.


## Default encoding for scripts is UTF-8

The default encoding used when reading scripts, either with the `source`
command without an explicit `-encoding` option, or passed as a command
line argument to `tclsh` is utf-8 in Tcl 9 instead of the system
encoding as in Tcl 8.

No changes are required for scripts that were pure ASCII and used
the `\u` or `\U` escapes for non-ASCII characters. Scripts that were
saved using a (non-utf8) system encoding will have to be modified
or sourced using an explicit `-encoding` option.

Even in Tcl 8.6, it is good practice to pass the `-encoding` option to the `source` command for portability.
A pkgIndex.tcl file may look like this:
```
package ifneeded mypackage 1.0 [list source -encoding cp1252 [file join $dir mypackage.tcl]]
```
This will still work in 9.0 for any supported codepage.

See [TIP 587](https://core.tcl-lang.org/tips/doc/trunk/tip/587.md).

## Encoding profile *strict* for scripts

The strict encoding profile also applies to the `source` command, so sourcing a
script may now raise encoding errors.

Scripts with non-ASCII characters, even in comments, may now fail if the encoding is wrong.
A typical example is a copyright glyph "©" in a script file saved in a non-UTF-8 code page.

In Tcl 8.6, those errors did not arise, as the active tcl8 profile silently mapped such bytes to the corresponding iso8859-1 characters.
If the affected bytes were part of real code (such as a text string), the behavior of the script silently changed.
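
One way to find affected scripts ahead of time is to check whether each file decodes as strict UTF-8. The following is only a sketch for Tcl 9; the proc name `isStrictUtf8File` is invented for illustration.

```tcl
# Hypothetical helper (assumes Tcl 9): does the file decode cleanly as UTF-8?
proc isStrictUtf8File {path} {
    set f [open $path rb]
    set data [read $f]
    close $f
    # The strict profile raises an error on ill-formed input
    return [expr {![catch {encoding convertfrom -profile strict utf-8 $data}]}]
}

foreach script [glob -nocomplain *.tcl] {
    if {![isStrictUtf8File $script]} {
        puts "$script is not valid UTF-8; pass -encoding when sourcing it"
    }
}
```
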

## System encoding on Windows may change to UTF-8

The Windows executable manifests of tclsh.exe and wish.exe were changed to declare UTF-8 as the character set.
On my Windows 10 system with German locale, the command *encoding system* returns *utf-8* with Tcl/Tk 9.0 and *cp1252* with Tcl/Tk 8.6.
This is due to the manifest change, not any change within Tcl/Tk itself.

As a result, the change of the `source` command's default encoding to UTF-8 cannot be *reverted* with:

```
source -encoding [encoding system] file.tcl
```

In my view, this is in line with where Windows is heading.
If I create a file nowadays using the Windows editor application, it is encoded in UTF-8.
With older Windows versions, this was cp1252.

One effect of this is that a file written by an 8.x application using
the default encoding will not be readable by a 9.x application unless
the channel is explicitly configured to use the original encoding.
Unfortunately, there is no way to know the original default encoding.
A **guess** may be made by checking the Windows registry; something
along the lines of (error checking ignored)

```
package require registry
set winCodePage [registry get {HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage} ACP]
set tclName cp[format %u $winCodePage]
if {$tclName in [encoding names]} {
    fconfigure $chan -encoding $tclName
} else {
    # ...try some other method...
}
```

## Full Unicode support

In Tcl 8, characters outside the BMP were stored as a surrogate pair
and effectively treated as a string of length 2. This is no longer
the case in Tcl 9 which has full support for the entire Unicode range
and will (correctly) treat these characters as strings of length 1.

Any workarounds to deal with these shortcomings in Tcl 8 need to be
removed.
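
A quick way to see the difference, using an arbitrarily chosen code point outside the BMP:

```tcl
set smiley "\U0001F600"        ;# a code point outside the BMP
puts [string length $smiley]   ;# 1 on Tcl 9; Tcl 8 counts the surrogate pair
```
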

See [TIP 497](https://core.tcl-lang.org/tips/doc/trunk/tip/497.md).

## Traces on upvars linked to arrays

In Tcl 8, traces set on an array would not fire when an element of the array
was modified through an `upvar` link. This documented but incongruent
behavior is fixed in Tcl 9. Array traces will fire even when the element
is accessed through an `upvar` link.

See [TIP 634](https://core.tcl-lang.org/tips/doc/trunk/tip/634.md).

## `glob` no longer raises an error if no files match

In Tcl 9, the `glob` command returns an empty list if no files match the
specified pattern instead of raising an error. This is true irrespective of
the presence of the `-nocomplain` option which is still accepted but has
no significance. Any scripts that expect an error to be raised in such
cases will need to be modified to check for an empty list instead.
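
A form that behaves the same on Tcl 8 and Tcl 9 is to always pass `-nocomplain` and test for an empty list. The helper name `findFiles` and the pattern are hypothetical.

```tcl
# Hypothetical helper: list matches without depending on glob's error behavior.
proc findFiles {dir pattern} {
    # -nocomplain is needed on Tcl 8 and still accepted on Tcl 9
    return [lsort [glob -nocomplain -directory $dir -- $pattern]]
}

set matches [findFiles [pwd] *.no-such-extension]
if {[llength $matches] == 0} {
    puts "no matching files"
}
```
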

See [TIP 637](https://core.tcl-lang.org/tips/doc/trunk/tip/637.md).

## Removal of deprecated commands or arguments

These commands or arguments have been deprecated and have been removed from Tcl 9.
Scripts making use of these should be modified as below:

* Replace `case` with `switch`
* Replace `read <chan> nonewline` with `read -nonewline <chan>`
* Replace `puts <chan> <line> nonewline` with `puts -nonewline <chan> <line>`
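
As a sketch of the replacements, assuming the former `case` invocation relied on glob-style patterns (so `switch -glob` is the closest match); the proc `classify` is made up:

```tcl
# Hypothetical example: a pattern dispatch written with switch instead of case
proc classify {name} {
    switch -glob -- $name {
        *.tcl   { return script }
        *.txt   { return text }
        default { return other }
    }
}

puts -nonewline stdout "kind: "   ;# option first, not a trailing "nonewline"
puts [classify demo.tcl]          ;# script
```
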

See [TIP 485](https://core.tcl-lang.org/tips/doc/trunk/tip/485.md).

## Deprecated `trace` subcommands `variable`, `vdelete`, `vinfo` have been removed

These commands have been deprecated since Tcl 8.4 and have been removed from Tcl 9.
Scripts making use of these should be modified as below:

* Replace `trace variable` with `trace add variable`

* Replace `trace vdelete` with `trace remove variable`

* Replace `trace vinfo` with `trace info variable`
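
A short sketch of the replacement forms, which also work in Tcl 8.6. The variable `::watched` and the callback `onWrite` are invented for this example.

```tcl
set ::changes {}
proc onWrite {name1 name2 op} {
    # Record each write that fires the trace
    lappend ::changes $name1
}

set ::watched 0
trace add variable ::watched write onWrite      ;# was: trace variable ... w
set ::watched 1
puts [llength [trace info variable ::watched]]  ;# was: trace vinfo
trace remove variable ::watched write onWrite   ;# was: trace vdelete ... w
set ::watched 2
puts [llength $::changes]                       ;# 1, only the traced write
```
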

See [TIP 673](https://core.tcl-lang.org/tips/doc/trunk/tip/673.md).

## Removal of `tcl_precision`

The long deprecated variable `tcl_precision` that controlled conversion
of floating point numbers to strings has been removed. Use the `format`
command instead to control number of digits generated.
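
For example, with `format` the precision is chosen per conversion rather than globally:

```tcl
set x [expr {1.0 / 3}]
puts [format %.3f $x]    ;# 0.333
puts [format %.6g $x]    ;# 0.333333
puts [format %.17g $x]   ;# 17 significant digits, enough to round-trip a double
```
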

See [TIP 488](https://core.tcl-lang.org/tips/doc/trunk/tip/488.md).

## Zipfs file system

Tcl 9 introduces the ability to embed scripts as a zip archive
bound to an executable or shared library. The default build is configured to make
use of this capability and binds the Tcl initialization and support scripts
into the shared library (for shared builds) or executable (for static builds).
From a scripting perspective, this has several ramifications in terms of
compatibility:

* There is a new file system type `zipfs` with its own rules for file names
and directories. In particular, names are case sensitive even on Windows. Other
differences include disallowing of creation of new files and directories. Existing
files may be modified but the changes are not persisted to disk and are lost
after dismounting the archive. See the `zipfs` manpage for details.

* The feature has the side effect of introducing the concept of volumes
on Unix (Windows programmers already had to deal with drives as volumes).

```
        % file volumes
        //zipfs:/ /
```

* One implication of the above is that searches for files can no longer simply begin
at `/` but need to iterate across all volumes.

* The paths within `auto_path` *may* point to locations within the `zipfs` volume.
These are not writable so any installers that copy files into that location will fail.
Installers should check for writable locations in any case but this change will
result in errors in installers that write to directories in `auto_path` without
these checks.
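
The last two points might be handled along these lines. Both proc names are hypothetical, and the `//zipfs:/` prefix is taken from the `file volumes` output shown above; on Windows, iterating over all drives may be slow or touch removable media.

```tcl
# Hypothetical helper: directories from auto_path that an installer could use
proc writableLibDirs {} {
    set dirs {}
    foreach dir $::auto_path {
        # Skip paths inside the zipfs volume, which cannot take new files
        if {[string match //zipfs:/* $dir]} continue
        if {[file isdirectory $dir] && [file writable $dir]} {
            lappend dirs $dir
        }
    }
    return $dirs
}

# Hypothetical helper: look in the top level of every volume, not only /
proc findTopLevel {pattern} {
    set found {}
    foreach vol [file volumes] {
        lappend found {*}[glob -nocomplain -directory $vol -- $pattern]
    }
    return $found
}

puts [writableLibDirs]
```
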

## Channel option -eofchar change

Channel option *-eofchar* on write is not supported any more.
The default on MS-Windows for channels is now the empty string (like on all other platforms).

```
% set t [open test.txt w+]
file46a7048
% fconfigure $t -eofchar
% fconfigure $t -eofchar {a b}
bad value for -eofchar: must be non-NUL ASCII character
```

See [TIP 646](https://core.tcl-lang.org/tips/doc/trunk/tip/646.md).

## Threaded builds

Tcl builds are now threaded by default. The `tcl_platform(threaded)` variable is no longer defined.
To check for threaded builds in a 8.6/9 compatible way, use

```
tcl::pkgconfig get threaded
```

See [TIP 491](https://core.tcl-lang.org/tips/doc/trunk/tip/491.md).

## Removal of unsupported commands

The `::tcl::unsupported::inject` command has been removed. The documented
`coroinject` and `coroprobe` commands may be used in its place.