Tcl Source Code

Artifact 84fde19d5e65bdbd25b2fcabf7f1dc9d8b5f24a6025f1386a577be76a52bc7ce:

Wiki page [Unsupported icu command] by apnadkarni 2024-06-16 12:02:43.
D 2024-06-16T12:02:43.499
L Unsupported\sicu\scommand
N text/x-markdown
P aab79690b7fc22e84eae0439085877fc1962cdfd22fe496e4fd60ef83955a5e2
U apnadkarni
W 2771
This is a proposal to add a `tcl::unsupported::icu` command to Tcl 9.
The primary goal is to help with migration of Tcl 8 scripts to Tcl 9.
A (much) lesser benefit is the ability to experiment with ICU functions
for handling strings and encodings.

## Motivation

There are two changes in Tcl 9 that together can prevent even sourcing
Tcl 8 scripts.

- The `source` command now reads scripts as UTF-8 by default.

- The introduction of the `strict` encoding profile for (most) I/O.

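Together, these two changes mean that even a single copyright character
(U+00A9) in an otherwise ASCII file prevents the script from being loaded
if the file is encoded in ISO 8859-1, as is common. In that encoding the
character is the single byte 0xA9, which is not a valid UTF-8 sequence on
its own, so the `strict` profile rejects it.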
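
The failure can be reproduced without any files. The following is a minimal sketch, assuming a Tcl 9 interpreter (the `-profile` option does not exist in Tcl 8.6); the variable names are illustrative only.

```tcl
# Requires Tcl 9: the -profile option is not available in Tcl 8.6.

# Encode a short string as ISO 8859-1. U+00A9 becomes the single byte 0xA9.
set original "Copyright \u00A9 2024"
set bytes [encoding convertto iso8859-1 $original]

# Decoding those bytes as strict UTF-8 fails because a lone 0xA9 byte
# is not a valid UTF-8 sequence.
set failed [catch {encoding convertfrom -profile strict utf-8 $bytes} msg]
if {$failed} {
    puts "strict UTF-8 decode failed: $msg"
}

# Decoding with the encoding the data was actually written in succeeds.
set text [encoding convertfrom iso8859-1 $bytes]
```

The same mismatch is what stops `source` from loading such a script unless the encoding is named explicitly.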

In addition, text files containing non-ASCII characters that were written
by Tcl 8 using the `[encoding system]` encoding may not be readable in
Tcl 9 without an explicit `fconfigure -encoding` invocation. This is a
particular problem on Windows, where `[encoding system]` in Tcl 9 returns
`utf-8` irrespective of the user's code page setting that Tcl 8 used.
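
As an illustration of the explicit `fconfigure -encoding` workaround, the sketch below writes a temporary file in `cp1252` and reads it back with the encoding named explicitly. The choice of `cp1252` is only an assumption standing in for whatever code page the Tcl 8 script actually ran under.

```tcl
# Write a data file as a Tcl 8 script on a Western European Windows
# system might have done. cp1252 is an assumed legacy code page.
set expected "Caf\u00E9 \u00A9 2024"
set chan [file tempfile path]
fconfigure $chan -encoding cp1252
puts $chan $expected
close $chan

# Read it back, naming the encoding explicitly instead of relying on
# [encoding system], which is utf-8 on Windows in Tcl 9.
set chan [open $path r]
fconfigure $chan -encoding cp1252
set line [gets $chan]
close $chan
file delete $path
```

For scripts rather than data files, `source -encoding` serves the same purpose. The difficulty in practice is knowing which encoding to name.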

The [tcl9migrate](https://github.com/apnadkarni/tcl9-migrate) tool is
an ongoing (as in do not try it yet!) effort to help users with the above
as well as with other incompatibilities, such as the changes to octal
literals and tilde expansion. It includes both a static checker based
on Nagelfar and a runtime shim to help with the data file encoding
issues. However, the shim requires some functionality from ICU for this
purpose.

Hence this proposal.

## Specification

The `tcl::unsupported::icu` command ensemble includes the following subcommands:

- `detect` to guess file encodings
- `icuToTcl` and `tclToIcu` to map ICU names to Tcl encoding names and vice versa

These wrap the lower-level ICU API, which is for internal use only at the moment
and is not documented here.
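
The sketch below shows how a migration helper might combine the subcommands. The argument shapes are assumptions, since this page does not specify them: `detect` is assumed to take binary data and return an ICU encoding name, and `icuToTcl` is assumed to take that name and return the Tcl one. The helper `guessEncoding` is hypothetical. It returns an empty string when the command or the ICU libraries are unavailable.

```tcl
# Hypothetical helper: guess the Tcl encoding name for some binary data.
# ASSUMPTION: "detect" accepts binary data and returns an ICU name.
# ASSUMPTION: "icuToTcl" accepts an ICU name and returns a Tcl name.
proc guessEncoding {bytes} {
    # Fails if the command is missing or ICU is not installed.
    if {[catch {tcl::unsupported::icu detect $bytes} icuName]} {
        return ""
    }
    if {[catch {tcl::unsupported::icu icuToTcl $icuName} tclName]} {
        return ""
    }
    # Only report names this interpreter can actually use.
    if {$tclName ni [encoding names]} {
        return ""
    }
    return $tclName
}

set sample [encoding convertto iso8859-1 "Copyright \u00A9 2024"]
set guess [guessEncoding $sample]
if {$guess eq ""} {
    puts "no guess available"
} else {
    puts "guessed encoding: $guess"
}
```

Any guess is a heuristic, particularly for short inputs, so a tool would normally let the user confirm or override it.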

The command will be auto-loaded on first use.

The ICU libraries will not be shipped with Tcl, and the functionality
will not be available on systems that do not have them installed.
This is acceptable because the intended use is on developer systems
used for porting scripts. On recent versions of Windows 10 and on later
Windows releases, ICU is already present as part of the system libraries.

## Implementation

The implementation follows the one already present in Tk, except for
differences in which libraries are loaded, stemming from the different
APIs used. Combining the two is left for later.

As with Tk, the build system is unaffected because the ICU libraries are
loaded only at runtime. This is intentional, although there is a performance
cost associated with searching for the library. That cost is acceptable
as it is incurred only on use of the `icu` command.

Code is in branch [apn-experiment-chardet](https://core.tcl-lang.org/tcl/timeline?r=apn-experiment-chardet).




Z ceab097839dc0f8139c0cc90cb3222ab