Tcl Source Code

Update of "Unsupported icu command"

Overview

Artifact ID: 84fde19d5e65bdbd25b2fcabf7f1dc9d8b5f24a6025f1386a577be76a52bc7ce
Page Name: Unsupported icu command
Date: 2024-06-16 12:02:43
Original User: apnadkarni
Mimetype: text/x-markdown
Parent: aab79690b7fc22e84eae0439085877fc1962cdfd22fe496e4fd60ef83955a5e2
Next: b945cc6179cfd88de515215aca5b01c0dae98181194139cf050e0aa4a772b6c1
Content

This is a proposal to add a tcl::unsupported::icu command to Tcl 9. The primary goal is to help migrate Tcl 8 scripts to Tcl 9. A (much) lesser benefit is the ability to experiment with ICU functions for handling strings and encodings.

Motivation

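There are two changes in Tcl 9 that together can prevent Tcl 8 scripts from even being sourced:

- The source command reads script files as UTF-8 by default, rather than in [encoding system] as in Tcl 8.
- Invalid byte sequences raise an error by default (the strict encoding profile) instead of being silently accepted.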

The two changes above mean that even a single copyright sign (U+00A9) in an otherwise ASCII file will prevent the script from being loaded if the file is saved in ISO8859-1 encoding, as is common. ISO8859-1 encodes the copyright sign as the single byte 0xA9, which is not valid UTF-8.

In addition, text files containing non-ASCII characters that Tcl 8 wrote in [encoding system] may not be readable in Tcl 9 without an explicit fconfigure -encoding call. This is a particular problem on Windows, because [encoding system] in Tcl 9 returns utf-8 irrespective of the user's code page setting, which is what Tcl 8 used.

The tcl9migrate tool is an ongoing effort (as in, do not try it yet!) to help users with the above issues as well as other incompatibilities such as octal literals and tilde expansion. It includes both a static checker based on Nagelfar and a runtime shim to help with data file encoding issues. However, it requires some ICU functionality for this purpose.

Hence this proposal.

Specification

The tcl::unsupported::icu command ensemble includes subcommands

These wrap a lower-level ICU API that is for internal use only at present and is not documented here.

The command will be auto-loaded on first use.

The ICU libraries will not be shipped with Tcl, and the functionality will not be available on systems that do not have them installed. This is acceptable because the intended use is on developer systems used for porting scripts. Recent builds of Windows 10, and later versions of Windows, already include ICU as part of the system libraries.

Implementation

The implementation follows the one already present in Tk, except that different libraries are loaded because the two use different ICU APIs. Combining the two implementations is left for later.

As in Tk, the build system is unaffected because the ICU libraries are loaded only at runtime. This is intentional, even though searching for the library carries a performance cost; that cost is acceptable because it is incurred only when the icu command is used.

Code is in branch apn-experiment-chardet.