Artifact ID: | 84fde19d5e65bdbd25b2fcabf7f1dc9d8b5f24a6025f1386a577be76a52bc7ce |
---|---|
Page Name: | Unsupported icu command |
Date: | 2024-06-16 12:02:43 |
Original User: | apnadkarni |
Mimetype: | text/x-markdown |
Parent: | aab79690b7fc22e84eae0439085877fc1962cdfd22fe496e4fd60ef83955a5e2 |
Next: | b945cc6179cfd88de515215aca5b01c0dae98181194139cf050e0aa4a772b6c1 |
This is a proposal to add a `tcl::unsupported::icu` command to Tcl 9. The primary goal is to help with migrating Tcl 8 scripts to Tcl 9. A (much) lesser benefit is the ability to experiment with ICU functions for handling strings and encodings.
Motivation
Two changes in Tcl 9 can together prevent Tcl 8 scripts from even being sourced:
1. The `source` command now uses UTF-8 as its default encoding.
2. The introduction of the `strict` encoding profile for (most) I/O.
Together, these changes mean that even a single copyright sign (U+00A9) in an otherwise ASCII file will prevent the script from being loaded if the file is encoded in ISO8859-1, as is common: the sign is stored as the single byte 0xA9, which is not valid UTF-8.
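The failure can be reproduced outside Tcl. The Python sketch below is an illustration only and not part of the proposal: Python's strict UTF-8 decoder stands in for Tcl 9's default `source` encoding combined with the `strict` profile. It encodes a comment line containing U+00A9 as ISO8859-1 and then decodes it strictly as UTF-8.

```python
from typing import Optional

# Illustration only: Python's strict UTF-8 decoder stands in for Tcl 9's
# default `source` encoding (UTF-8) combined with the `strict` profile.
line = "# Copyright \u00a9 2024\n"

# In an ISO8859-1 file the copyright sign is stored as the single byte 0xA9.
latin1_bytes = line.encode("iso8859-1")


def decode_strict_utf8(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # 0xA9 is a UTF-8 continuation byte with no lead byte before it.
        print(f"strict UTF-8 decoding failed at byte offset {exc.start}")
        return None


print(decode_strict_utf8(latin1_bytes))          # fails, prints None
print(latin1_bytes.decode("iso8859-1") == line)  # explicit encoding: True
```

In Tcl itself, such a file can still be loaded by naming its encoding explicitly, for example `source -encoding iso8859-1 script.tcl` (the file name is a placeholder).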
In addition, text files containing non-ASCII characters that were written in the `[encoding system]` encoding under Tcl 8 may not be readable in Tcl 9 without an explicit `fconfigure -encoding` call. This is a particular problem on Windows, where `[encoding system]` in Tcl 9 returns `utf-8` regardless of the user's code page setting, which is what Tcl 8 used.
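The data-file problem can be sketched the same way, again in Python purely for illustration. The file below is written in cp1252, which is assumed here as a typical Tcl 8 `[encoding system]` value on Western-European Windows installations. It is then read back once as UTF-8, as Tcl 9's default would, and once with the encoding named explicitly, the analogue of `fconfigure $chan -encoding cp1252`.

```python
import os
import tempfile
from typing import Optional

TEXT = "Caf\u00e9, na\u00efve, 25 \u20ac\n"


def write_legacy_file(path: str) -> None:
    # Simulates a data file written by Tcl 8 under a cp1252 system encoding.
    with open(path, "w", encoding="cp1252", newline="") as f:
        f.write(TEXT)


def read_as(path: str, encoding: str) -> Optional[str]:
    """Read the whole file strictly in `encoding`; None on decode errors."""
    try:
        with open(path, "r", encoding=encoding, errors="strict", newline="") as f:
            return f.read()
    except UnicodeDecodeError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    write_legacy_file(path)
    as_utf8 = read_as(path, "utf-8")     # the new default: decoding fails
    as_cp1252 = read_as(path, "cp1252")  # explicit encoding: round-trips

print(as_utf8)
print(as_cp1252 == TEXT)
```

`newline=""` stops Python from translating line endings, so the two reads differ only in the character encoding used.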
The tcl9migrate tool is an ongoing effort (as in: do not try it yet!) to help users with the above as well as with other incompatibilities, such as octal number handling and tilde expansion. It includes both a static checker based on Nagelfar and a runtime shim to help with data file encoding issues. However, it requires some functionality from ICU for this purpose.
Hence this proposal.
Specification
The `tcl::unsupported::icu` command ensemble includes the subcommands
- `detect`, to guess file encodings;
- `icuToTcl` and `tclToIcu`, to map ICU encoding names to Tcl encoding names and vice versa.
These wrap a lower-level ICU API that is for internal use only at present and is not documented here.
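ICU's charset detector scores many candidate encodings statistically and reports them with confidence values. The Python sketch below is a much cruder stand-in, meant only to show the shape of the operations. `guess_encoding` is a hypothetical heuristic that checks for a UTF-8 byte-order mark and strict UTF-8 validity before falling back to a single-byte guess. `icu_to_tcl` and `tcl_to_icu` are hypothetical lookups over a small, illustrative name table. Neither the heuristic nor the table comes from the Tcl implementation, and the real `detect`, `icuToTcl` and `tclToIcu` subcommands may behave differently.

```python
from typing import Optional

# Hypothetical subset of names reported by ICU's charset detector, mapped to
# the corresponding Tcl encoding names. Not the table Tcl actually uses.
ICU_TO_TCL = {
    "UTF-8": "utf-8",
    "ISO-8859-1": "iso8859-1",
    "windows-1252": "cp1252",
    "Shift_JIS": "shiftjis",
    "EUC-JP": "euc-jp",
    "KOI8-R": "koi8-r",
    "Big5": "big5",
}
# Encoding names are matched case-insensitively in both directions.
_ICU_LOWER = {icu.lower(): tcl for icu, tcl in ICU_TO_TCL.items()}
_TCL_LOWER = {tcl.lower(): icu for icu, tcl in ICU_TO_TCL.items()}


def icu_to_tcl(name: str) -> Optional[str]:
    """Hypothetical icuToTcl: ICU name -> Tcl name, or None if unknown."""
    return _ICU_LOWER.get(name.lower())


def tcl_to_icu(name: str) -> Optional[str]:
    """Hypothetical tclToIcu: Tcl name -> ICU name, or None if unknown."""
    return _TCL_LOWER.get(name.lower())


def guess_encoding(data: bytes) -> str:
    """Crude stand-in for `detect`: return a single ICU-style name.

    ICU's detector instead ranks many candidates statistically.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"  # UTF-8 byte-order mark
    try:
        data.decode("utf-8", errors="strict")
        return "UTF-8"  # valid UTF-8, which includes pure ASCII
    except UnicodeDecodeError:
        pass
    # 0x80-0x9F are control codes in ISO-8859-1 but mostly printable in
    # windows-1252 (0x80 is the euro sign), so they suggest windows-1252.
    if any(0x80 <= byte <= 0x9F for byte in data):
        return "windows-1252"
    return "ISO-8859-1"


sample = "# \u00a9 2024\n".encode("iso8859-1")
icu_name = guess_encoding(sample)
print(icu_name, icu_to_tcl(icu_name))  # ISO-8859-1 iso8859-1
```

Keeping detection and name mapping separate, as the proposed ensemble does, lets a caller pass an ICU-reported name straight to a Tcl encoding option once it has been translated.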
The command will be auto-loaded on first use.
The ICU libraries will not be shipped with Tcl, so the functionality will not be available on systems that do not have them installed. This is acceptable because the command is intended for developer systems used to port scripts. On recent releases of Windows 10, and on later Windows versions, ICU is already present as part of the system libraries.
Implementation
The implementation follows the one already present in Tk, except that different libraries are loaded because different ICU APIs are used. Combining the two is left for a later time.
As in Tk, the build system is unaffected because the ICU libraries are loaded only at runtime. This is intentional, even though searching for the library has a performance cost. The cost is acceptable because it is incurred only when the `icu` command is used.
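The runtime-loading approach can be sketched with Python's `ctypes`. This is an illustration only: Tcl and Tk do this in C, and the candidate library names below are assumptions, not the list Tcl actually searches. The search runs only on the first call, and its outcome, success or failure, is cached, so the cost is paid once and only by code that uses ICU. ICU builds on Unix-like systems also typically carry a version number in both the library file name and the exported symbol names, which a complete loader would have to handle as well.

```python
import ctypes
import ctypes.util
import functools
from typing import Optional

# Assumed candidate names: "icu" (the combined icu.dll found on recent
# Windows 10 releases) and "icuuc"/"icui18n" (separately packaged ICU
# libraries on many Unix-like systems).
CANDIDATES = ("icu", "icuuc", "icui18n")


@functools.lru_cache(maxsize=None)
def load_icu() -> Optional[ctypes.CDLL]:
    """Search for an ICU library on first use and cache the outcome."""
    for name in CANDIDATES:
        path = ctypes.util.find_library(name)
        if path is None:
            continue
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue  # found but not loadable; try the next candidate
    return None  # ICU not installed: ICU-based features are unavailable


def icu_available() -> bool:
    return load_icu() is not None


lib = load_icu()          # the first call performs the search
assert load_icu() is lib  # later calls reuse the cached result
print("ICU found" if lib is not None else "ICU not found")
```

Caching a `None` result matters as much as caching a successful load: on systems without ICU, repeated calls fail immediately instead of searching again.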
The code is in the `apn-experiment-chardet` branch.