Author: Jan Nijtmans <[email protected]>
Author: Nathan Coulter <[email protected]>
State: Final
Type: Project
Vote: Done
Tcl-Version: 9.0
Tcl-Branch: tip-657
Vote-Summary: Accepted 6/0/1
Votes-For: AF, AK, JN, KW, MC, SL
Votes-Against: none
Votes-Present: DKF
Abstract
This TIP proposes to make "-profile strict" the default. It is intended as a replacement for TIP #601, and builds on top of TIP #656 ("A revised proposal for encodings").
Rationale
The tcl8 profile is a legacy profile that doesn't conform to any recommended behavior; the two other profiles, strict and replace, do.
Since strict is the recommended profile in most situations, it becomes the default in Tcl 9.0, with a few exceptions. That has some implications at the script level.
Many scripts will have to be adapted, either by expecting exceptions for encoding errors or by setting the channel profile to "tcl8" or "replace". Commands like "fcopy", "read" and "gets" will now throw an exception when they encounter encoding errors, which external applications and extensions might not expect.
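The two adaptation strategies mentioned above can be sketched as follows. This is a minimal illustration for Tcl 9.0, not code from the TIP: the helper name readAll and the temporary file are hypothetical, and the example prints the error code rather than assuming its exact value.

```tcl
# Requires Tcl 9.0. Create a file containing 0xFF, a byte that is never valid in UTF-8.
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out "ok\xFFok\n"
close $out

# Hypothetical helper: read a whole file as UTF-8 under the given profile.
proc readAll {path {profile strict}} {
    set in [open $path r]
    try {
        fconfigure $in -encoding utf-8 -profile $profile
        return [read $in]
    } finally {
        close $in
    }
}

# Strategy 1: keep the strict profile and expect an exception.
set strictFailed 0
try {
    set text [readAll $path]
} on error {msg opts} {
    set strictFailed 1
    puts "encoding error: $msg"
    puts "error code: [dict get $opts -errorcode]"
}

# Strategy 2: opt in to the replace profile; invalid bytes become U+FFFD.
set text [readAll $path replace]

file delete $path
```

Which strategy fits depends on whether silently altered data is acceptable: the first surfaces the problem to the caller, the second keeps reading at the cost of replacing the invalid byte.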
Specification
New channels are assigned the strict profile by default, and both encoding convertfrom and encoding convertto use the strict profile by default. The exception is the stderr channel, which defaults to the replace profile.
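As a rough illustration of how the three profiles differ for encoding convertfrom, the following sketch decodes the same invalid input under each one. It assumes Tcl 9.0; the variable names are illustrative only.

```tcl
# Requires Tcl 9.0. The byte 0xFF can never appear in well-formed UTF-8.
set bytes "a\xFFz"

# Default (strict) profile: the invalid byte raises an error.
set failed [catch {encoding convertfrom utf-8 $bytes} msg opts]
puts "strict failed: $failed"
puts "message: $msg"
puts "error code: [dict get $opts -errorcode]"

# replace profile: the invalid byte is replaced by U+FFFD.
set replaced [encoding convertfrom -profile replace utf-8 $bytes]

# tcl8 profile: legacy behavior, the byte is passed through as U+00FF.
set legacy [encoding convertfrom -profile tcl8 utf-8 $bytes]
```

Passing -profile explicitly, as in the last two calls, also documents the intent at the call site regardless of which default is in effect.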
Tcl_FSEvalFileEx() uses the strict profile, and therefore source uses the strict profile. All commands except glob use the strict profile.
Tcl_ExternalToUtfDStringEx(), Tcl_UtfToExternalDStringEx(), Tcl_ExternalToUtf() and Tcl_UtfToExternal() support a mode of operation in which any encoding error results in an EILSEQ POSIX error. That mode is now the default. The caller can explicitly configure other modes (TIP #656) to specify how these functions behave when invalid data are encountered.
Handling of environment variables (syncing between the ::env array and the native environment) still uses the tcl8 profile, as does the glob command. The reason is that in those situations many applications won't expect exceptions when illegal byte sequences occur in (disk) filenames or in environment variables. Changing that behavior is therefore out of scope for this TIP. TIP #671 is an attempt to solve this problem for environment variables and the glob command.
Compatibility
Since this is an incompatible change whenever channels/files/sockets are used, it has a potentially big effect on extensions. All extensions that could be confronted with encoding errors now have to handle the possibility of exceptions being thrown when such errors occur.
Also, opening a file whose name contains surrogate characters (or any code point missing from the system encoding) will fail in Tcl 9.0, while it might have succeeded in Tcl 8.x. For example:

    set f [open \U1F91D w]
    close $f
    set f [open \uD83E\uDD1D r]

This will succeed in Tcl 8.7, but fail in Tcl 9.0, because surrogate pairs are no longer equal to the combined character.
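The difference between a surrogate pair and the combined character can also be observed without touching the file system. The sketch below assumes Tcl 9.0 and uses only string comparison and encoding convertto; the variable names are illustrative.

```tcl
# Requires Tcl 9.0.
set combined "\U1F91D"         ;# written as a single code point
set surrogates "\uD83E\uDD1D"  ;# written as a surrogate pair

# Compare the two spellings; per this TIP they no longer denote the same string.
set same [string equal $combined $surrogates]
puts "same string: $same"

# Under the default strict profile, surrogates cannot be encoded as UTF-8.
set failed [catch {encoding convertto utf-8 $surrogates} msg]
puts "convertto on surrogates failed: $failed"
puts "message: $msg"

# The combined character encodes normally.
set bytes [encoding convertto utf-8 $combined]
puts "combined character encodes to [string length $bytes] bytes"
```

A script that still receives surrogate pairs from older data would need to normalize them before using them as file names, or select a non-strict profile explicitly.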
The 'http' package is modified because of this change. Since the 'http' package is not prepared to handle exceptions, they can easily leave it in an inconsistent state, as shown by the test-case errors that appeared when the default profile was changed to 'strict'. Therefore the 'http' package, when run in Tcl 9.0, uses the 'replace' profile. This makes the package conform to the W3C recommendations.
The 'tcltest' package is modified to use the 'tcl8' profile for its internal channels. For this package, we don't want exceptions to disturb test output. If a test case wants to handle a surrogate, so be it; this should not disturb the test case.
Implementation
Implementation is available in the tip-657 branch of the Tcl repository.
Copyright
This document has been placed in the public domain.