Author: Jan Nijtmans <[email protected]>
State: Draft
Type: Project
Vote: Pending
Tcl-Version: 9.0
Tcl-Branch: tip-657
Abstract
This TIP proposes to make "-profile strict" the default. This was previously proposed in TIP #601, with a different approach, but the implementation didn't match the TIP text. This TIP is intended as a replacement for TIP #601, and builds on top of TIP #656 ("A revised proposal for encodings").
An important part missing from TIP #601 is the Compatibility section, which should have been much clearer about the implications of the change.
Rationale
The tcl8 profile is a legacy profile which doesn't conform
to any recommended behavior; the two other profiles, strict and replace, do.
Since strict is the most desired profile, it becomes the default
in Tcl 9.0. That has some implications at the script level and also
in the C API. Many "http" testcases fail (without further measures),
because they depend on the "tcl8" profile never throwing exceptions.
Many scripts will have to be adapted, either expecting exceptions
for encoding errors or setting the channel profile to "tcl8" or
"replace". And functions like "fcopy", "read" and "gets" will
throw exceptions in more situations than before.
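The two script-level adaptations mentioned above can be sketched as follows. This is a minimal illustration, assuming Tcl 9.0 or later, where channels accept the "-profile" option; the helper name readAll and the sample bytes are hypothetical and not part of this TIP.

```tcl
# Hypothetical helper (not part of this TIP): read a whole file as UTF-8
# using the given encoding profile. Assumes Tcl 9.0, where "-profile" exists.
proc readAll {path profile} {
    set chan [open $path r]
    try {
        fconfigure $chan -encoding utf-8 -profile $profile
        return [read $chan]
    } finally {
        close $chan
    }
}

# Create a sample file containing the byte 0xFF, which is never valid UTF-8.
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out "ab\xFFcd"
close $out

# Adaptation 1: opt out of exceptions by selecting the "replace" profile.
set lenient [readAll $path replace]

# Adaptation 2: keep the strict profile and expect an exception.
set failed [catch {readAll $path strict} message options]

file delete $path
```

With the "replace" profile the invalid byte should come back as U+FFFD, while the strict read is expected to raise an error whose "-errorcode" identifies the POSIX error EILSEQ.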
Specification
Passing the TCL_ENCODING_STOPONERROR flag to Tcl_ExternalToUtfDStringEx(),
Tcl_UtfToExternalDStringEx(), Tcl_ExternalToUtf() and Tcl_UtfToExternal()
causes these functions to report any encoding error that
occurs (in the C API represented as the EILSEQ POSIX error). In Tcl 9.0, the
behaviour indicated by the flag TCL_ENCODING_STOPONERROR becomes the default,
and the flags TCL_ENCODING_PROFILE_TCL8 and TCL_ENCODING_PROFILE_REPLACE both
prevent any exceptions from being thrown.
A new function, Tcl_InputEncodingError(), may be used instead of checking for
the EILSEQ POSIX error. If Tcl_InputEncodingError() returns 1, then the
current position of the channel is the position where an encoding error
occurred, and any follow-up 'read' (or 'gets') would return no data but result
in an EILSEQ POSIX error. This condition can be reset by changing the
'-encoding' or the '-profile' of the channel.
Backporting to Tcl 8.7
The function Tcl_InputEncodingError() will be backported to Tcl 8.7. It's
only useful if the channel is set to "-profile strict".
Also, in Tcl 8.7, the "-failindex" option will be changed to work the same as in Tcl 9.0: If "-failindex" is specified, but "-profile" is not specified in the "encoding convertfrom/convertto" command, then the "strict" profile will be assumed.
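The "-failindex" behavior described above can be sketched as follows. This is an illustration only, assuming Tcl 9.0 (or 8.7 with the change described here); the sample bytes and variable names are hypothetical.

```tcl
# Five bytes: "ab", then 0xFF (never valid in UTF-8), then "cd".
set bytes [binary format a2H2a2 ab ff cd]

# No -profile is given: because -failindex is present, the strict
# profile is assumed, and the failure is reported through the variable
# instead of through an exception.
set prefix [encoding convertfrom -failindex pos utf-8 $bytes]
# prefix holds the text converted before the error; pos holds the index
# of the first byte that could not be converted, or -1 if none.

# For comparison, the "replace" profile never reports a failure.
set replaced [encoding convertfrom -profile replace utf-8 $bytes]
```

A caller would typically test whether pos is -1 before deciding how to treat the remaining bytes.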
Compatibility
This is an incompatible change for Tcl_ExternalToUtf()/Tcl_UtfToExternal(). But
since those functions are rarely used (and when they are used, they often
already have the TCL_ENCODING_STOPONERROR flag set), it will have little effect.
This is also an incompatible change for Tcl_Read(), Tcl_Write() and Tcl_Gets().
For channels with the (default) strict profile, these functions can now return
a POSIX error EILSEQ when an encoding error occurs. For maximum compatibility
with current behavior, a distinction is made between 'blocking' and 'non-blocking' mode.
In 'blocking' mode, the functions Tcl_Read()/Tcl_ReadObj() and
Tcl_Gets()/Tcl_GetsObj() set the POSIX error EILSEQ whenever an encoding
error occurs. They also return the data as received so far, and the file pointer
will be left where the encoding error occurred. If there was left-over data,
received before encountering the encoding error, this data will be left
in the "-data" return option.
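At the script level, the blocking-mode behavior might be handled roughly as follows. This is a sketch under stated assumptions: Tcl 9.0, a hypothetical sample file, and an implementation that fills the "-data" return option as described above (the code checks for its presence rather than relying on it).

```tcl
# Hypothetical sample data: "ab", an invalid UTF-8 byte (0xFF), then "cd".
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out "ab\xFFcd"
close $out

set chan [open $path r]
fconfigure $chan -encoding utf-8 -profile strict -blocking 1

set partial ""
set code {}
set failed [catch {read $chan} result options]
if {$failed} {
    # The error code is expected to identify the POSIX error EILSEQ.
    set code [dict get $options -errorcode]
    # Data decoded before the error, if the implementation supplied it.
    if {[dict exists $options -data]} {
        set partial [dict get $options -data]
    }
    # Changing the profile resets the error condition, so the rest of
    # the file can be read without further exceptions.
    fconfigure $chan -profile replace
    set rest [read $chan]
}
close $chan
file delete $path
```

Switching to the "replace" profile is only one way to continue; a script could equally close the channel and report the error.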
In 'non-blocking' mode, if there is any data returned before the encoding
error, the POSIX error will not be set yet, so the channel has a chance to
handle the data so far normally. The next call to
Tcl_Read()/Tcl_ReadObj()/Tcl_Gets()/Tcl_GetsObj() (which normally happens
in a loop or as a readable event) will return no data but only the POSIX error EILSEQ.
The functions Tcl_Write()/Tcl_WriteObj() and Tcl_Eof() don't depend on blocking
mode. Tcl_Write() will always write out as many characters as it can, and always sets
POSIX error EILSEQ when it cannot write more due to an encoding error. Tcl_Eof()
will only return true when the channel is at an EOF condition; it will return false
when the channel is at an encoding error position.
The 'http' package is modified because of this change: Since the 'http' package is not prepared to handle exceptions, it can easily be left in an inconsistent state, as shown by test-case errors when the default profile was changed to 'strict'. Therefore, the 'http' package, when run in Tcl 9.0, will use the 'replace' profile. This makes the package conformant to the W3C recommendations.
The 'tcltest' package is modified to use the 'tcl8' profile for its internal channels. For this package, we don't want exceptions to disturb test outputs. If a test case wants to handle a surrogate, so be it; this should not disturb the test case.
Copyright
This document has been placed in the public domain.