Author: Nathan Coulter <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 08-Jan-2023
Tcl-Version: 8.7
Tcl-branch: tip-653
Vote-Summary:
Votes-For:
Votes-Against:
Votes-Present:
Abstract
In recent versions of Tcl it is possible to configure a channel to treat data
that does not conform to the encoding specification as an error. read and gets
must pass along such an error. In order to maintain expected semantics for a
blocking channel, and to maximize utility, read and gets can use the return
options dictionary to communicate additional information when an error occurs.
Specification
When read and gets must return an encoding error on a blocking channel, the
error is returned when it is encountered, not on some subsequent call, and the
data successfully decoded up to the point of the error is available in the
return options dictionary either under the path -result read or under the
path -data (to be determined). The advantage of -result read is that
additional information could be added under the same key. For example,
-result bytes might contain the original bytes prior to decoding.
After such an error, the current access position in the channel is the position
of the first byte of the data that caused the error. [tell] provides that
position.
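The blocking-mode behavior described above can be sketched at the script level as follows. This is an illustration under stated assumptions, not the reference implementation: it assumes a Tcl build that implements this proposal and reports the failure with a POSIX EILSEQ error code. The helpers partialData and readReport are hypothetical names introduced here. Because the proposal leaves the location of the decoded data undecided, the sketch probes both candidate paths.

```tcl
# Hypothetical helper (not part of Tcl): extract the data decoded before an
# encoding error from a return options dictionary. The TIP has not settled
# on "-data" versus "-result read", so both candidates are checked.
proc partialData {opts} {
    if {[dict exists $opts -data]} {
        return [dict get $opts -data]
    }
    if {[dict exists $opts -result]
            && ![catch {dict get $opts -result read} data]} {
        return $data
    }
    return {}
}

# Hypothetical wrapper: read the rest of a blocking channel and report what
# was decoded and where the channel is positioned afterwards.
proc readReport {chan} {
    if {[catch {read $chan} data opts]} {
        set code [dict get $opts -errorcode]
        if {[lindex $code 1] ne "EILSEQ"} {
            # Not an encoding error: re-raise it unchanged.
            return -options $opts $data
        }
        # Assumed per the proposal: [tell] now reports the offset of the
        # first byte that could not be decoded.
        return [dict create status error \
            data [partialData $opts] offset [tell $chan]]
    }
    return [dict create status ok data $data offset [tell $chan]]
}
```

A caller might inspect `[dict get [readReport $chan] status]` and, on `error`, decide whether to skip the offending byte, switch the encoding, or give up.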
Channels that use the (default) strict profile now return the POSIX error
EILSEQ when an encoding error occurs. For maximum compatibility with current
behavior, a distinction is made between 'blocking' and 'non-blocking' mode.
In 'blocking' mode, the functions Tcl_Read()/Tcl_ReadObj() and
Tcl_Gets()/Tcl_GetsObj() set the POSIX error EILSEQ whenever an encoding
error occurs. If Tcl_Gets()/Tcl_GetsObj() encounter an encoding error, the
file pointer is left at the original position, and the functions return -1.
Tcl_Read()/Tcl_ReadObj() store the data as received so far in the return
options dictionary, and the file pointer is left where the encoding error
occurred.
In 'non-blocking' mode, all data prior to the first byte that resulted in an
encoding error is returned, and the POSIX error is not yet set. On the next
call to Tcl_Read()/Tcl_ReadObj(), which normally happens in a loop or as a
readable event, no data is returned and the POSIX error EILSEQ is set.
This makes it possible to handle all data up to the point of the error
normally.
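For a non-blocking channel, the two-step behavior might look like this from a readable event handler. This is a sketch that assumes the script-level read command mirrors what is described for Tcl_Read()/Tcl_ReadObj(). The handler name onReadable and the layout of the state array are hypothetical.

```tcl
# Hypothetical readable-event handler for a non-blocking channel.
# stateName names a global array that collects the results.
proc onReadable {chan stateName} {
    upvar #0 $stateName state
    if {[catch {read $chan} chunk opts]} {
        # Assumed per the proposal: this branch is reached on the call
        # after the one that delivered the valid data preceding the bad
        # byte, so nothing decoded so far has been lost.
        set state(error) [dict get $opts -errorcode]
        set state(offset) [tell $chan]
        fileevent $chan readable {}
        return
    }
    # Normal path: accumulate whatever is currently available.
    append state(data) $chunk
    if {[eof $chan]} {
        fileevent $chan readable {}
        set state(done) 1
    }
}
```

It would typically be installed with `fileevent $chan readable [list onReadable $chan st]` after `fconfigure $chan -blocking 0`.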
The functions Tcl_Write()/Tcl_WriteObj() and Tcl_Eof() don't depend on
blocking mode. Tcl_Write() always writes out as many characters as it can, and
always sets POSIX error EILSEQ when it cannot write more due to an encoding
error. Tcl_Eof() only returns true when the channel is at an EOF condition,
not when the channel is at an encoding error position.
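On the output side, a script could tell an encoding failure apart from other I/O errors roughly as follows. This is a sketch that assumes puts and flush surface the EILSEQ set by the C layer as a `POSIX EILSEQ ...` error code. Both isEncodingError and tryWrite are hypothetical helpers, not part of Tcl.

```tcl
# Hypothetical helper: true when a return options dictionary carries a
# POSIX EILSEQ error code.
proc isEncodingError {opts} {
    if {![dict exists $opts -errorcode]} {
        return 0
    }
    set code [dict get $opts -errorcode]
    expr {[lindex $code 0] eq "POSIX" && [lindex $code 1] eq "EILSEQ"}
}

# Hypothetical wrapper: write text and flush it. Returns 1 on success and 0
# when the channel encoding cannot represent some character; any other
# error is re-raised unchanged.
proc tryWrite {chan text} {
    if {[catch {puts -nonewline $chan $text; flush $chan} msg opts]} {
        if {[isEncodingError $opts]} {
            return 0
        }
        return -options $opts $msg
    }
    return 1
}
```

Since the text states that as many characters as possible are written before the error, a caller that gets 0 back should assume that part of the text may already have reached the channel.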
Rationale
The primary intent is to preserve the current semantics of read and gets
for a blocking channel: An error occurs immediately when non-conforming data
is encountered, not on the next call to read or gets, as was proposed
in some other approaches. The second goal is to make the position of the
non-conforming data available to the caller. One natural way to do this is to
make it the current position so that [tell] can provide it. The question
then arises: What to do with the data that has been successfully decoded so
far? The simplest and probably best answer is to make it available to the
caller in case something useful can be done with it.
In Tcl the return value in case of an error is normally an error message, so
the return value is not available for passing other information related to
the error to the caller. -errorcode could be used, but it is typically used for
classification of the error, and mixing in other types of additional
information does not seem like a particularly good idea.
The data successfully decoded so far is stored under the path -result read
rather than just -result so that if there later arises a need to return other
information, it can be assigned to another key under -result. For example,
one idea is that the original undecoded bytes should also be returned.
-result could become a common pattern for returning rich data in exceptional
cases.
Under this proposal the caller of read and gets can handle each
occurrence of non-conforming data and then continue to read data from the
channel.
Implementation
The py-b8f575aa23 branch contains a complete implementation under which the entire test suite passes.
Copyright
Copyright © 2023, Nathan Coulter. All rights reserved.
Support
The author of this TIP requests financial support for this and other free software work. Contact and payment information is available at:
https://wiki.tcl-lang.org/page/Poor+Yorick