Author: Jan Nijtmans <[email protected]>
Author: Harald Oehlmann <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 08-Jan-2023
Tcl-Version: 8.7
Tcl-branch: tip-653
Vote-Summary: 3/0/0
Votes-For: AK, JN, SL
Votes-Against: none
Votes-Present: none
Abstract
With the introduction of the strict profile for channels, it is possible that read throws an exception after it has already consumed correctly decoded data from the channel.
Currently, that can only happen on blocking channels in strict profile, when an encoding error occurs.
But in the future there might be more situations and more commands (such as gets) where this can happen.
In order to prevent data loss, this TIP proposes to return this consumed data in the return options dictionary of the exception.
Specification
Whenever a channel command throws an exception (EILSEQ) due to an encoding error, the data successfully decoded up to the point of the error may be lost, because an error result cannot carry a return value.
Where this data loss is possible, the data decoded so far is made available in the error options dictionary under the key -data.
The key -data is present and has an empty string as its value if the encoding error happens at the beginning of the data.
The purpose of this rule is to make user scripts simpler.
The key -data is not present if the already decoded data has not been consumed, i.e. the next call to a channel command will process it again.
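The rules above can be sketched from the point of view of a calling script. The following is a minimal illustration, assuming a Tcl version that implements this TIP; the helper name readWithPartial and its status words are hypothetical and not part of the proposal.

```tcl
# Hypothetical helper (not part of the TIP): read everything from $chan.
# Returns a two-element list {status data}:
#   ok      - read succeeded, data is the complete result
#   partial - encoding error, data is what was decoded and consumed so far
#             (may be the empty string if the error is at the very start)
#   error   - encoding error, nothing was consumed (-data key absent)
proc readWithPartial {chan} {
    try {
        return [list ok [read $chan]]
    } trap {POSIX EILSEQ} {msg opts} {
        # Per the rules above, -data is only present when data was consumed.
        if {[dict exists $opts -data]} {
            return [list partial [dict get $opts -data]]
        }
        return [list error {}]
    }
}
```

Because the key is present even when its value is empty, the caller only needs a single dict exists check to tell consumed data from data that is still pending on the channel.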
Rationale
Encoding errors are special in that they can be inspected or corrected by choosing an encoding or encoding profile that handles the data correctly.
For this purpose, the data decoded so far may be of value.
Here is an example that reads the remaining data using the binary encoding:
Create test file
The test file contains an encoding error in UTF-8 encoding at the 2nd byte. It consists of the following bytes:
- A capital "A"
- "\xC3": this announces a multi-byte UTF-8 sequence
- A capital "B": the announced multi-byte sequence is not continued, resulting in a UTF-8 encoding error at position 1 (counting from 0).
% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\xC3B; close $f
Read until encoding error
Now read the file using the utf-8 encoding; the encoding error is reported through the error options dictionary:
% set f [open test_A_195_B.txt r]
file35a65a0
% fconfigure $f -encoding utf-8 -profile strict -blocking 1
% catch {read $f} e d
1
% set d
-data A -code 1 -level 0
-errorstack {INNER {invokeStk1 read file35a65a0}}
-errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
-errorinfo {...} -errorline 1
% tell $f
1
The error code POSIX EILSEQ indicates an encoding error.
The proposed key -data is present and contains the file data up to the encoding error.
This is a capital "A".
The call to tell reports the position of the encoding error.
The file position has been advanced to just before the encoding error.
The data before it has been consumed and will not be returned by a second call to read.
Note that within a file we may seek back and read again.
This is not possible for other channels such as sockets.
We only know that, after an encoding error, the channel is positioned at the encoding error and any data decoded so far is in the value of the -data key.
Handle encoding error
To handle the encoding error, the remaining data is read after changing the encoding to binary.
The next read command returns the remaining bytes, starting with the byte that is invalid in UTF-8.
% fconfigure $f -encoding binary -profile strict
% read $f
ÃB
The application may now inspect the data and decide how to proceed.
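The three steps above (strict read, picking up -data, re-reading the rest as bytes) can be combined into one procedure. This is only a sketch: the name readSalvage and the result keys text and rest are hypothetical, and iso8859-1 is used as a byte-preserving stand-in for the binary encoding shown above, on the assumption that not every Tcl version accepts -encoding binary.

```tcl
# Hypothetical helper (not part of the TIP): read a whole file as strict
# UTF-8. Returns a dict with two keys:
#   text - everything that decoded cleanly before the first encoding error
#   rest - the remaining bytes from the error position on (one character
#          per byte), or the empty string if there was no error
proc readSalvage {path} {
    set f [open $path r]
    try {
        fconfigure $f -encoding utf-8 -profile strict -blocking 1
        try {
            return [dict create text [read $f] rest {}]
        } trap {POSIX EILSEQ} {msg opts} {
            # Data consumed before the error is only available via -data.
            set text ""
            if {[dict exists $opts -data]} {
                set text [dict get $opts -data]
            }
            # Assumption: iso8859-1 maps every byte to one character, so it
            # plays the role of the "binary" encoding used in the transcript.
            fconfigure $f -encoding iso8859-1
            return [dict create text $text rest [read $f]]
        }
    } finally {
        close $f
    }
}
```

For the test file created earlier, the transcript suggests this should yield "A" under text and the two characters "ÃB" under rest. The caller can then examine rest to decide whether to skip the offending byte, try another encoding, or give up.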
Example for Gets
The gets command does not advance the current file position on encoding errors.
In consequence, the already decoded data should not be returned and the key -data should not exist.
Here is an example with the same file and the same error recovery:
% set f [open test_A_195_B.txt r]
file384b6a8
% fconfigure $f -encoding utf-8 -profile strict -blocking 1
% catch {gets $f} e d
1
% set d
-code 1 -level 0
-errorstack {INNER {invokeStk1 gets file384b6a8}}
-errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
-errorinfo {...} -errorline 1
% tell $f
0
% fconfigure $f -encoding binary -profile strict
% gets $f
AÃB
Note the differences from read:
- The current file position is not advanced to just before the encoding error; it remains at 0.
- The recovery gets returns both the correctly encoded data "A" and the bytes that caused the error.
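A script that reads line by line could handle this case roughly as follows. The sketch assumes a Tcl version that implements this TIP; the helper name getsOrRaw and its status words are hypothetical, and iso8859-1 again stands in for the binary encoding as a byte-preserving choice.

```tcl
# Hypothetical helper (not part of the TIP): read one line from $chan.
# Returns {text line} on success. On an encoding error it switches the
# channel to a byte-preserving encoding and returns {raw line}, where line
# is the same line re-read as bytes (one character per byte).
# Note: after a "raw" result the channel stays in iso8859-1.
proc getsOrRaw {chan} {
    try {
        return [list text [gets $chan]]
    } trap {POSIX EILSEQ} {msg opts} {
        # For gets the TIP expects no -data key: nothing was consumed, so
        # the second gets starts again at the beginning of the line.
        set prefix ""
        if {[dict exists $opts -data]} {
            set prefix [dict get $opts -data]
        }
        fconfigure $chan -encoding iso8859-1
        return [list raw $prefix[gets $chan]]
    }
}
```

The dict exists check is kept so that the helper would not lose data if a future command or channel type did consume input before failing, which is the situation the -data key was introduced for.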
Implementation
The tip-653 branch contains a complete implementation under which the entire test suite passes.
Copyright
This document has been placed in the public domain.