TIP 653: Handle consumed data of channel commands in case of encoding errors

Author:		Jan Nijtmans <[email protected]>
Author:		Harald Oehlmann <[email protected]>
State:		Final
Type:		Project
Vote:		Done
Created:	08-Jan-2023
Tcl-Version:	8.7
Tcl-branch:	tip-653
Vote-Summary:	3/0/0
Votes-For:	AK, JN, SL
Votes-Against:	none
Votes-Present:	none

Abstract

With the introduction of the strict profile for channels, it is possible for read to throw an exception after it has already consumed correctly decoded data from the channel. Currently, this can only happen on blocking channels using the strict profile, when an encoding error occurs. In the future, however, there might be more situations and more commands (such as gets) where this can happen.

To prevent data loss, this TIP proposes returning the consumed data in the return options dictionary of the exception.

Specification

Whenever a channel command throws an exception (EILSEQ) due to an encoding error, the data successfully decoded up to the point of the error may be lost, because the error case does not allow a value to be returned. Where such data loss is possible, the data decoded so far is made available in the error options dictionary under the key -data.

The key -data is present, with the empty string as its value, if the encoding error occurs at the very beginning of the data. The purpose of this rule is to simplify user scripts: the presence of the key alone tells whether data was consumed.

The key -data is not present if the already decoded data was not consumed, i.e. the next call to the command will see that data again.
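The three rules above can be condensed into a small decision procedure. The following is a minimal sketch, assuming a Tcl version that implements this TIP; the procedure name classifyEncodingError and its {state data} return format are hypothetical and not part of Tcl. It only inspects the error options dictionary returned by catch, so it can be tried with a hand-written dictionary.

```tcl
# Hypothetical helper (not part of Tcl): interpret the error options
# dictionary of a failed channel command according to the -data rules.
# Returns a two-element list {state data}, where state is one of:
#   other      - not an encoding error, the -data rules do not apply
#   consumed   - decoded data was consumed and is returned in -data
#   unconsumed - nothing was consumed, the next call sees the data again
proc classifyEncodingError {opts} {
    # Encoding errors are reported with the error code POSIX EILSEQ.
    if {![dict exists $opts -errorcode]
            || [lrange [dict get $opts -errorcode] 0 1] ne {POSIX EILSEQ}} {
        return [list other ""]
    }
    if {[dict exists $opts -data]} {
        # May be the empty string if the error was at the very first byte.
        return [list consumed [dict get $opts -data]]
    }
    # Key absent: the decoded data is still in the channel.
    return [list unconsumed ""]
}
```

Applied to the dictionary d from the read example below, this returns consumed A; for the gets example it returns unconsumed {}.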

Rationale

What is special about encoding errors is that they can be inspected or corrected by switching to an encoding, or encoding profile, that handles the data correctly.

For this, the data decoded so far may be of value.

Here is an example reading the remaining data using the binary encoding:

Create test file

The test file contains a UTF-8 encoding error at the second byte. It consists of the three bytes A, 0xC3 (195) and B:

% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\xC3B; close $f

Read until encoding error

Now read the file using the utf-8 encoding. The error is reported together with an error options dictionary:

% set f [open test_A_195_B.txt r]
file35a65a0
% fconfigure $f -encoding utf-8 -profile strict -blocking 1
% catch {read $f} e d
1
% set d
-data A -code 1 -level 0 -errorstack {INNER {invokeStk1 read file35a65a0}} -errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}} -errorinfo {...} -errorline 1
% tell $f
1

The error code POSIX EILSEQ signals an encoding error.

The proposed key -data is present and contains the file data up to the encoding error, here a capital "A".

The call to tell reports the position of the encoding error: the file position was advanced to just before the invalid byte. The data before it was consumed and will not be returned by a second call to read.

Note that, within a file, we may jump back and read again. This is not possible for other channels, such as sockets. In general, we only know that, after an encoding error, the channel is positioned at the encoding error and any data decoded so far is in the value of the -data key.
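For seekable channels such as files, jumping back offers a recovery path that does not depend on -data. The following is a minimal sketch, assuming a blocking channel already configured with -encoding utf-8 -profile strict; the procedure name readAllOrRaw is hypothetical. It records the position with tell before reading and, on an EILSEQ error, seeks back and re-reads everything from there in the binary encoding. It will not work on sockets or pipes, which cannot seek.

```tcl
# Hypothetical helper (not part of Tcl): read a seekable channel to EOF.
# On a strict-profile encoding error, seek back to where the read started
# and return all data from there as raw bytes (binary encoding).
proc readAllOrRaw {chan} {
    # Remember the start position; tell is meaningful for files only.
    set start [tell $chan]
    if {![catch {read $chan} data opts]} {
        # No error: the data decoded cleanly.
        return $data
    }
    if {[lrange [dict get $opts -errorcode] 0 1] ne {POSIX EILSEQ}} {
        # Not an encoding error: re-raise it unchanged.
        return -options $opts $data
    }
    # Jump back before the consumed prefix, so -data is not needed.
    seek $chan $start
    fconfigure $chan -encoding binary
    return [read $chan]
}
```

Compared with using -data, this trades generality (channel must be seekable) for not having to stitch a decoded prefix and a raw remainder together.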

Handle encoding error

To handle the encoding error, the remaining data is read after changing the encoding to binary. The next read command then returns the remaining bytes, starting with the byte that is invalid in UTF-8.

% fconfigure $f -encoding binary -profile strict
% read $f
ÃB

The application may now decide any action by introspection of the data.
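The recovery steps above can be wrapped into one procedure. The following is a minimal sketch, assuming a blocking channel already configured with -encoding utf-8 -profile strict and a Tcl version implementing this TIP; the name readWithBinaryFallback and its two-element return value are hypothetical. The decoded prefix is taken from -data (treated as empty if the key is absent), and the rest is read unchanged in the binary encoding, so the caller can inspect it.

```tcl
# Hypothetical helper (not part of Tcl): read a channel to EOF and recover
# from a strict-profile encoding error as described in this TIP.
# Returns a two-element list {decodedPrefix rawRemainder}.
proc readWithBinaryFallback {chan} {
    if {![catch {read $chan} data opts]} {
        # No encoding error: everything decoded normally.
        return [list $data ""]
    }
    if {[lrange [dict get $opts -errorcode] 0 1] ne {POSIX EILSEQ}} {
        # Not an encoding error: re-raise it unchanged.
        return -options $opts $data
    }
    # Data consumed before the error is only available via -data.
    set prefix ""
    if {[dict exists $opts -data]} {
        set prefix [dict get $opts -data]
    }
    # Read the undecodable bytes and everything after them as raw bytes.
    fconfigure $chan -encoding binary
    return [list $prefix [read $chan]]
}
```

For the test file, the expected result is the list {A ÃB}: the prefix "A" from -data and the raw remainder.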

Example for Gets

The gets command does not advance the file position on an encoding error. Consequently, the already decoded data is not consumed: it is not returned, and the key -data is not present.

Here is an example with the same file and the same error recovery:

% set f [open test_A_195_B.txt r]
file35a65a0
% fconfigure $f -encoding utf-8 -profile strict -blocking 1
% catch {gets $f} e d
1
% set d
-code 1 -level 0 -errorstack {INNER {invokeStk1 gets file384b6a8}} -errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}} -errorinfo {...} -errorline 1
% tell $f
0
% fconfigure $f -encoding binary -profile strict
% gets $f
AÃB

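Note the differences to read: the key -data is not present in the error options dictionary, tell returns 0 instead of 1, and the subsequent gets in binary encoding returns the complete line AÃB, including the already decoded "A".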
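Because gets consumes nothing on an encoding error, recovery reduces to switching the encoding and calling gets again. The following is a minimal sketch, assuming a blocking channel configured with -encoding utf-8 -profile strict; the name getsWithBinaryFallback is hypothetical. It still honors -data if present, so the same code would keep working if a command were to consume data in such a case, as the abstract anticipates.

```tcl
# Hypothetical helper (not part of Tcl): read one line with gets and, on a
# strict-profile encoding error, return the line read as raw bytes instead.
proc getsWithBinaryFallback {chan} {
    if {![catch {gets $chan} line opts]} {
        return $line
    }
    if {[lrange [dict get $opts -errorcode] 0 1] ne {POSIX EILSEQ}} {
        # Not an encoding error: re-raise it unchanged.
        return -options $opts $line
    }
    # With this TIP, gets leaves -data absent and consumes nothing, so the
    # whole line is still in the channel. Prepend -data only if present.
    set prefix ""
    if {[dict exists $opts -data]} {
        set prefix [dict get $opts -data]
    }
    fconfigure $chan -encoding binary
    return $prefix[gets $chan]
}
```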

Implementation

The tip-653 branch contains a complete implementation under which the entire test suite passes.

Copyright

This document has been placed in the public domain.