TIP 653: Handling encoding errors for [read] and [gets]

Author:		Nathan Coulter <[email protected]>
State:		Draft
Type:		Project
Vote:		Pending
Created:	08-Jan-2023
Tcl-Version:	8.7
Tcl-branch:	py-b8f575aa23


In recent versions of Tcl it is possible to configure a channel to treat data that does not conform to the encoding specification as an error. [read] and [gets] must pass along such an error. In order to maintain expected semantics and to maximize utility, [read] and [gets] can use the return options dictionary to communicate additional information when an error occurs.


When [read] or [gets] must return an encoding error, the error is returned when it is encountered, not on some subsequent call, and the data successfully decoded up to the point of the error is available in the return options dictionary under the path -result read. This behaviour is independent of the blocking configuration of the channel.

After such an error, the current access position in the channel is the position of the first byte of the data that caused the error. [tell] provides that position.
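The two rules above can be modelled in plain Tcl. The following is a minimal sketch, not the implementation in the branch: strictAsciiRead is a hypothetical helper that emulates the proposed behaviour for a channel whose data must be 7-bit ASCII, so that the example runs on a stock interpreter (Tcl 8.6 or later is assumed for [file tempfile]). The error message text is also an assumption.

```tcl
# Hypothetical helper that emulates the proposed semantics of [read] for a
# strict ASCII channel. The channel must be seekable and in binary mode.
proc strictAsciiRead {chan} {
    set start [tell $chan]
    set bytes [read $chan]
    # -failindex stores the index of the first non-ASCII character in "bad".
    if {[string is ascii -failindex bad $bytes]} {
        return $bytes
    }
    # Leave the access position on the first byte that caused the error.
    seek $chan [expr {$start + $bad}]
    # Raise the error immediately and pass the decoded prefix to the caller
    # under the path "-result read" of the return options dictionary.
    set prefix [string range $bytes 0 [expr {$bad - 1}]]
    return -code error -result [dict create read $prefix] \
        "error reading \"$chan\": illegal byte sequence"
}

# Prepare a temporary file holding three valid bytes, one invalid byte, and
# three more valid bytes.
set chan [file tempfile]
fconfigure $chan -translation binary
puts -nonewline $chan "abc\xffdef"
seek $chan 0

set failed [catch {strictAsciiRead $chan} msg opts]
set partial [dict get $opts -result read]
set pos [tell $chan]
puts [list $failed $partial $pos]   ;# prints: 1 abc 3
close $chan
```

With a real channel configured to reject non-conforming data, the caller's side would presumably look the same: [catch] the call, fetch the partial data from the options dictionary, and ask [tell] for the position.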


The primary intent is to preserve current semantics: an error occurs immediately when non-conforming data is encountered, not on the next call to [read] or [gets], as was proposed in some other approaches. The second goal is to make the position of the non-conforming data available to the caller. One natural way to do this is to make it the current position so that [tell] can provide it. The question then arises: what to do with the data that has been successfully decoded so far? The simplest and probably best answer is to make it available to the caller in case something useful can be done with it.

In Tcl the return value in case of an error is normally an error message, so the return value is not available for passing other information related to the error to the caller. -errorcode could be used, but it is typically used to classify the error, and mixing in other kinds of information does not seem like a particularly good idea.

The data successfully decoded so far is stored under the path -result read rather than just -result so that if there later arises a need to return other information, it can be assigned to another key under -result. -result could become a common pattern for returning rich data in exceptional cases.
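As an illustration of why a nested path leaves room for growth, the sketch below raises an error whose -result entry holds two keys. Everything except the key read is an assumption made for the example: failWithDetails is a hypothetical procedure, and position is an invented key that this proposal does not define.

```tcl
# Hypothetical error source. Only the key "read" comes from the proposal;
# "position" is invented here to show a second entry under -result.
proc failWithDetails {} {
    return -code error \
        -result [dict create read "abc" position 3] \
        "illegal byte sequence"
}

catch {failWithDetails} msg opts

# The documented key is fetched by its path.
set partial [dict get $opts -result read]

# Optional keys can be probed with [dict exists], so a caller written for
# a richer producer still works with one that supplies only "read".
set hasPosition [dict exists $opts -result position]
set hasOther [dict exists $opts -result other]

puts [list $partial $hasPosition $hasOther]   ;# prints: abc 1 0
```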

Under this proposal the caller of [read] and [gets] can handle each occurrence of non-conforming data and then continue to read data from the channel.
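A recovery loop of that kind might look like the sketch below. To keep it runnable on a stock interpreter, the proposed behaviour is emulated by strictAsciiRead, a hypothetical helper that treats any byte above 0x7f as non-conforming; it is not the implementation in the branch. Substituting "?" for each bad byte is just one possible policy.

```tcl
# Hypothetical helper that emulates the proposed semantics of [read] for a
# strict ASCII channel. The channel must be seekable and in binary mode.
proc strictAsciiRead {chan} {
    set start [tell $chan]
    set bytes [read $chan]
    if {[string is ascii -failindex bad $bytes]} {
        return $bytes
    }
    # Position the channel on the offending byte and report the prefix.
    seek $chan [expr {$start + $bad}]
    set prefix [string range $bytes 0 [expr {$bad - 1}]]
    return -code error -result [dict create read $prefix] \
        "error reading \"$chan\": illegal byte sequence"
}

set chan [file tempfile]
fconfigure $chan -translation binary
puts -nonewline $chan "abc\xffdef\xfeghi"
seek $chan 0

set text ""
while {[catch {strictAsciiRead $chan} data opts]} {
    # Keep what was decoded before the error. An unrelated error has no
    # "-result read" entry, so [dict get] fails loudly in that case.
    append text [dict get $opts -result read]
    # The access position is on the bad byte: step over it, then mark it.
    seek $chan 1 current
    append text "?"
}
# The final call succeeded, so "data" holds the remaining text.
append text $data
close $chan

puts $text   ;# prints: abc?def?ghi
```

Because the error is raised on the call that meets the bad byte and the position is left on that byte, the loop needs no state beyond the channel itself.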


The py-b8f575aa23 branch contains a complete implementation under which the entire test suite passes.


This document has been placed in the public domain.