Author: Jan Nijtmans
State: Draft
Type: Project
Created: 30-Jul-2021
Post-History:
Keywords: Tcl
Tcl-Version: 8.7
Tcl-Branch: tip607-encoding-failindex
Abstract
This TIP proposes adding a -failindex option to the encoding convertto and encoding convertfrom commands. It makes the functionality of TIP 601 available at the script level: when data cannot be transformed, the script can retrieve both the position of the error and the string converted up to that point.
-failindex option
The encoding convertfrom and encoding convertto commands are extended with a -failindex option:
encoding convertfrom ?-failindex posvar? ?encoding? data
encoding convertto ?-failindex posvar? ?encoding? data
The behaviour in each case is as follows:
- No conversion error
  - Option -failindex not given: The converted data is returned as the command result.
  - Option -failindex given: Additionally, the value -1 is written to the given variable in the caller scope.
- Conversion error present
  - Option -failindex not given: In Tcl 8.7, or when the -nocomplain option is given, the data is converted with replacement characters, as is currently done. Otherwise, the command raises an error (error code EILSEQ).
  - Option -failindex given: The data converted up to the failing position is returned as the command result. The position of the conversion error in the source string is written to the specified variable in the caller scope.
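The following is a minimal pure-Tcl sketch of these semantics, restricted to the ascii target encoding so that it also runs on releases that lack the proposed option. The proc name convertToAsciiFailindex is hypothetical; under this TIP the equivalent call would be encoding convertto -failindex idx ascii $data. It uses upvar so that, as specified, the position is written to a variable in the caller scope.

```tcl
# Hypothetical helper emulating the proposed
#   encoding convertto -failindex posvar ascii data
# for the ascii target only (not part of Tcl or of this TIP).
proc convertToAsciiFailindex {posVar data} {
    upvar 1 $posVar pos
    set i 0
    foreach ch [split $data ""] {
        scan $ch %c code
        if {$code > 127} {
            # First untransformable character: report its index and
            # return only the part converted so far.
            set pos $i
            return [encoding convertto ascii [string range $data 0 [expr {$i - 1}]]]
        }
        incr i
    }
    # No conversion error: -1 signals success.
    set pos -1
    return [encoding convertto ascii $data]
}

set out [convertToAsciiFailindex idx "abc\u00e9xyz"]
puts "$out $idx"   ;# prints: abc 3
set out [convertToAsciiFailindex idx "plain"]
puts "$out $idx"   ;# prints: plain -1
```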
This specification is inspired by the existing -failindex option of the string is command.
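For comparison, the existing string is -failindex option behaves as shown below. Note one difference: string is leaves the variable untouched when the test succeeds, whereas the proposed encoding option writes -1 in that case.

```tcl
# Failing check: the index of the first non-digit is stored in pos.
set ok [string is digit -failindex pos 12a4]
puts "$ok $pos"                 ;# prints: 0 2

# Passing check: pos is not set at all.
unset -nocomplain pos
set ok [string is digit -failindex pos 1234]
puts "$ok [info exists pos]"    ;# prints: 1 0
```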
This option may not be used together with the -nocomplain option of TIP 601.
Discussion
Implementation of byte-compiled encoding commands
Jan noted that the implementation is not trivial, as the encoding ensemble is a partially byte-compiled command.
Nevertheless, an implementation is being attempted in the branch tip601-encoding-failindex.
Likewise, string is with the -failindex option is not byte-compiled.
I plan to add a check for the -failindex option in front of the current generic byte compiler, and to skip byte compilation when the option is present.
Incomplete multi-byte sequences
Note: a side discussion in the thread considered whether an incomplete multi-byte sequence is an error. Unfortunately, the question of how an incomplete multi-byte sequence should be reported was not resolved, so this alternate solution treats it as an error.
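To make the chosen rule concrete, here is a hypothetical pure-Tcl helper, utf8FailIndex (not part of Tcl or of this TIP), that scans a byte string and returns the byte index at which a decoder following this rule would report the error, or -1 if none. It checks only the lead-byte and continuation-byte structure of UTF-8; overlong forms and surrogates are deliberately not rejected in this sketch.

```tcl
# Hypothetical helper: byte index of the first structurally invalid
# or truncated UTF-8 sequence in a byte string, or -1 if none.
# Simplification: overlong forms and surrogates are not rejected.
proc utf8FailIndex {bytes} {
    set n [string length $bytes]
    set i 0
    while {$i < $n} {
        scan [string index $bytes $i] %c b
        if {$b < 0x80} {
            set len 1
        } elseif {($b & 0xE0) == 0xC0} {
            set len 2
        } elseif {($b & 0xF0) == 0xE0} {
            set len 3
        } elseif {($b & 0xF8) == 0xF0} {
            set len 4
        } else {
            # Stray continuation byte or invalid lead byte.
            return $i
        }
        if {$i + $len > $n} {
            # Incomplete sequence at the end of the data: an error.
            return $i
        }
        for {set k 1} {$k < $len} {incr k} {
            scan [string index $bytes [expr {$i + $k}]] %c c
            if {($c & 0xC0) != 0x80} {
                return $i
            }
        }
        incr i $len
    }
    return -1
}

puts [utf8FailIndex "abc\xC3\xA9"]   ;# prints: -1 (complete two-byte sequence)
puts [utf8FailIndex "abc\xC3"]       ;# prints: 3 (truncated sequence)
```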
Credits
The proposal was initiated by a post from Andreas Leitgeb on the core list on 2021-05-12.
Copyright
This document has been placed in the public domain.