Artifact af9e3e581ebc55f10ab8ec1c8244f35e4e98ee51333a5c2f60df3af31fc3df90:
- File tip/346.md — part of check-in [9097468063] at 2022-08-26 13:22:42 on branch main — Move TIP #346 from 8.7 to 9.0, since it is related to TIP #568 (which is in 9.0 now) (user: jan.nijtmans size: 2493)
TIP 346: Error on Failed String Encodings
Author: Alexandre Ferrieux <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 02-Feb-2009
Post-History:
Keywords: Tcl,encoding,convertto,strict,Unicode,String,ByteArray
Tcl-Version: 9.0
Abstract
This TIP proposes to raise an error when an encoding-based conversion loses information.
Background
Encoding-based conversions occur e.g. when writing a string to a
channel. In doing so, Unicode characters are converted to sequences of bytes
according to the channel's encoding. Similarly, a conversion can occur
on request of the ByteArray internal representation of an object, the target
encoding being ISO8859-1. In both cases, for some
combinations of Unicode char and target encoding, the mapping is lossy
(non-injective). For example, the "é
" character, and many of its
cousins, is mapped to a "?
" in the 'ascii' target encoding. Also, Unicode chars above \u00FF get 'projected' onto their low byte in the ISO8859-1 ByteArray conversion.
This loss of information, in the first case, introduces unnoticed i18n mishandlings. In the second case, it makes it unreliable to do pure-ByteArray operations on objects unless they have no string representation. This induces unwanted and hard-to-debug performance hits on bytearray manipulations when people add debugging puts.
Proposed Change
This TIP proposes to make this loss conspicuous.
For the first use case, the idea is to introduce a -strict option to encoding convertto, that would raise an explicit error when non-mappable characters are met. Lossy conversions during channel I/O would also fail if a -strictencoding true [fconfigure] option is set. For the second case, we simply want the conversion to fail, like does the Listification of an ill-formed list. In both cases, the change consists of letting the proper internal conversion routine like SetByteArrayFromAny return TCL_ERROR.
Rationale
The second case does imply a Potential Incompatibility, since currently SBFA is documented to always return TCL_OK. However, it is felt that virtually all cases that are sensitive to this, are actually half-working in a completely hidden manner. Hence the global effect is a healthy one.
Reference Example
See Bug 1665628.
Copyright
This document has been placed in the public domain.