Artifact [af9e3e581e]

Login

Artifact af9e3e581ebc55f10ab8ec1c8244f35e4e98ee51333a5c2f60df3af31fc3df90:


# TIP 346: Error on Failed String Encodings
	Author:         Alexandre Ferrieux <[email protected]>
	State:          Draft
	Type:           Project
	Vote:           Pending
	Created:        02-Feb-2009
	Post-History:   
	Keywords:       Tcl,encoding,convertto,strict,Unicode,String,ByteArray
	Tcl-Version:    9.0
-----

# Abstract

This TIP proposes to raise an error when an encoding-based conversion
loses information.

# Background

Encoding-based conversions occur e.g. when writing a string to a
channel. In doing so, Unicode characters are converted to sequences of bytes
according to the channel's encoding. Similarly, a conversion can occur
on request of the ByteArray internal representation of an object, the target
encoding being ISO8859-1. In both cases, for some
combinations of Unicode char and target encoding, the mapping is lossy
\(non-injective\). For example, the "`é`" character, and many of its
cousins, is mapped to a "`?`" in the '**ascii**' target encoding. Also, Unicode chars above \\u00FF get 'projected' onto their low byte in the ISO8859-1 ByteArray conversion.

This loss of information, in the first case, introduces unnoticed i18n
mishandlings. In the second case, it makes it unreliable to do pure-ByteArray
operations on objects unless they have no string representation. This induces
unwanted and hard-to-debug performance hits on bytearray manipulations when
people add debugging **puts**.

# Proposed Change

This TIP proposes to make this loss conspicuous.

For the first use case, the idea is to introduce a **-strict** option to
**encoding convertto**, that would raise an explicit error when non-mappable
characters are met. Lossy conversions during channel I/O would also fail if a **-strictencoding true** [fconfigure] option is set.
  For the second case, we simply want the conversion to
fail, like does the Listification of an ill-formed list. In both cases, the
change consists of letting the proper internal conversion routine like **SetByteArrayFromAny** return TCL\_ERROR.

# Rationale

The second case does imply a Potential Incompatibility, since currently SBFA is documented to always return TCL\_OK. However, it is felt
that virtually all cases that are sensitive to this, are actually half-working
in a completely hidden manner. Hence the global effect is a healthy one.

# Reference Example

See [Bug 1665628](https://core.tcl-lang.org/tcl/tktview/1665628).

# Copyright

This document has been placed in the public domain.