Author: Alexandre Ferrieux
State: Final
Type: Project
Vote: Done
Created: 05-Feb-2009
Post-History:
Discussions-To: Tcl Core List
Keywords: Tcl,encoding,invalid UTF-8
Tcl-Version: 8.7
Tcl-Ticket: 2564363
Abstract
This TIP proposes to remove the 'identity' encoding, which is the Pandora's box of invalid UTF-8 string representations.
Background
The contract of string representations in Tcl states that the bytes field (the strep) of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating it leads, at best, to inconsistent and shimmer-sensitive string comparisons, whose outcome depends on which internal representation a value happens to hold at the time. Fortunately, nearly all Tcl code takes careful steps to enforce this contract, with one exception: the 'identity' encoding. This encoding allows any byte sequence to be copied verbatim into the strep of a value, either as a side effect of computing the strep of a ByteArray when [encoding system]=="identity", or through [encoding convertfrom identity]. Hence an invalid UTF-8 sequence can easily make its way into the strep and start wreaking havoc.
Proposed Change
This TIP proposes simply to remove the 'identity' encoding, closing that single window to the dark side.
Rationale
The risk of breaking compatibility is exceptionally low in this case, since the 'identity' encoding has only ever been documented in tcltest.
Reference Example
See Bug 2564363 https://sourceforge.net/support/tracker.php?aid=2564363
Copyright
This document has been placed in the public domain.