TIP 345: Kill the 'identity' Encoding

Login
Author:         Alexandre Ferrieux <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        05-Feb-2009
Post-History:
Discussions-To: Tcl Core List
Keywords:       Tcl,encoding,invalid UTF-8
Tcl-Version:    8.7
Tcl-Ticket:     2564363

Abstract

This TIP proposes to remove the 'identity' encoding which is the Pandora's Box of invalid UTF-8 string representations.

Background

The contract of string representations in Tcl states that the bytes field (the strep) of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating it leads at best to inconsistent and shimmer-sensitive string comparisons. Fortunately, nearly all of the Tcl code takes careful steps to enforce it. With one exception: the 'identity' encoding. Indeed, this encoding allows any byte sequence to be copied verbatim into the strep of a value, as a side-effect of a strep computation on a ByteArray with [encoding system]=="identity", or through [encoding convertfrom identity]. Hence an invalid UTF-8 sequence can easily make it to the strep and start wreaking havoc.

Proposed Change

This TIP proposes to simply close that single window to the dark side.

Rationale

The risk of compatibility breakage is inordinately mild in that case, since it has only ever been documented in tcltest.

Reference Example

See Bug 2564363 https://sourceforge.net/support/tracker.php?aid=2564363

Copyright

This document has been placed in the public domain.