TIP 699: Eliminate encoding alias "binary"; provide introspection for binary channels.

Author:		Jan Nijtmans <[email protected]>
State:		Final
Type:		Project
Vote:		Done
Created:	2024-06-24
Tcl-Version:	9.0
Tcl-Branch:	tip-699
Vote-Summary:   Accepted 5/0/3
Votes-For:      AN, HO, JN, KW, MC
Votes-Against:  None
Votes-Present:  BG, SL, RA

Abstract

This TIP proposes to remove the encoding names "binary" and {}, which are aliases for "iso8859-1", in Tcl 9.0.

In addition, a new "chan isbinary" command is implemented, which returns 1 when the channel is a binary channel and 0 when it is not. This "chan isbinary" command will be backported to 8.7; all other parts of this TIP will not.

Rationale

In Tcl 8.x, the -encoding of a channel can be set to {} or "binary". This encoding can be used to send a byte through a channel unchanged. For example:

$ tclsh8.6
% chan configure stdout -encoding binary
% puts \u0161
a
%
Note that byte arrays are not really sent through unchanged: If a code point is > 0xFF, only the lower 8 bits are sent, the higher bits are silently stripped off.
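
The arithmetic behind the example above can be checked by hand. The following is a minimal sketch that only reproduces the masking step, not the channel machinery itself; the variable names are illustrative:

```tcl
# U+0161 has the value 0x0161; keeping only the lower 8 bits gives 0x61.
set codepoint 0x0161
set low [expr {$codepoint & 0xFF}]

# 0x61 is the code for the letter "a", which is what tclsh8.6 printed above.
puts [format "0x%02X -> %c" $low $low]   ;# prints "0x61 -> a"
```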

Another possibility in Tcl 8.6 is to use the "iso8859-1" encoding. It is basically the same encoding as "binary", but behaves slightly differently for code points > 0xFF:

$ tclsh8.6
% chan configure stdout -encoding iso8859-1
% puts \u0161
?
%
Instead of silently stripping the higher bits, the "iso8859-1" encoding replaces any unknown code point with ?.

In Tcl 9.0, this changed:

$ tclsh9.0
% chan configure stdout -encoding {}
% chan configure stdout -encoding
iso8859-1
% chan configure stdout -encoding binary
% chan configure stdout -encoding
iso8859-1
% puts \u0161
error writing "stdout": invalid or incomplete multibyte or wide character

We see that the {} and "binary" encodings became aliases for "iso8859-1", which now throws an exception for any code point > 0xFF. To get back the Tcl 8.6 behavior of the "iso8859-1" encoding, use the tcl8 profile. The Tcl 8.6 behavior of the "binary" encoding cannot be restored (TIP #568).
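
As a sketch of what selecting the tcl8 profile could look like, assuming Tcl 9.0 and its "-profile" option for "encoding convertto" and "chan configure" (neither is available in Tcl 8.6); the variable name is illustrative:

```tcl
# Requires Tcl 9.0: the -profile option does not exist in Tcl 8.6.

# Convert U+0161 with the tcl8 profile. The code point cannot be
# represented in iso8859-1, so it is replaced instead of raising an error.
set bytes [encoding convertto -profile tcl8 iso8859-1 \u0161]

# The same profile can be selected on a channel.
chan configure stdout -encoding iso8859-1 -profile tcl8
puts \u0161   ;# written as "?", like the tclsh8.6 example above
```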

A lot of extensions misuse the "binary" encoding. Some examples:

nsf (nx-zip.tcl:250):

    fconfigure $fdata -encoding binary -translation binary
Remark: "-encoding binary" is redundant, since "-translation binary" already does the same.

tcllib (e.g. base64/ascii85.tcl:157)

    fconfigure $fd -encoding binary -translation binary
Remark: same

practcl (practcl.tcl:7688)

    fconfigure $fh -encoding binary -translation lf -eofchar {}
Remark: this is the long form for "-translation binary"

tdom (tdom.tcl:757)

    fconfigure $fd -encoding binary
Remark: Here it is used only for reading the BOM, so it will work fine. "-encoding iso8859-1" would make it clearer that only the encoding changed, not -eofchar or -translation.

trf (md:98)

    fconfigure stdout -translation binary
    catch {fconfigure stdout -encoding binary}
Remark: need I say more????

twapi (tls.test:1448)

    chan configure $so .... -encoding binary -eofchar X -translation binary
Remark: the "-translation binary" will set the "-encoding" to "iso8859-1", "-eofchar" to {} and "-translation" to "lf"

Specification

The {} and "binary" encodings will be removed in Tcl 9.0. If you want to have a binary channel, there are only 2 possibilities:

In addition, a new command "chan isbinary <channel>" is added, which returns 1 if the channel is a binary channel and 0 if it is not.
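
A minimal sketch of how the new command might be used, assuming a Tcl version that provides "chan isbinary" (9.0, per this TIP). The temporary file and the variable names are only for illustration:

```tcl
# Requires "chan isbinary" as proposed in this TIP.
set fd [file tempfile path]          ;# path receives the temp file name

# A freshly opened file channel is a text channel.
set before [chan isbinary $fd]

# "-translation binary" turns it into a binary channel.
chan configure $fd -translation binary
set after [chan isbinary $fd]

puts "before: $before, after: $after"

close $fd
file delete $path
```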

Compatibility

Many extensions use "-encoding binary" when they really meant "-translation binary". Those extensions will need to be fixed for Tcl 9.0. Examples can be found in the above Rationale, but there are many more.
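
The fix is usually to drop "-encoding binary" and keep "-translation binary". The following sketch should run unchanged in Tcl 8.6 and 9.0; the temporary file and the sample bytes are illustrative, and the "b" flag of "open" is documented as equivalent to "-translation binary":

```tcl
# Before: fconfigure $fd -encoding binary -translation binary
# After:  only -translation binary, which is valid in Tcl 8.6 and 9.0.
set fd [file tempfile path]          ;# path receives the temp file name
chan configure $fd -translation binary
puts -nonewline $fd [binary format H* 00ff0d0a1a]
close $fd

# Opening with "b" in the access mode also yields a binary channel.
set fd [open $path rb]
set data [read $fd]
close $fd
file delete $path

# All five bytes, including CR, LF and 0x1A, survive the round trip.
binary scan $data H* hex
puts $hex                            ;# prints 00ff0d0a1a
```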

Extensions still using "-encoding {}" or "-encoding binary" will see an error message:

$ tclsh9.0
% chan configure stdout -encoding {}
unknown encoding "": No longer supported.
        please use either "-translation binary" or "-encoding iso8859-1"
% chan configure stdout -encoding binary
unknown encoding "binary": No longer supported.
        please use either "-translation binary" or "-encoding iso8859-1"
%

Implementation

Implementation is in Tcl branch "tip-699".

Copyright

This document has been placed in the public domain.