Author: Jan Nijtmans
State: Final
Type: Project
Vote: Done
Created: 2024-06-24
Tcl-Version: 9.0
Tcl-Branch: tip-699
Vote-Summary: Accepted 5/0/3
Votes-For: AN, HO, JN, KW, MC
Votes-Against: None
Votes-Present: BG, SL, RA
Abstract
This TIP proposes to remove the "binary" and {} encodings as aliases for "iso8859-1" in Tcl 9.0.
It also adds a new "chan isbinary" command, which returns 1 if the channel is a binary channel and 0 otherwise. The "chan isbinary" command will be backported to Tcl 8.7; the other parts of this TIP will not.
Rationale
In Tcl 8.x, the -encoding of a channel can be set to {} or "binary". This encoding is meant to send bytes through a channel unchanged. For example:

    $ tclsh8.6
    % chan configure stdout -encoding binary
    % puts \u0161
    a
    %

Note that characters are not really sent through unchanged: if a code point is > 0xFF, only the lower 8 bits are sent; the higher bits are silently stripped off.
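In the session above, "a" appears because U+0161 has the code point 0x161, and keeping only its lower 8 bits leaves 0x61, which is "a". The short sketch below only reproduces that masking arithmetic in plain Tcl; it does not use the removed encoding, and the variable names are illustrative.

```tcl
# Mimic the truncation that Tcl 8.6's "binary" encoding applied to a
# character on output: keep only the lower 8 bits of its code point.
set char \u0161                 ;# LATIN SMALL LETTER S WITH CARON
scan $char %c cp                ;# code point as an integer: 0x161 (353)
set low [expr {$cp & 0xFF}]     ;# lower 8 bits: 0x61 (97)
set sent [format %c $low]       ;# the character that ends up on the channel
puts [format "U+%04X -> 0x%02X (%s)" $cp $low $sent]
;# prints: U+0161 -> 0x61 (a)
```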
Another possibility in Tcl 8.6 is the "iso8859-1" encoding. It is basically the same encoding as "binary", but behaves slightly differently for code points > 0xFF:

    $ tclsh8.6
    % chan configure stdout -encoding iso8859-1
    % puts \u0161
    ?
    %

Instead of silently stripping the higher bits, the "iso8859-1" encoding replaces any code point it cannot encode with "?".
In Tcl 9.0, this has changed:

    $ tclsh9.0
    % chan configure stdout -encoding {}
    % chan configure stdout -encoding
    iso8859-1
    % chan configure stdout -encoding binary
    % chan configure stdout -encoding
    iso8859-1
    % puts \u0161
    error writing "stdout": invalid or incomplete multibyte or wide character
So the {} and "binary" encodings became aliases for "iso8859-1", which now raises an error for any code point > 0xFF.
To get back the behavior of the "iso8859-1" encoding in Tcl 8.6, use the "tcl8" profile. The behavior of the "binary" encoding in Tcl 8.6 cannot be restored (TIP #568).
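A minimal sketch of both behaviors on a Tcl 9.0 channel, writing U+0161 to a scratch file. It assumes Tcl 9.0, where "-profile" is a channel option and the default profile is strict. With the default profile the write fails; with "-profile tcl8" the character is replaced by "?", as in Tcl 8.6. The scratch file and variable names are illustrative only.

```tcl
# Assumes Tcl 9.0: channels accept -profile, and strict is the default.
set fh [file tempfile path]   ;# scratch file, for illustration only
close $fh

# Default (strict) profile: a code point > 0xFF cannot be written as
# iso8859-1. The error may come from puts or from flush, so wrap both.
set ch [open $path w]
chan configure $ch -encoding iso8859-1
set strictFailed [catch {puts -nonewline $ch \u0161; flush $ch}]
catch {close $ch}

# tcl8 profile: the unencodable character is replaced by "?".
set ch [open $path w]
chan configure $ch -encoding iso8859-1 -profile tcl8
puts -nonewline $ch \u0161
close $ch

# Read the raw bytes back through a binary channel.
set ch [open $path rb]
set written [read $ch]
close $ch
file delete $path

puts "strict failed: $strictFailed, tcl8 wrote: $written"
```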
Many extensions misuse the "binary" encoding. Some examples:
nsf (nx-zip.tcl:250):

    fconfigure $fdata -encoding binary -translation binary

Remark: "-encoding binary" is useless, since "-translation binary" already does the same.

tcllib (e.g. base64/ascii85.tcl:157):

    fconfigure $fd -encoding binary -translation binary

Remark: same as above.

practcl (practcl.tcl:7688):

    fconfigure $fh -encoding binary -translation lf -eofchar {}

Remark: this is the long form of "-translation binary".

tdom (tdom.tcl:757):

    fconfigure $fd -encoding binary

Remark: here it is used only for reading the BOM, so it will work fine. "-encoding iso8859-1" would make it clearer that only the encoding changed, not -eofchar or -translation.

trf (md:98):

    fconfigure stdout -translation binary
    catch {fconfigure stdout -encoding binary}

Remark: "-translation binary" has already set the encoding, so the second call is redundant, and wrapping it in "catch" hides any error it raises.

twapi (tls.test:1448):

    chan configure $so .... -encoding binary -eofchar X -translation binary

Remark: the final "-translation binary" overrides the earlier options: it sets "-encoding" to "iso8859-1", "-eofchar" to {} and "-translation" to "lf".
Specification
The {} and "binary" encodings will be removed in Tcl 9.0. To get a binary channel, there are only two possibilities:
- Use the "b" flag in the access mode of the "open" command (for example "rb" or "wb").
- Use "-translation binary", which is still shorthand for "-encoding iso8859-1 -eofchar {} -translation lf"
In addition, a new command "chan isbinary <channel>" is added, which returns 1 if the channel is a binary channel and 0 otherwise.
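The sketch below illustrates the two ways to get a binary channel, together with the new "chan isbinary" command. It assumes Tcl 9.0, or Tcl 8.7 with the backported command. It opens a scratch file three times: with the "b" flag, with "-translation binary", and, for comparison, as a utf-8 text channel. The file and variable names are illustrative only.

```tcl
# Assumes Tcl 9.0 (or 8.7, where "chan isbinary" is backported).
set fh [file tempfile path]   ;# scratch file, for illustration only
close $fh

# 1. The "b" flag in the access mode of [open].
set ch [open $path wb]
set viaOpenFlag [chan isbinary $ch]
close $ch

# 2. "-translation binary" on an already open channel.
set ch [open $path w]
chan configure $ch -translation binary
set viaTranslation [chan isbinary $ch]
close $ch

# For comparison: a text channel using utf-8.
set ch [open $path w]
chan configure $ch -encoding utf-8
set textChannel [chan isbinary $ch]
close $ch

file delete $path
puts "open wb: $viaOpenFlag, -translation binary: $viaTranslation, utf-8: $textChannel"
```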
Compatibility
Many extensions use "-encoding binary" when they really mean "-translation binary". Those extensions will need to be fixed for Tcl 9.0. Examples are given in the Rationale above, but there are many more.
Extensions still using "-encoding {}" or "-encoding binary" will see an error message:

    $ tclsh9.0
    % chan configure stdout -encoding {}
    unknown encoding "": No longer supported. please use either "-translation binary" or "-encoding iso8859-1"
    % chan configure stdout -encoding binary
    unknown encoding "binary": No longer supported. please use either "-translation binary" or "-encoding iso8859-1"
    %
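Extension code can check both halves of that error message directly. The sketch below assumes Tcl 9.0 with this TIP applied and uses a scratch file and variable names that are illustrative only. It confirms that the removed spellings are rejected and that both suggested replacements are accepted.

```tcl
# Assumes Tcl 9.0 with this TIP: "binary" and {} are no longer encodings.
set fh [file tempfile path]   ;# scratch file, for illustration only
close $fh
set ch [open $path w]

# The removed spellings are rejected...
set binaryRejected [catch {chan configure $ch -encoding binary} binaryMsg]
set emptyRejected  [catch {chan configure $ch -encoding {}} emptyMsg]

# ...while both replacements named in the error message are accepted.
set isoOk   [expr {![catch {chan configure $ch -encoding iso8859-1}]}]
set transOk [expr {![catch {chan configure $ch -translation binary}]}]

close $ch
file delete $path
puts $binaryMsg
```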
Implementation
The implementation is in Tcl branch "tip-699".
Copyright
This document has been placed in the public domain.