Tcl Source Code

View Ticket
Ticket UUID: 8cae59ecb03a6269468379a1e028c01f604ce5de
Title: Eliminate "-encoding binary" in favour of "-translation binary"
Type: Bug Version: 9.0
Submitter: jan.nijtmans Created on: 2024-06-20 07:57:20
Subsystem: - New Builtin Commands Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Minor
Status: Open Last Modified: 2024-06-21 08:15:24
Resolution: None Closed By: nobody
    Closed on:
Description:

There's a difference between "-encoding binary" and "-translation binary". The first one is simply an alias for "-encoding iso8859-1". The second one also sets -eofchar to {} and -translation to lf. This is a cause for confusion, even though it is clearly documented: many people use "-encoding binary" while they really mean "-translation binary". An example is [this] ticket.

So, should we not simply remove "-encoding binary", or - even better - let it generate an error-message saying: "Please use -translation binary".
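The difference described above can be observed directly. The following is a minimal sketch, not a definitive test: `channelSettings` is a hypothetical helper introduced only for this illustration, and the temporary file comes from `file tempfile` (Tcl 8.6 or later). The first `fconfigure` call is wrapped in `catch` because, depending on how this ticket is resolved, "-encoding binary" may be rejected. No output is shown, since the reported values differ between Tcl versions and platforms.

```tcl
# Hypothetical helper: collect the three options that make up "binary mode".
proc channelSettings {chan} {
    set result {}
    foreach opt {-encoding -translation -eofchar} {
        lappend result $opt [fconfigure $chan $opt]
    }
    return $result
}

# Throwaway read-write channel; the file name is stored in "path".
set f [file tempfile path]

# Only the encoding is requested here; translation and eofchar are untouched.
if {[catch {fconfigure $f -encoding binary} msg]} {
    puts "-encoding binary rejected: $msg"
} else {
    puts "after -encoding binary:    [channelSettings $f]"
}

# Reset, then request full binary mode in one step.
fconfigure $f -encoding utf-8 -translation auto
fconfigure $f -translation binary
puts "after -translation binary: [channelSettings $f]"

close $f
file delete $path
```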

User Comments: jan.nijtmans added on 2024-06-21 08:15:24:

I see the discussion going into two directions. One is about the [introspection], one is about "binary" being an alias for "iso8859-1".

I just discovered, "iso8859-1" has two aliases already: the empty string and "binary". It was already like that in Tcl 8.6:

$ tclsh8.6
% set f [open file.txt w]
file3
% fconfigure $f
-blocking 1 -buffering full -buffersize 4096 -encoding utf-8 -eofchar {} -translation auto
% fconfigure $f -encoding {}
% fconfigure $f
-blocking 1 -buffering full -buffersize 4096 -encoding binary -eofchar {} -translation auto

Let's handle this in this ticket.


apnadkarni added on 2024-06-21 03:37:52:
@sebres regarding "why not just "revert" back to the 8.x handling for consistency", see [85ddd247b6]. Though I do not myself understand why fixing that bug required the change in what fconfigure -encoding returned.

sebres added on 2024-06-20 14:13:34:

> I would expect "-translation" to give back "binary" instead of "lf" here, then that can be used to check for binary mode.

Huh, so you'd expect yet another backwards incompatibility in 9.0?

"-translation binary" never set itself to "binary"; it was solely a setter for the grouped settings -encoding binary -translation lf -eofchar {}. Nothing else.

I don't think Tcl really needs a change like that.


jan.nijtmans added on 2024-06-20 13:42:19:
<pre>
> I observe that one of the goals of very recent commits on Tk widgets was to
> keep size values specified on options unchanged, as specified by the user,
> rather than changing them to the internal pixel value - that was considered
> bad practice, and this seems to me to be a similar case.
</pre>

Indeed, that's why I'm surprised that in Tcl 8.x:
<pre>
    % fconfigure $f -translation binary
    % fconfigure $f -translation
    lf
</pre>

I would expect "-translation" to give back "binary" instead of "lf" here, then that can be used to check for binary mode.
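As long as "-translation" reports "lf", a script that wants to detect binary mode has to inspect the underlying options itself. A minimal sketch follows, assuming the values mentioned in this thread ("binary" in 8.x, "iso8859-1" in 9.0b2). `isBinaryChannel` is a hypothetical helper, not part of Tcl.

```tcl
# Hypothetical helper: report whether a channel is in the state that
# "-translation binary" produces.
proc isBinaryChannel {chan} {
    # Tcl 8.x reports "binary"; per this thread, 9.0b2 reports "iso8859-1".
    if {[fconfigure $chan -encoding] ni {binary iso8859-1}} {
        return 0
    }
    # A read-write channel may report two values (input and output).
    foreach mode [fconfigure $chan -translation] {
        if {$mode ne "lf"} {
            return 0
        }
    }
    # Empty whether it is reported as {} or as {{} {}}.
    return [expr {[join [fconfigure $chan -eofchar] ""] eq ""}]
}
```

Note that this sketch cannot distinguish a channel set up with "-translation binary" from one configured by hand with "-encoding iso8859-1 -translation lf -eofchar {}".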

cjmcdonald added on 2024-06-20 13:21:27:
As the submitter of the ticket which Jan referred to, I'll comment that there was no confusion between "-encoding binary" and "-translation binary".  I'm aware of the difference, and would normally use "-translation binary" in a program, but chose to use "-encoding binary" in the ticket because I thought that it more clearly demonstrated the issue, which is that setting encoding to binary results in it showing an encoding of iso8859-1.  That is what is confusing.

Using "-translation binary" doesn't make any difference, it still results in a channel encoding of iso8859-1, in Tcl 9.0b2, whereas in 8.x it showed a binary encoding.  The description which Jan linked to as "clearly documenting" the behaviour simply says that the encoding should be set to binary, preferably using "-translation binary".  It says nothing about that encoding setting being silently changed to iso8859-1.  As far as I can see that's undocumented in the man pages, and is really exposing some artefact of the internal implementation to the user.  I wasn't aware that "-encoding binary" is an alias for "-encoding iso8859-1". 

I observe that one of the goals of very recent commits on Tk widgets was to keep size values specified on options unchanged, as specified by the user, rather than changing them to the internal pixel value - that was considered bad practice, and this seems to me to be a similar case.

oehhar added on 2024-06-20 11:35:50:

I agree with Sergey. The visual beauty of "-encoding binary" is great.

Thank you, Harald


sebres added on 2024-06-20 10:58:15:

There is indeed no encoding binary, but the channel may be binary, so why not still consider it a binary encoding, like in 8.x? And setting this via -translation is a bit different to me: it'd also change eofchar and translation, which is correct for the pure binary mode of a channel; however, if one uses something like:

chan configure $chan -encoding binary -eofchar {}
the expectation is rather that translation remains {auto crlf} or {auto lf} depending on platform.

So why not just "revert" back to the 8.x handling for consistency and just accept pseudo encoding name binary for binary mode retaining translation as it is?

Besides, iso8859-1 is never really comparable with binary mode, because one prefers strings and the other byte arrays (and, strictly speaking, iso8859-1 would have to reject all chars 0x00-0x1F and 0x7F-0x9F), so this is even more confusing (and the previous 8.x handling, which kept binary as the value, is more consistent).

Why is having an artificial encoding name (which is not really an encoding, just a mark for bytearray handling) so bad? Take a look at unicode - it is also not an existing encoding, just a mark for unicode handling, nothing else. But the name is pretty usable.

> Many people use "-encoding binary" while they really mean "-translation binary".

This is not an argument in my opinion:

  1. RTFM
  2. In case of "backwards compatibility vs. theoretical possibility for typo" we shall always prefer the first, because...
  3. If one would consider arguments like that, then let us forbid everything what can introduce a confusion (hardly possible in a script-lang, especially in that with EIAS model) and ignore every compat question in the future.

So I'd just switch back to 8.x handling: binary is valid value that can be set with -encoding parameter similar to unicode (which is also unknown encoding, rather a sign we'll work internally with unicode string instead). But as already said, I'd really use binary as return value (instead of iso8859-1)... again, similar to 8.x, to unicode pseudo encoding, and pretty backwards compatible.


apnadkarni added on 2024-06-20 10:03:35:

I agree that current situation is confusing, and like Harald, have no clear opinion, just some thoughts.

If we were starting from scratch, I would have liked to stick to the principle that setting one option should never silently change another option as -translation does today. If you want the low level options (eof, encoding, translation) that are required for binary mode, set them via fconfigure yourself, or use the simple "open b" method. However, changing -translation to not modify eof and encoding settings would now be too disruptive, so that's not an option.

That leaves the question of what to do about -encoding binary. Leaving it in the current state means

  • fconfigure $chan -encoding returns a different value (iso8859-1) from what was set (binary). This gives me a bit of discomfort.
  • This is a (small) incompatibility with Tcl 8.

If changed as Jan suggests,

  • It is consistent in that there is really no encoding called "binary" and there is no question of fconfigure returning a different value than was set.
  • This is also incompatible with Tcl 8, though in a different, probably more common and therefore serious, manner than above.

A third option, not necessarily one I favor, but throwing it out there, is to make -encoding binary a synonym for -translation binary so it would also change eof and line-ending settings.

  • This in theory is the most incompatible with 8.6 but could actually be the most compatible in practice. Code that is using it will likely also be changing eof and line-endings and, if not, should have been doing so and is probably broken code that would be fixed.

The second option (Jan's) seems cleanest but my gut feel is that the first and third options will cause the fewest compatibility problems. With the second option, correct code "fconfigure $chan -encoding binary -translation lf -eofchar {}" will now fail.
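For scripts that need to work whichever option is chosen, one way to avoid the question is to never name an encoding at all. A minimal sketch, assuming "-translation binary" keeps the grouped behaviour described in this ticket; `openBinary` is a hypothetical helper, not an existing Tcl command.

```tcl
# Hypothetical helper: open a file and put the channel into binary mode
# without using "-encoding binary".
proc openBinary {path {access r}} {
    set chan [open $path $access]
    # One call sets translation, eofchar and encoding together.
    fconfigure $chan -translation binary
    return $chan
}
```

Adding "b" to the access mode, as in `open $path rb`, is the shorter spelling of the same idea.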

So...mixed feelings and not very helpful comments! If I were really pushed, I would keep things as they are (with a release note) but only because at this particular moment I lean towards compatibility over cleanliness :-)


jan.nijtmans added on 2024-06-20 09:50:08:

I will experiment with this


oehhar added on 2024-06-20 09:03:46:

Hi Jan, thank you for bringing this up.

Yes, the confusion is endless.

I see value in having "-encoding binary" to tell, that it is 1:1. And "-encoding 8859-1" sets an encoding.

By chance, both are the same.

But having different names for the same thing is also a challenge.

I can live with "-encoding binary" is accepted and magically changed to "-encoding 8859-1". But this is not beautiful.

I have no clear opinion on this.

The magic of setting multiple things at the same time (-translation, -eofchar, -encoding) is handy. Especially, I like "open $f b". This "suggests" that there is a special binary mode. But this does not exist in the interface. There are 3 independent settings in this field. And having "-encoding 8859-1" for binary makes the whole story more complicated.

Thank you, Harald