Tcl Source Code

View Ticket
Login
Ticket UUID: fa3d9fd818fa0072b60dcd14614ebf33d90e4d6b
Title: [fcopy $chan1 $chan2 -size $size] is not [puts -nonewline $chan2 [read $chan1 -size $size]
Type: Bug Version:
Submitter: pooryorick Created on: 2023-04-03 20:55:33
Subsystem: 25. Channel System Assigned To: pooryorick
Priority: 1 Zero Severity: Important
Status: Closed Last Modified: 2023-04-07 21:10:24
Resolution: Fixed Closed By: pooryorick
    Closed on: 2023-04-07 21:10:24
Description: (text/x-fossil-wiki)
Even if two channels have he same encoding [fcopy $chan1 $chan2 -size] might
not be the same as [fcopy $chan1 $chan2 -size].  The makes [fcopy -size]
difficult to use correctly.

In the following script, copying with a size of 1 between to channels, both
using utf-8 encoding, does not copy a full character. Although the behaviour is
currently documented to be what it is, it's still bad, and should be fixed:


<code><verbatim>
set chan [file tempfile]
chan configure $chan -encoding utf-8 -profile strict
puts -nonewline $chan รก
flush $chan
chan configure $chan -profile tcl8 -translation lf -eofchar {}

seek $chan 0
set chandata [read $chan]

set chan2 [file tempfile]
chan configure $chan2 -profile tcl8 -encoding utf-8

seek $chan 0
fcopy $chan $chan2 -size 1
flush $chan2

seek $chan2 0
set chan2data [read $chan2]
close $chan2

puts [list fcopy equals original? [expr {$chan2data eq $chandata}]]


set chan3 [file tempfile]
chan configure $chan2 -profile tcl8 -encoding utf-8
seek $chan 0
puts -nonewline $chan3 [read $chan 1]

close $chan

seek $chan3 0
set chan3data [read $chan3]

puts [list read/write equals original? [expr {$chan3data eq $chandata}]]
</verbatim></code>
User Comments: pooryorick added on 2023-04-06 10:51:52: (text/x-fossil-wiki)
Fixed in [39a45eb8ff]. I think the proper fix is to stop using NULL to
represent the binary encoding in <code>ChannelState.encoding</code>.
<code>iso8859-1</code> is the encoding for handling binary data, and using it
brings the implementation more in line with the conceptual design and also
reduces the number of different code paths in IO operations.  One new aspect of
this fix is that setting the encoding to "binary" also sets the profile to
strict, which is needed to make some tests pass, but should also help
developers avoid silent data corruption for both input and output.  For
example, in test io-52.10 the previous result was nothing sensible.  Now the
result is an error that lets the programmer know strange things are afoot in
their code.  This commit also paves the way for further simplification of the
code in tclIO.c.

jan.nijtmans added on 2023-04-05 22:49:02: (text/x-fossil-wiki)
There's a setting "Beta: Use Unicode UTF-8 for worldwide language support" in Windows. So, if we wait long enough, UTF-8 will eventually become the default encoding in Windows. The problem will automatically disappear. So we could simply ignore it. Not high prio to fix.

jan.nijtmans added on 2023-04-05 22:23:41:
> I don't know why the tests fail only with the msvc build though

It's because the gcc environment uses the bash shell, which reports to its clients that the system encoding is utf-8. The Visual Studio build environment uses the normal CMD shell, which has the normal system encoding on Windows. So, yes, this explains the difference.

So, this leads to 3 different solutions.
* Implementation change: Whenever ChannelState.encoding is NULL, make sure that Tcl_UtfToExternal() is explicitly given the iso8859-1 encoding.
* don't set ChannelState.encoding to NULL when we want iso8859-1
* change Tcl_UtfToExternal() so NULL means iso8859-1

I think the last one is not desirable (would need a TIP, but utf-8 would be more reasonable), but the other two should be possible.

pooryorick added on 2023-04-05 17:10:51: (text/x-fossil-wiki)
I believe the test failures on Windows are related to the fact that
<code>Tcl_UtfToExternal()</code>
[https://core.tcl-lang.org/tcl/file?udc=1&ln=1713&ci=b27eacb7fcde86db&name=generic%2FtclEncoding.c|sets
the encoding to the system encoding] when encoding is <code>NULL</code>.  I
don't know why the tests fail only with the msvc build though.  The design
problem is that in <code>ChannelState.encoding</code>, <code>NULL</code>
represents the binary encoding, whereas in <Tcl_UtfToExternal()</code>,
<code>NULL</code> represents the system encoding.  There are probably several
ways to fix that, with some of those ways being more hacky than others.

jan.nijtmans added on 2023-04-04 22:08:31: (text/x-fossil-wiki)
Further info: In the "bug-fa3d9fd818fa0072" branch, the same failure can be seen, so it's sure that this is the trigger.

jan.nijtmans added on 2023-04-04 22:06:23: (text/x-fossil-wiki)
Unfortunately, with this fix two testcases started failing on Windows, see:

[https://github.com/tcltk/tcl/actions/runs/4605104184/jobs/8152506769]

For now, I disabled those 2 test-cases, but please take a look.

pooryorick added on 2023-04-03 21:28:37: (text/x-fossil-wiki)
Fixed in [51d813943bcaf835].