Tcl Source Code

View Ticket
Login
2023-04-07
21:10 Closed ticket [fa3d9fd818]: fcopy $chan1 $chan2 -size $size is not puts -nonewline $chan2 [read $chan1 -size $size plus 5 other changes artifact: 914d6c27e3 user: pooryorick
21:07
Merge trunk [b66d50b4d4]: Merge bug-[fa3d9fd818fa0072]. check-in: 3626b22792 user: pooryorick tags: unchained
2023-04-06
12:24
Merge bug-[fa3d9fd818fa0072]. check-in: b66d50b4d4 user: pooryorick tags: trunk, main
10:51 Ticket [fa3d9fd818] fcopy $chan1 $chan2 -size $size is not puts -nonewline $chan2 [read $chan1 -size $size status still Open with 3 other changes artifact: b7e16b76e3 user: pooryorick
10:03
Further fix for [fa3d9fd818fa0072]. In ChannelState.encoding, NULL no longer represents the binary ... check-in: 39a45eb8ff user: pooryorick tags: bug-fa3d9fd818fa0072
2023-04-05
22:49 Ticket [fa3d9fd818] fcopy $chan1 $chan2 -size $size is not puts -nonewline $chan2 [read $chan1 -size $size status still Open with 3 other changes artifact: 883b185c27 user: jan.nijtmans
22:23 Ticket [fa3d9fd818]: 3 changes artifact: e50650a923 user: jan.nijtmans
17:10 Open ticket [fa3d9fd818]. artifact: c457c85abe user: pooryorick
2023-04-04
22:08 Ticket [fa3d9fd818]: 3 changes artifact: c50a64cc22 user: jan.nijtmans
22:06 Ticket [fa3d9fd818]: 3 changes artifact: 6218d6fca4 user: jan.nijtmans
2023-04-03
21:28 Pending ticket [fa3d9fd818]. artifact: 3a99a85328 user: pooryorick
21:20
Fix for [fa3d9fd818fa0072], [fcopy $chan1 $chan2 -size $size] is not [puts -nonewline $chan2 [read $... check-in: 51d813943b user: pooryorick tags: trunk, main
20:58
Fix for [fa3d9fd818fa0072], [fcopy $chan1 $chan2 -size $size] is not [puts -nonewline $chan2 [read $... check-in: 704a7e8389 user: pooryorick tags: bug-fa3d9fd818fa0072
20:55 New ticket [fa3d9fd818] fcopy $chan1 $chan2 -size $size is not puts -nonewline $chan2 [read $chan1 -size $size. artifact: ce5e1994d6 user: pooryorick

Ticket UUID: fa3d9fd818fa0072b60dcd14614ebf33d90e4d6b
Title: [fcopy $chan1 $chan2 -size $size] is not [puts -nonewline $chan2 [read $chan1 -size $size]
Type: Bug Version:
Submitter: pooryorick Created on: 2023-04-03 20:55:33
Subsystem: 25. Channel System Assigned To: pooryorick
Priority: 1 Zero Severity: Important
Status: Closed Last Modified: 2023-04-07 21:10:24
Resolution: Fixed Closed By: pooryorick
    Closed on: 2023-04-07 21:10:24
Description:

Even if two channels have he same encoding [fcopy $chan1 $chan2 -size] might not be the same as [fcopy $chan1 $chan2 -size]. The makes [fcopy -size] difficult to use correctly.

In the following script, copying with a size of 1 between to channels, both using utf-8 encoding, does not copy a full character. Although the behaviour is currently documented to be what it is, it's still bad, and should be fixed:

set chan [file tempfile]
chan configure $chan -encoding utf-8 -profile strict
puts -nonewline $chan รก
flush $chan
chan configure $chan -profile tcl8 -translation lf -eofchar {}

seek $chan 0
set chandata [read $chan]

set chan2 [file tempfile]
chan configure $chan2 -profile tcl8 -encoding utf-8

seek $chan 0
fcopy $chan $chan2 -size 1
flush $chan2

seek $chan2 0
set chan2data [read $chan2]
close $chan2

puts [list fcopy equals original? [expr {$chan2data eq $chandata}]]


set chan3 [file tempfile]
chan configure $chan2 -profile tcl8 -encoding utf-8
seek $chan 0
puts -nonewline $chan3 [read $chan 1]

close $chan

seek $chan3 0
set chan3data [read $chan3]

puts [list read/write equals original? [expr {$chan3data eq $chandata}]]

User Comments: pooryorick added on 2023-04-06 10:51:52:

Fixed in [39a45eb8ff]. I think the proper fix is to stop using NULL to represent the binary encoding in ChannelState.encoding. iso8859-1 is the encoding for handling binary data, and using it brings the implementation more in line with the conceptual design and also reduces the number of different code paths in IO operations. One new aspect of this fix is that setting the encoding to "binary" also sets the profile to strict, which is needed to make some tests pass, but should also help developers avoid silent data corruption for both input and output. For example, in test io-52.10 the previous result was nothing sensible. Now the result is an error that lets the programmer know strange things are afoot in their code. This commit also paves the way for further simplification of the code in tclIO.c.


jan.nijtmans added on 2023-04-05 22:49:02:

There's a setting "Beta: Use Unicode UTF-8 for worldwide language support" in Windows. So, if we wait long enough, UTF-8 will eventually become the default encoding in Windows. The problem will automatically disappear. So we could simply ignore it. Not high prio to fix.


jan.nijtmans added on 2023-04-05 22:23:41:
> I don't know why the tests fail only with the msvc build though

It's because the gcc environment uses the bash shell, which reports to its clients that the system encoding is utf-8. The Visual Studio build environment uses the normal CMD shell, which has the normal system encoding on Windows. So, yes, this explains the difference.

So, this leads to 3 different solutions.
* Implementation change: Whenever ChannelState.encoding is NULL, make sure that Tcl_UtfToExternal() is explicitly given the iso8859-1 encoding.
* don't set ChannelState.encoding to NULL when we want iso8859-1
* change Tcl_UtfToExternal() so NULL means iso8859-1

I think the last one is not desirable (would need a TIP, but utf-8 would be more reasonable), but the other two should be possible.

pooryorick added on 2023-04-05 17:10:51:

I believe the test failures on Windows are related to the fact that Tcl_UtfToExternal() sets the encoding to the system encoding when encoding is NULL. I don't know why the tests fail only with the msvc build though. The design problem is that in ChannelState.encoding, NULL represents the binary encoding, whereas in <Tcl_UtfToExternal(), NULL represents the system encoding. There are probably several ways to fix that, with some of those ways being more hacky than others.


jan.nijtmans added on 2023-04-04 22:08:31:

Further info: In the "bug-fa3d9fd818fa0072" branch, the same failure can be seen, so it's sure that this is the trigger.


jan.nijtmans added on 2023-04-04 22:06:23:

Unfortunately, with this fix two testcases started failing on Windows, see:

https://github.com/tcltk/tcl/actions/runs/4605104184/jobs/8152506769

For now, I disabled those 2 test-cases, but please take a look.


pooryorick added on 2023-04-03 21:28:37:

Fixed in [51d813943bcaf835].