Tcl Source Code

View Ticket
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Ticket UUID: bcd1004104651998d1ae99aa6869e249c0ed0019
Title: A change to the system encoding, a stale cached file FsPath object, and an incorrect result
Type: Bug Version:
Submitter: pooryorick Created on: 2019-10-22 14:27:51
Subsystem: 37. File System Assigned To: pooryorick
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2019-10-28 19:26:14
Resolution: Fixed Closed By: pooryorick
    Closed on: 2019-10-28 19:26:14
Description:

The ext file system has no concept of "encoding": It names a file exactly the sequence of octets it is handed. In the following script, iso8859-1 is used to create a file, and then that same sequence of octets is obtained using glob. Then an FsPath object created under a different system encoding is created and cached. Later on that FsPath object is used under the original system encoding when it shouldn't be.

set name  登鸛鵲樓

encoding system iso8859-1

# create an iso8859-1-compatible sequence of octets to name the file by.
# They happen to be a string encoded in utf-8, but that's irrelevant.
set utf8 [encoding convertto utf-8 $name]

set chan [open $utf8 w]
puts $chan hello
close $chan

#
lassign [glob -directory [pwd] *$utf8*] fname



encoding system utf-8

# The sequence of octets is now itself converted to utf-8. Such a file does not
# exist, and a FsPath object has been cached.  So far so good.
puts [list utf-8 [file exists $fname]] ;#-> 0
encoding system iso8859-1

# Here the answer here should be 1, but the FsPath object created under a
# previous system encoding is used when it shouldn't be.
puts [list iso8859-1 [file exists $fname]]  ;# -> 0

# Discard the cached FsPath object, causing a new FsPath to be created using
# the (currently correct) system encoding.
puts [list iso8859-1 {string range} [file exists [string range $fname 0 end]]]  ;# -> 1

User Comments: jan.nijtmans added on 2019-10-28 16:41:56:
Well, I put back the test, so is this one fixed now?

jan.nijtmans added on 2019-10-28 16:25:29:

> Or ... just change the expected test result for UNIX ...

Of course, I meant Windows here ....


jan.nijtmans added on 2019-10-28 16:21:10:
Or ... just change the expected test result for UNIX ...

pooryorick added on 2019-10-28 16:11:46:

The bug can not be reproduced on NTFS filesystems in the first place. That's why with an NTFS filesystem the first result of filesystemEncoding-1.0 is 1 rather than 0.


pooryorick added on 2019-10-28 16:08:06:

Maybe I'm wrong, but because NTFS has a native encoding whereas common *NIX filsystems do not, this test can not be adapted for NTFS. Therefore it could be constrained to *NIX systems rather than removed.


jan.nijtmans added on 2019-10-28 08:58:50:
I just removed the filesystemEncoding test-case now, knowing that this bug is still not solved for Windows. That means, it's not critical any more now.

jan.nijtmans added on 2019-10-28 08:25:03:

On UNIX, it appears to work, but all travis Windows builds are failing:

==== filesystemEncoding-1.0 issue bcd100410465 FAILED
==== Contents of test case:
	set dir [tcltests::tempdir]
	set saved [encoding system]
	encoding system iso8859-1
	set fname1a $dir/$fname1
	set utf8name [encoding convertto utf-8 $fname1a]
	makeFile {} $utf8name
	set globbed [lindex [glob -directory $dir *] 0]
	encoding system utf-8
	lappend res [file exists $globbed]
	encoding system iso8859-1 
	lappend res [file exists $globbed]
	return $res

---- Result was: 1 1 ---- Result should have been (exact matching): 0 1 ==== filesystemEncoding-1.0 FAILED

This way Donald Porter cannot make his RC, so please have a look at this!


pooryorick added on 2019-10-26 15:42:06:

Fixed in [440efbd556ce7041] for core-8-6-branch. In contrast with the proposed fix for the core-8-branch, this is a one-line fix. It should probably also be applied to core-8-branch, and additional work done there to refine the record keeping when system encoding changes.


bll added on 2019-10-25 00:43:34:
Ok, reproduced.

bll added on 2019-10-24 23:05:35:
Unable to reproduce:
  8.6.9 on MX Linux 19
  8.6.5 on Ubuntu 16.04

bll added on 2019-10-24 23:01:10:
Unable to reproduce with tcl 8.6.9 on Linux.

pooryorick added on 2019-10-24 21:19:03:

The initially-provided example behaves as described in core-8-6-branch, but in core-8-branch only behaves as described prior to commit [165586b224f85f10], " TIP #430 implementation", 2018-09-12. This is because even when built with --disable-zipfs, in core-8-branch there are calls to ZipFSPathInFilesystemProc from Tcl_FSGetFileSystemForPath, and ZipFSPathInFilesystemProc uses Tcl_GetString(pathPtr) instead of working with the fsPathType internal representation.


jan.nijtmans added on 2019-10-24 21:04:02:
Hm ...  Added the Tcl_FreeEncoding() now, it doesn't seem to work ...

Anyway, I'll leave the branch to you, in case you want to work on it further. Please, next time, don't merge it to core-8-branch as long as you are not reasonable confident that all test-cases work.

Thanks!

pooryorick added on 2019-10-24 20:33:51:

The failue of encoding-2.2 was fixed by adding a corresponding Tcl_FreeEncoding for the newly-added Tcl_GetEncoding.

There's more work to be done on this issue though. There might be undesired interactions between FsPath.nativePathPtr and FsPath.filesystemEpoch. These are older issues, not something introduced by the current changes.


jan.nijtmans added on 2019-10-24 08:29:22:

Since Travis already fails for 2 days, I put the work on this ticket in a separate [bug-bcd1004104] branch. Please have a look: Something still wrong with the code, or should the test-case be revised?


jan.nijtmans added on 2019-10-22 22:11:24:

Starting with [0f2870649c804dd8]:

encoding.test

==== encoding-2.2 Tcl_FreeEncoding: refcount != 0 FAILED ==== Contents of test case:

encoding system shiftjis ;# incr ref count encoding dirs [list [pwd]] set x [encoding convertto shiftjis \u4e4e] ;# old one found encoding system iso8859-1 llength shiftjis ;# Shimmer away any cache of Tcl_Encoding lappend x [catch {encoding convertto shiftjis \u4e4e} msg] $msg

---- Result was: ŒÁ 0 ŒÁ ---- Result should have been (exact matching): ŒÁ 1 {unknown encoding "shiftjis"} ==== encoding-2.2 FAILED