Tcl Source Code

View Ticket
Login
Ticket UUID: 87b69745be15c2816574ec415dfe2c3c407cc5d6
Title: interp creation resets encoding directory search path
Type: Bug Version: 9.0
Submitter: apnadkarni Created on: 2025-07-31 05:00:27
Subsystem: - New Builtin Commands Assigned To: nobody
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2025-08-23 00:55:54
Resolution: Fixed Closed By: apnadkarni
    Closed on: 2025-08-23 00:55:54
Description:

interp create results in the currently set search path for encodings to be reset as illustrated below.

% encoding dirs
//zipfs:/lib/tcl/tcl_library/encoding
% encoding dirs [list [encoding dirs] c:/src]
//zipfs:/lib/tcl/tcl_library/encoding c:/src
% encoding dirs
//zipfs:/lib/tcl/tcl_library/encoding c:/src
% interp create
interp0
% encoding dirs
//zipfs:/lib/tcl/tcl_library/encoding
User Comments: apnadkarni added on 2025-08-23 00:55:54:
What is the test configuration in which you are seeing errors? Neither my tests not Github CI showed any failures on any platform before I committed.

On the current core-9-0-branch, Github CI still shows no failures and now I re-tested the following combinations on Ubuntu:

```
configure
make test
make test TESTFLAGS="-singleproc 1"

configure --disable-shared
make test
make test TESTFLAGS="-singleproc 1"

configure --disable-zipfs
make test
make test TESTFLAGS="-singleproc 1"
```

as well as the equivalents on Windows

```
nmake /f makefile.vc test
nmake /f makefile.vc test TESTFLAGS="-singleproc 1"
nmake /f makefile.vc OPTS=static test
nmake /f makefile.vc OPTS=static test TESTFLAGS="-singleproc 1"
nmake /f makefile.vc OPTS=noembed test
nmake /f makefile.vc OPTS=noembed test TESTFLAGS="-singleproc 1"

```

None show any failures.

Failing on ASCII input 0x7f is very strange indeed and if it seems to happen only on your system stranger still. Something uninitialized in specific circumstance...?

What platform are you on and what were the specific configure and test options?

dgp added on 2025-08-22 18:19:08:
On core-9-0-branch, starting with checkin [0433b67adc], I see 114
failing tests in cmdAH.test and encoding.test.  The first of them is


==== cmdAH-4.3.13.7F.solo.ascii.strict.a encoding convertfrom -profile strict ascii  FAILED
==== Contents of test case:
encoding convertfrom -profile strict ascii 
---- Test generated error; Return code was: 1
---- Return code should have been one of: 0 2
---- errorInfo: unexpected byte sequence starting at index 0: '\x7F'
    while executing
"encoding convertfrom -profile strict ascii "
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 $script"
---- errorCode: TCL ENCODING ILLEGALSEQUENCE 0
==== cmdAH-4.3.13.7F.solo.ascii.strict.a FAILED


How can I help unbreak the branch??

jan.nijtmans added on 2025-08-18 11:50:40:

Review remark: indenting in tclEncoding.c is not consistent. All other files in this commit are fine. Fixed [ee22d2717fc9d6c3|here].

Sorry, Ashok.


apnadkarni added on 2025-08-14 13:47:18:
Fixed in [0433b67adc]. Broader thread safety issues with systemEncoding to be addressed separately.

apnadkarni added on 2025-08-14 13:29:56:
Yes of course. But tracking that as a separate bug that you have already logged (and I duplicated yesterday) as the required solution is much broader (as you also stated). The possibility of race conditions in encoding routines is much greater and has to be fixed.

sebres added on 2025-08-14 11:47:28:

No problem.

> to convince oneself there are no race conditions or other effects

But there is a race condition issue. The possibility to overwrite a system encoding from every thread (also protected by mutex) will cause that previously set system-encoding, which can be still used by other threads with other functions (see ticket [f2ff05fc84]) may get released (by last reference). And your extra mutex doesn't protect against it at all. Sure, it makes that not more worse as it was, but the lock protection is definitely overestimated...
As already said, it is more or less conceptual bug in Tcl_UtfToExternal*, Tcl_ExternalToUtf* and co.


apnadkarni added on 2025-08-14 11:32:48:
@sebres, I'm merging my original code. What you said about not requiring a lock may well be true but except in performance critical paths (this is not one), I prefer to stick with the lock for shared data in the general belief that ensuring access is safe without locks needs careful thought and is prone to errors (possibly even in future when code is modified). Not worth the time spent, imo, to convince oneself there are no race conditions or other effects.

sebres added on 2025-08-13 18:26:22:

Tcl mutexes are recursive as far as I can tell.

Yes, as already confirmed in the core mailing list, you are right and this is indeed reentrant now (changed in 8.7+ after TIP#509).

I do not like the comparison of systemEncoding outside of holding the mutex.

And if it is compared by locked mutex what would it change?
The issue is, as already said, just a conceptual bug and simply a timing thing and may happen with and without lock. In particular of mentioned code, mutex outside of the comparison changes absolutely nothing and in best case may only avoid some unneeded epoch change, however only if system encoding changes continuously across several threads. But then, as already said, you'd have completely different issues (ending with SF).

The only solution would be something like I provided in [f2ff05fc84], or system encoding stored per thread (TSV).

Globals shared between threads should be accessed under a lock even when reading.

No. In this particular comparison it doesn't change something, since it is only protection against increase of FS epoch and function simply returns if it is equal now. Few nanoseconds later or earlier it may be different (no matter with or without lock). With already mentioned consequences.

Feel free to revert it (mark may change as mistake branch and reset tags to previous commit), if you don't like it. But I guess in case of possible race condition, you overestimate the lock protection here and underestimate the consequences of conceptual bug (the issue by "wrong" design - thread-safety of the system encoding).


apnadkarni added on 2025-08-13 17:28:33:

@sebres, I am not in agreement with your mods.

First, as I mentioned in the core mailing list, Tcl mutexes are recursive as far as I can tell.

Second, I do not like the comparison of systemEncoding outside of holding the mutex. Globals shared between threads should be accessed under a lock even when reading. While I could probably create a case where that comparison of defaultEncoding and systemEncoding is a race condition, as a matter of principle the safer route is always accessing under a lock instead of proving it is safe.

In fact, looking at the management of systemEncoding without locks led me to create this (unrelated) ticket about whether more fixes related to systemEncoding synchronization are needed.

Of course, things are different if Tcl mutexes are not recursive and I misunderstood the manpage.


sebres added on 2025-08-13 16:11:19:

After a quick review, I don't entirely agree with the changes in Tcl_SetSystemEncoding, because it may have an issue by moving of Tcl_MutexLock(&encodingMutex) to head of function, which now would cause that due to Tcl_GetEncoding() (which also locks mutex) would do that twice (enters the lock recursively).
Note that the mutexes are not necessarily reentrant on all systems, so a recursive locking may be an UB (and may cause a deadlock). Also Tcls documentation says that "the result of locking a mutex twice from the same thread is undefined".

Additionally note that this mutex only protect the hash table of encodings and encoding' refCount, it doesn't really protect static systemEncoding (nor the defaultEncoding) directly. Let alone it'd be still affected by [f2ff05fc84] and co.

Commit [bf62ce24c60c7b8a] shall fix it now.


apnadkarni added on 2025-08-06 13:53:41:
Reminder to self - need to investigate if wish / Tk have a similar issue.

apnadkarni added on 2025-08-06 13:40:50:

Branch bug-87b69745be contains a proposed solution to this and other zipfs related encoding bugs that previously contained point fixes.

  • It moves locating and mounting of the zipfs containing the tcl library to before creation of any interpreters instead of in the tclInit script. This seems logically correct to me as different interpreters should not be using different directories for tcl_library as structures such as file systems and encodings are shared process wide and it does not make sense to have them initialized from different sources. This was already partly achieved by the init flag Sergey introduced to cache the zipfs search but not doing the search at all is more comprehensive.

  • It also ensures the file system epoch is only updated if the encoding actually changes. This fixes the issue observed by Sergey (but not logged as a separate ticket) where the file system epoch changes much more frequently than in Tcl 8.6.

Review requested.


apnadkarni added on 2025-07-31 05:16:21:
Looks like the bug was introduced with commit [450d12dac5].