Tcl Source Code

Ticket UUID: 5fca83d78cf37b70d3e958897b542f985f2ce56a
Title: [encoding system] is wrong in an ISO-8859-1 locale
Type: Patch
Version: 9.0b2
Submitter: bhaible
Created on: 2024-07-01 18:34:20
Subsystem: 38. Init - Library - Autoload
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2024-07-01 19:56:44
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2024-07-01 19:56:44
Description:
On a Linux/glibc system, I have two French locales:
$ LC_ALL=fr_FR.UTF-8 locale charmap
UTF-8
$ LC_ALL=fr_FR.ISO-8859-1 locale charmap
ISO-8859-1

('locale charmap' is the command-line equivalent of nl_langinfo(CODESET).)

In Tcl, [encoding system] should be set to an equivalent value, otherwise
strings with non-ASCII characters are printed incorrectly to standard output.

This works in Tcl 8.6:

$ LC_ALL=fr_FR.UTF-8 tclsh8.6
% encoding system
utf-8

$ LC_ALL=fr_FR.ISO-8859-1 tclsh8.6
% encoding system
iso8859-1

But it no longer works for the ISO-8859-1 locale in Tcl 9.0 beta2:

$ LC_ALL=fr_FR.UTF-8 tclsh9.0
% encoding system
utf-8

$ LC_ALL=fr_FR.ISO-8859-1 tclsh9.0
% encoding system
utf-8

I debugged it. nl_langinfo(CODESET) returns "ISO-8859-1".
This string is lowercased, producing "iso-8859-1", and then looked up in
LocaleTable by the function SearchKnownEncodings. The table
happens to have 174 elements, and the "iso-8859-1" entry is at index 80.
Due to a logic bug in SearchKnownEncodings, the entry at this index is not
found, and SearchKnownEncodings returns NULL.

The attached patch fixes the bug. The bug has been present since 2005, but
it was previously triggered for different encodings, because whether an entry
is missed depends on the table length and on the searched entry's index.
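
The ticket does not quote the faulty code, so the following is only a sketch of one plausible shape of such a bug, not the actual Tcl source. It assumes a binary search over a half-open interval [left, right) that shrinks the upper bound to test - 1 instead of test, which silently drops one index from the search. The function names search_buggy, search_fixed and demo are hypothetical, and integers stand in for the sorted table keys so that the entry at index i has key i.

```c
#include <assert.h>

/* Hypothetical sketch: binary search over indices 0 <= i < n, where the
 * key stored at index i is assumed to be i itself. */
static int search_buggy(int n, int target)
{
    int left = 0;
    int right = n;              /* invariant: candidate i satisfies left <= i < right */

    while (left < right) {
        int test = (left + right) / 2;

        if (test == target) {
            return test;
        }
        if (test < target) {
            left = test + 1;
        } else {
            right = test - 1;   /* assumed bug: excludes index test-1 as well */
        }
    }
    return -1;                  /* corresponds to returning NULL */
}

/* Same search, but the upper bound stays consistent with the half-open
 * interval, so no index is skipped. */
static int search_fixed(int n, int target)
{
    int left = 0;
    int right = n;

    while (left < right) {
        int test = (left + right) / 2;

        if (test == target) {
            return test;
        }
        if (test < target) {
            left = test + 1;
        } else {
            right = test;       /* index test is already excluded by "< right" */
        }
    }
    return -1;
}

/* With the numbers from this report (174 entries, target at index 80),
 * the buggy variant probes 87, 43, 65, 76, 81, 78, 79 and gives up. */
static void demo(void)
{
    assert(search_buggy(174, 80) == -1);
    assert(search_fixed(174, 80) == 80);
}
```

Under this assumption, the probe at index 81 sets right to 80, which removes index 80 from the interval even though it was never compared. A different table length changes the probe sequence, which would explain why other entries were affected in earlier versions.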
User Comments: jan.nijtmans added on 2024-07-01 19:56:44:

Thanks for this report and the patch!

Your explanation makes 100% sense! Following the code, I see what's happening, and why this wasn't noticed in Tcl 8.x. Many thanks!

Fixed in all branches now. The fix will be part of Tcl 9.0b3.


Attachments: