Tcl Source Code

Ticket UUID: d433c0e0add0496ef4a7d73c8580149605dadff6
Title: TCL_UTF_MAX == 4 problems
Type: Bug
Version: core-8-6-10-rc
Submitter: chw
Created on: 2019-11-12 23:43:27
Subsystem: 44. UTF-8 Strings
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2019-11-13 14:25:53
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2019-11-13 14:25:53
Description:
On Debian 9 using

 $ cd .../unix ; CC="cc -DTCL_UTF_MAX=4" ./configure --prefix=/tmp/mytcl --disable-static ; make
 $ ./tclsh
 % set X "\U1F602"

the tclsh freezes when run in a gnome-terminal.

If an xterm is used instead, the tclsh writes more bytes than expected, as shown in
this strace dump:

 write(1, "\360\302\230\302\230\302\202\r\n", 9) = 9

When rebuilding with

 $ cd .../unix ; CC="cc -DTCL_UTF_MAX=6" ./configure --prefix=/tmp/mytcl --disable-static ; make
 $ ./tclsh
 % set X "\U1F602"

the emoji is properly displayed in gnome-terminal and strace yields the expected

 write(1, "\360\237\230\202\r\n",  6) = 6
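
For comparison, the two strace strings can be converted to hex from a shell. This is a minimal sketch assuming a POSIX `printf`, `od`, and `tr`; the helper name `octal_to_hex` is made up for illustration and is not part of Tcl.

```shell
# Hypothetical helper: turn a strace-style octal-escaped string into hex.
octal_to_hex() {
    # printf expands the \ooo octal escapes in its format string;
    # od prints each resulting byte as two hex digits;
    # tr strips the spaces and newlines od adds.
    printf "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

octal_to_hex '\360\237\230\202'              # TCL_UTF_MAX=6 build: f09f9882
octal_to_hex '\360\302\230\302\230\302\202'  # TCL_UTF_MAX=4 build: f0c298c298c282
```

The first result, f0 9f 98 82, is the four-byte UTF-8 encoding of U+1F602. The second starts with the same lead byte f0 but continues with c2, which is not a continuation byte, so the output is not well-formed UTF-8.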

Right now, this ticket is a showstopper for undroidwish/vanillawish (not for AndroidWish).
User Comments:

jan.nijtmans added on 2019-11-13 14:25:53:
Thanks for confirming the fix! ..  And for the bug report to start with ... !!!!

chw added on 2019-11-13 14:07:20:
Bingo! The exact same sequence I've run against [b5633ba3bd]
now works both with "\U1F602" and "\uD83D\uDE02" notation.
I think you can close the ticket now, thanks for fixing.

jan.nijtmans added on 2019-11-13 12:39:01:
Please, try again with [e377ac273f]. I think I really got it now ...

chw added on 2019-11-13 09:51:51:
Again on Debian 9, core-8-6-branch [b5633ba3bd] using

  $ cd .../unix ; CC="cc -DTCL_UTF_MAX=4" ./configure --prefix=/tmp/mytcl --disable-static ; make
  $ ./tclsh
  % set X "\U1F602"

the gnome-terminal hangs and the strace dump is

  write(1, "\360\302\230\302\230\302\202\r\n", 9)

Using surrogate pair notation outputs the expected emoji

  % set X "\uD83D\uDE02"

and strace dump is

  write(1, "\360\237\230\202\r\n", 6) = 6
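
The two notations name the same character: `\uD83D\uDE02` is the UTF-16 surrogate pair for U+1F602. A sketch of the standard UTF-16 arithmetic in shell follows; the function name `surrogates_to_cp` is hypothetical and only serves this illustration.

```shell
# Hypothetical helper: combine a high and a low UTF-16 surrogate
# into the code point they represent.
surrogates_to_cp() {
    # 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    printf 'U+%X\n' $(( 0x10000 + (($1 - 0xD800) << 10) + ($2 - 0xDC00) ))
}

surrogates_to_cp 0xD83D 0xDE02   # prints U+1F602
```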

Now let's do the opposite direction

  % set X "\uD83D\uDE02"
  % set F [open OUT w]
  % puts -nonewline $F $X
  % close $F
  % set F [open OUT]
  % set Y [read $F] ; close $F
  % set Y

the gnome-terminal hangs again and the strace dump is

  write(1, "\303\260\302\230\302\230\302\202\r\n", 10)
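
Every two-byte group in this dump is the UTF-8 encoding of a code point below U+0100: `\303\260` is U+00F0, `\302\230` is U+0098 and `\302\202` is U+0082. The sketch below shows that arithmetic only; the function name `reencode_as_codepoint` is hypothetical, and nothing here is taken from Tcl's internals.

```shell
# Hypothetical illustration: treat a single byte value (0x80..0xFF) as a
# standalone code point and print its two-byte UTF-8 encoding in the
# octal notation strace uses.
reencode_as_codepoint() {
    # Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
    printf '\\%o\\%o\n' $(( 0xC0 | ($1 >> 6) )) $(( 0x80 | ($1 & 0x3F) ))
}

reencode_as_codepoint 0xF0   # prints \303\260
reencode_as_codepoint 0x98   # prints \302\230
reencode_as_codepoint 0x82   # prints \302\202
```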

jan.nijtmans added on 2019-11-13 09:21:36:

Did you bisect? If you did, I guess that commit [9e1984c250d1a859] introduced this bug. This commit made it possible for TclUtfToUniChar() to be called when the pointer is at the 2nd byte of a valid 4-byte UTF-8 sequence. The macro didn't account for that.

Thanks!


jan.nijtmans added on 2019-11-13 09:10:00:

This should be fixed in [b5633ba3bd8fa74e]. Can you confirm that it fixes the problem? Thanks!


jan.nijtmans added on 2019-11-13 07:52:19:
Thanks for the report. On core-8-6-10-rc this is definitely expected to work, so raising to "minor".

chw added on 2019-11-13 06:47:37:
For reference, the core-8-6-9 branch built with

 $ cd .../unix ; CC="cc -DTCL_UTF_MAX=4" ./configure --prefix=/tmp/mytcl --disable-static ; make
 $ ./tclsh
 % set X "\U1F602"

outputs the expected emoji and the strace dump is:

 write(1, "\360\237\230\202\r\n", 6) = 6

chw added on 2019-11-13 06:42:25:
The core-8-7-a3-rc branch built with

 $ cd .../unix ; CC="cc -DTCL_UTF_MAX=4" ./configure --prefix=/tmp/mytcl --disable-static ; make
 $ ./tclsh
 % set X "\U1F602"

gives some weird output in gnome-terminal and the strace dump is:

 write(1, "\360\313\234\313\234\342\200\232\r\n", 10)