Tk Source Code

View Ticket
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.
Ticket UUID: 00a27923ee26437611e1ed83f96e15b6caabcd8b
Title: text/entry dysfunctional when pasting an emoji on MacOSX
Type: Bug Version: core-8-6-branch at least
Submitter: chw Created on: 2017-12-30 14:13:43
Subsystem: 83. Mac OS X Build Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Severe
Status: Open Last Modified: 2019-05-12 05:34:35
Resolution: Fixed Closed By: nobody
    Closed on:
If a string containing an emoji (or something beyond BMP) is pasted into a text or entry widget on MacOSX the entire toplevel may render unresponsive for unknown reasons.

The attached patch forces the incoming and outgoing clipboard information to be UTF-8 reformatted according to the direction of the information flow.

Although the (then to be observed) result of the paste operation is debatable the unresponsiveness can't be reproduced anymore.
User Comments: chrstphrchvz added on 2019-05-12 05:34:35:

Cc'ing myself since this looks tangentially related to my newer ticket [0a5853aa61]

jan.nijtmans added on 2018-01-10 09:15:24:

> Therefore the logic to deal with invalid stuff should be put in tkMacOSXFont.c (and possibly the other POSIX/Win32 interfaces, too).

Well, I think that the Mac and POSIX should get some functions like Windows already has: Tcl_WinUtfToTChar/Tcl_WinTCharTcharToUtf. For now, it's OK for Tk to provide those (your UtfToUTF16DString is a good start), but eventually those functions should be provided by Tcl. Or - maybe - a "ui_system" encoding should be provided by Tk, then we can use the already existing Tcl_ExternalToUtf/Tcl_UtfToExternal (just a thought, I'm not sure yet).

My first priority is to get it right for TCL_UTF_MAX=4, then it can go along with TIP #389 and people can expect Emoji's to work right starting with Tcl 8.7. The second priority is getting it as good as possible for TCL_UTF_MAX=3, then those additional fixes can be backported to 8.6. Finally, TCL_UTF_MAX=6 is a nice to have (since many more fixes in Tcl and Tk are needed for that, which androwish maybe already has but the core not yet).

The "\uFFFD" replacement implementation for TCL_UTF_MAX=3 is now in the "tip-389" branch (even though this branch is actually meant for TCL_UTF_MAX=4 only).

Thanks, Christian, as always!

chw added on 2018-01-09 19:54:26:
This morning I built from tcl (tip-389) and tk (bug-00a27923ee) using the three possible settings for TCL_UTF_MAX (3, 4, 6) and observed different looking contents in a text widget when inserting invalid or partial surrogates.

IMO the point should be (regardless what the Tcl core does), to try to render things (including invalid ones) on the display as identical as possible. Therefore the logic to deal with invalid stuff should be put in tkMacOSXFont.c (and possibly the other POSIX/Win32 interfaces, too).

jan.nijtmans added on 2018-01-09 11:25:56:

Yes, Christian, I see your point. Thinking more about it, this "\uFFFD" replacement should actually be Tcl's responsibility.


Does that also work for you?


chw added on 2018-01-09 06:00:33:
Let me explain the idea of the "\uFFFD" replacement in UtfToUTF16DString(): For any stray surrogate e.g. "AB\uD83DXY" or "\uDE03\uDE02" or "\uD83D\uD83D" a consistent presentation on screen should result independent of TCL_UTF_MAX at built time. For the case where the core supports BMP only, it is IMO better to replace even valid surrogate pairs with "\uFFFD" in order not to pretend to support the full unicode range.

Since this affects only the font measure/render step, it is presentation only and does not alter the original data.

jan.nijtmans added on 2018-01-08 12:16:11:
For TCL_UTF_MAX=3 and TCL_UTF_MAX=6, the patch looks OK. For TCL_UTF_MAX=4 certainly not, since Tcl then already works with surrogates internally, nothing special should be needed. Remark: Testing Tk with TCL_UTF_MAX=4 requires linking it with the "tip-389" branch of Tcl, which is not put up for voting yet.

Hopefully I fixed that now in the branch, but that should be tested again in combination with TIP #389.

fvogel added on 2018-01-07 20:41:53:
Thanks for this second patch. The first one did indeed not fix the hang with the three-liner terst case you provided just below.

This one does fix the hang, however it's quite a big patch that needs to be reviewed and understood.

Since I'm not really understanding it, may I ask Jan Nijtmans, who is deeply proficient in that TCL_UTF_MAX stuff, to review this and provide feedback?

chw added on 2018-01-02 10:30:13:
Here is a simpler test case:

 text .t
 pack .t
 .t insert end "123\uD83D\uDE02XYZ"

In my humble impression this problem now is a teenager.

The attached patch adds some more safety to the clipboard and provides a more robust conversion from UTF-8 to Utf16Char for text measurement and rendering which hopefully addresses all possible TCL_UTF_MAX variants.

fvogel added on 2018-01-01 19:23:08:
> I've used a terminal and the subwindow which allows to pick an emoji
> to be inserted into the terminal

Can't find this. On macOS I opened a terminal (this is actually a bash window), but then where is that subwindow full of emojis?

Anyway I trust you, so I'll merge.

chw added on 2018-01-01 16:01:39:
And BTW "\u1F600" is 0x1F60 0x0030 as UCS-2 so you'd need "\U1F600" for something beyond BMP. But this won't work either with a BMP only Tcl core.

chw added on 2018-01-01 15:53:36:
Sorry for not being precise on how I got at this problem. I've used a terminal and the subwindow which allows to pick an emoji to be inserted into the terminal, then copied the emoji from the terminal with the mouse and pasted it with the middle mouse button into a text or entry widget.

I doubt that you can produce a 4 byte UTF sequence using "event generate" thus cannot create a test case without further adding a test function which allows for producing 4 byte UTF sequences.

Maybe these pieces are helpful, too:

fvogel added on 2018-01-01 14:34:31:

I can't reproduce freezing the text widget (I mean: without the patch).

I tried on macOS with:

package require Tk
pack [text .t]
clipboard clear
clipboard append \u1F600  ; # smiling face emoji
event generate .t <<Paste>>
# typing in the text widget is working at this point

Can you be more specific about how you produce the problem?

Also, the result of the paste operation has nothing in common with the smiling face emoji I copied into the clipboard. It is in fact two characters, the first one looks like a lowercase omega greek letter with kind of an accent on top of it, and the second character looks like a standard zero. Perhaps a font issue.

chw added on 2017-12-31 13:58:00:
No idea, how a test case can exercise the changes. The idea of the changes is to enforce a proper UTF-8 representation of both input and output since the selection/clipboard is an interface to the outside world.

But now that we enforce proper outgoing format, we can't read back invalid data by creating our own dog foood. Thus, it would require a specific test support function to produce errors programatically.

And before the patch, I did not oversee the problem. It just seemed to freeze the text or entry widget forever without further consequences.

fvogel added on 2017-12-31 09:09:29:

Thanks for the report and even more thanks for the fix. I have applied it to branch bug-00a27923ee.

I have run the clipboard.test test file on macOS and see four failures clipboard-4.1, -4.2, -4.4 and -6.2. However these are also present in core-8-6-branch and look unrelated.

Could a new test case be created in association to this bug?