Tk Source Code

11:28 Closed ticket [00a27923]: text/entry dysfunctional when pasting an emoji on MacOSX plus 5 other changes artifact: 67fa9b8c user: jan.nijtmans
23:07 Ticket [00a27923]: 3 changes artifact: cc1a9446 user: marc_culler
01:04 Ticket [00a27923]: 3 changes artifact: ba8b5eba user: chrstphrchvz
01:01 Ticket [a1795648] Tk 8.6: prevent issues when encountering non-BMP Unicode characters status still Open with 3 other changes artifact: 612e9d75 user: chrstphrchvz
05:34 Ticket [00a27923] text/entry dysfunctional when pasting an emoji on MacOSX status still Open with 4 other changes artifact: f175a75d user: chrstphrchvz
09:15 Ticket [00a27923]: 3 changes artifact: 07ea7754 user: jan.nijtmans
19:54 Ticket [00a27923]: 3 changes artifact: 39b16bf7 user: chw
11:25 Ticket [00a27923]: 3 changes artifact: cd0e96e1 user: jan.nijtmans
06:00 Ticket [00a27923]: 3 changes artifact: 5265630d user: chw
12:16 Ticket [00a27923]: 3 changes artifact: 090d0b39 user: jan.nijtmans
20:41 Ticket [00a27923]: 4 changes artifact: 0afa4f37 user: fvogel
10:31 Add attachment tkmacosx.patch to ticket [00a27923] artifact: dd058045 user: chw
10:30 Ticket [00a27923] text/entry dysfunctional when pasting an emoji on MacOSX status still Open with 3 other changes artifact: 6ac0dfcb user: chw
19:23 Ticket [00a27923]: 3 changes artifact: 17e84c49 user: fvogel
16:01 Ticket [00a27923]: 3 changes artifact: cc3d49da user: chw
15:53 Ticket [00a27923]: 3 changes artifact: a460f131 user: chw
14:34 Ticket [00a27923]: 3 changes artifact: 80eb1e38 user: fvogel
13:58 Ticket [00a27923]: 3 changes artifact: 892cf2f1 user: chw
09:09 Ticket [00a27923]: 5 changes artifact: 5771a7e2 user: fvogel
Fix [00a27923ee]: text/entry dysfunctional when pasting an emoji on MacOSX. Thanks to Christian Werner. check-in: d0d3d91c user: fvogel tags: bug-00a27923ee
14:14 Add attachment clipboard.patch to ticket [00a27923] artifact: 2a4d572c user: chw
14:13 New ticket [00a27923] text/entry dysfunctional when pasting an emoji on MacOSX. artifact: d2648cb0 user: chw

Ticket UUID: 00a27923ee26437611e1ed83f96e15b6caabcd8b
Title: text/entry dysfunctional when pasting an emoji on MacOSX
Type: Bug
Version: core-8-6-branch at least
Submitter: chw
Created on: 2017-12-30 14:13:43
Subsystem: 52. [clipboard]
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Severe
Status: Closed
Last Modified: 2021-01-27 11:28:27
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2021-01-27 11:28:27
If a string containing an emoji (or any other character beyond the BMP) is pasted into a text or entry widget on MacOSX, the entire toplevel may become unresponsive for unknown reasons.

The attached patch forces clipboard data to be reformatted as proper UTF-8 in both directions, converting it appropriately for incoming (paste) and outgoing (copy) transfers.

Although what the paste operation then produces is debatable, the unresponsiveness can no longer be reproduced.
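A minimal C sketch of what forcing clipboard data into proper UTF-8 can mean, assuming standard UTF-8 on the system side. `sanitize_utf8` is a hypothetical helper, not the code in clipboard.patch: it copies a byte string and replaces each ill-formed sequence (stray bytes, truncated or overlong sequences, encoded surrogates, values above U+10FFFF) with U+FFFD. It does not handle Tcl's internal modified UTF-8, where NUL is stored as C0 80.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Tk): copy `len` bytes of `src` into
 * `dst`, replacing each ill-formed UTF-8 sequence with U+FFFD (EF BF BD).
 * Overlong forms, surrogate code points (U+D800..U+DFFF encoded as three
 * bytes) and values above U+10FFFF count as ill-formed.
 * `dst` must have room for 3 * len + 1 bytes. Returns the output length. */
static size_t sanitize_utf8(const unsigned char *src, size_t len,
                            unsigned char *dst)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < len) {
        unsigned char lead = src[in];
        size_t need, got = 1;
        unsigned long cp, min;

        if (lead < 0x80) {                /* ASCII passes through */
            dst[out++] = lead;
            in++;
            continue;
        } else if ((lead & 0xE0) == 0xC0) {
            need = 2; cp = lead & 0x1F; min = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            need = 3; cp = lead & 0x0F; min = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            need = 4; cp = lead & 0x07; min = 0x10000;
        } else {                          /* stray continuation or bad lead */
            need = 0; cp = 0; min = 0;
        }

        /* Consume as many continuation bytes as the lead byte announces. */
        while (got < need && in + got < len && (src[in + got] & 0xC0) == 0x80) {
            cp = (cp << 6) | (src[in + got] & 0x3F);
            got++;
        }

        if (need != 0 && got == need && cp >= min && cp <= 0x10FFFF
                && !(cp >= 0xD800 && cp <= 0xDFFF)) {
            memcpy(dst + out, src + in, need);   /* well-formed: copy as-is */
            out += need;
        } else {
            dst[out++] = 0xEF;                   /* emit U+FFFD */
            dst[out++] = 0xBF;
            dst[out++] = 0xBD;
        }
        in += got;   /* got >= 1, so the loop always advances */
    }
    dst[out] = '\0';
    return out;
}
```

Replacing rather than rejecting keeps the rest of the pasted text intact, so a single bad sequence costs one U+FFFD instead of the whole paste.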
User Comments:

jan.nijtmans added on 2021-01-27 11:28:27:

This should be fixed now, for sure in 8.6.11.

chrstphrchvz added on 2019-11-11 01:04:53:

I had also opened a ticket which covers issues similar to this on any platform, not just Aqua: [a179564826].

The recent change [43e89771] may provide a more immediate workaround for the issue described here, by converting any pasted non-BMP characters to U+FFFD for the time being.
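The workaround of turning non-BMP characters into U+FFFD can be sketched as follows. This is an illustration under stated assumptions, not the code of [43e89771]: the hypothetical `replace_non_bmp` assumes well-formed UTF-8 and relies on the fact that every non-BMP character, and only those, is encoded as a 4-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite well-formed UTF-8 in place, replacing each
 * 4-byte sequence (a character outside the BMP) with U+FFFD (EF BF BD).
 * A 4-byte sequence becomes 3 bytes, so the text never grows; the buffer
 * needs len + 1 bytes for the terminating NUL. Returns the new length. */
static size_t replace_non_bmp(unsigned char *s, size_t len)
{
    size_t in = 0, out = 0;

    assert(s != NULL);
    while (in < len) {
        unsigned char lead = s[in];
        /* Sequence length from the lead byte (input assumed well-formed). */
        size_t n = lead < 0x80 ? 1
                 : (lead & 0xE0) == 0xC0 ? 2
                 : (lead & 0xF0) == 0xE0 ? 3 : 4;

        if (n == 4) {                     /* non-BMP: substitute U+FFFD */
            s[out++] = 0xEF;
            s[out++] = 0xBF;
            s[out++] = 0xBD;
        } else {                          /* BMP: copy the sequence forward */
            for (size_t k = 0; k < n; k++)
                s[out++] = s[in + k];
        }
        in += n;
    }
    s[out] = '\0';
    return out;
}
```

Because the output position never overtakes the input position, the in-place rewrite never overwrites bytes that are still to be read.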

chrstphrchvz added on 2019-05-12 05:34:35:

Cc'ing myself, since this looks tangentially related to my newer ticket [0a5853aa61].

jan.nijtmans added on 2018-01-10 09:15:24:

> Therefore the logic to deal with invalid stuff should be put in tkMacOSXFont.c (and possibly the other POSIX/Win32 interfaces, too).

Well, I think that the Mac and POSIX ports should get functions like the ones Windows already has: Tcl_WinUtfToTChar/Tcl_WinTCharToUtf. For now, it's OK for Tk to provide them (your UtfToUTF16DString is a good start), but eventually those functions should be provided by Tcl. Or, maybe, Tk should provide a "ui_system" encoding; then we could use the already existing Tcl_ExternalToUtf/Tcl_UtfToExternal (just a thought, I'm not sure yet).
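For reference, here is a sketch of the UTF-8 to UTF-16 direction that a Mac/POSIX counterpart of Tcl_WinUtfToTChar would need. `utf8_to_utf16` is a hypothetical name; it assumes well-formed input and shows only the core step of splitting code points above U+FFFF into a high/low surrogate pair, without any Tcl_DString management.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert well-formed UTF-8 to UTF-16, emitting a
 * surrogate pair for every code point above U+FFFF. `dst` needs room for
 * `len` units (UTF-16 never needs more units than the UTF-8 input has
 * bytes). Returns the number of UTF-16 units written. */
static size_t utf8_to_utf16(const unsigned char *src, size_t len, uint16_t *dst)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < len) {
        unsigned char lead = src[in];
        uint32_t cp;
        size_t n;

        if (lead < 0x80)                { cp = lead;        n = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; n = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; n = 3; }
        else                            { cp = lead & 0x07; n = 4; }

        /* Fold in the 6 payload bits of each continuation byte. */
        for (size_t k = 1; k < n && in + k < len; k++)
            cp = (cp << 6) | (src[in + k] & 0x3F);
        in += n;

        if (cp < 0x10000) {
            dst[out++] = (uint16_t)cp;                     /* BMP: one unit */
        } else {
            cp -= 0x10000;                                 /* 20 bits remain */
            dst[out++] = (uint16_t)(0xD800 + (cp >> 10));  /* high surrogate */
            dst[out++] = (uint16_t)(0xDC00 + (cp & 0x3FF));/* low surrogate */
        }
    }
    return out;
}
```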

My first priority is to get it right for TCL_UTF_MAX=4; then it can go along with TIP #389, and people can expect emoji to work correctly starting with Tcl 8.7. The second priority is to get it as good as possible for TCL_UTF_MAX=3; those additional fixes can then be backported to 8.6. Finally, TCL_UTF_MAX=6 is a nice-to-have (many more fixes in Tcl and Tk are needed for that, which AndroWish may already have but the core does not yet).

The "\uFFFD" replacement implementation for TCL_UTF_MAX=3 is now in the "tip-389" branch (even though this branch is actually meant for TCL_UTF_MAX=4 only).

Thanks, Christian, as always!

chw added on 2018-01-09 19:54:26:
This morning I built Tcl (tip-389 branch) and Tk (bug-00a27923ee branch) with each of the three possible TCL_UTF_MAX settings (3, 4, 6) and observed differently rendered contents in a text widget when inserting invalid or partial surrogates.

IMO the goal should be, regardless of what the Tcl core does, to render things (including invalid ones) on the display as identically as possible. Therefore the logic to deal with invalid stuff should be put in tkMacOSXFont.c (and possibly in the other POSIX/Win32 interfaces, too).

jan.nijtmans added on 2018-01-09 11:25:56:

Yes, Christian, I see your point. Thinking more about it, this "\uFFFD" replacement should actually be Tcl's responsibility.


Does that also work for you?


chw added on 2018-01-09 06:00:33:
Let me explain the idea behind the "\uFFFD" replacement in UtfToUTF16DString(): for any stray surrogate, e.g. "AB\uD83DXY" or "\uDE03\uDE02" or "\uD83D\uD83D", the on-screen presentation should be consistent regardless of the TCL_UTF_MAX setting at build time. Where the core supports the BMP only, it is IMO better to replace even valid surrogate pairs with "\uFFFD" so as not to pretend to support the full Unicode range.

Since this affects only the font measure/render step, it is presentation only and does not alter the original data.
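The replacement rule described above can be sketched on a UTF-16 buffer. `fix_surrogates` is a hypothetical function, not the code of UtfToUTF16DString(); the `bmp_only` flag stands in for "the core supports the BMP only", and collapsing a valid pair into a single U+FFFD (rather than two) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

static int is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical sketch (not the actual Tk code): rewrite a UTF-16 buffer
 * in place so that
 *   - every stray (unpaired) surrogate becomes U+FFFD, and
 *   - if bmp_only is nonzero, every valid surrogate pair also becomes a
 *     single U+FFFD, so a BMP-only build does not pretend to render it.
 * Intended for the copy handed to measuring/drawing; the caller keeps
 * the original string. Returns the new number of units. */
static size_t fix_surrogates(uint16_t *buf, size_t len, int bmp_only)
{
    size_t in = 0, out = 0;

    assert(buf != NULL);
    while (in < len) {
        uint16_t u = buf[in];

        if (is_high(u) && in + 1 < len && is_low(buf[in + 1])) {
            if (bmp_only) {
                buf[out++] = REPLACEMENT_CHAR;   /* whole pair -> one U+FFFD */
            } else {
                buf[out++] = u;                  /* keep the valid pair */
                buf[out++] = buf[in + 1];
            }
            in += 2;
        } else if (is_high(u) || is_low(u)) {
            buf[out++] = REPLACEMENT_CHAR;       /* stray surrogate */
            in++;
        } else {
            buf[out++] = u;                      /* ordinary BMP unit */
            in++;
        }
    }
    return out;
}
```

With the inputs from the comment, "AB\uD83DXY" becomes "AB\uFFFDXY", while "\uDE03\uDE02" and "\uD83D\uD83D" each become two U+FFFD characters, independent of how the string was built.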

jan.nijtmans added on 2018-01-08 12:16:11:
For TCL_UTF_MAX=3 and TCL_UTF_MAX=6, the patch looks OK. For TCL_UTF_MAX=4 it certainly does not: Tcl then already works with surrogates internally, so nothing special should be needed. Remark: testing Tk with TCL_UTF_MAX=4 requires linking it with the "tip-389" branch of Tcl, which has not been put up for a vote yet.

Hopefully I have now fixed that in the branch, but it should be tested again in combination with TIP #389.

fvogel added on 2018-01-07 20:41:53:
Thanks for this second patch. The first one indeed did not fix the hang with the three-line test case you provided just below.

This one does fix the hang; however, it's quite a big patch that needs to be reviewed and understood.

Since I don't really understand it, may I ask Jan Nijtmans, who is deeply proficient in the TCL_UTF_MAX matters, to review it and provide feedback?

chw added on 2018-01-02 10:30:13:
Here is a simpler test case:

 text .t
 pack .t
 .t insert end "123\uD83D\uDE02XYZ"

My humble impression is that this problem is now a teenager.

The attached patch adds some more safety to the clipboard and provides a more robust conversion from UTF-8 to Utf16Char for text measurement and rendering, which hopefully addresses all possible TCL_UTF_MAX variants.
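In the three-line test case, "\uD83D\uDE02" is a UTF-16 surrogate pair written as two \u escapes. The arithmetic below, a small sketch with a hypothetical helper name, shows that the pair denotes the single code point U+1F602 (FACE WITH TEARS OF JOY), which lies outside the BMP.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a UTF-16 high/low surrogate pair into one code point:
 * the high surrogate carries the top 10 bits, the low one the bottom
 * 10 bits, and 0x10000 is added back. */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10)
                      | (uint32_t)(low - 0xDC00));
}
```

For the test string: (0xD83D - 0xD800) << 10 = 0xF400, 0xDE02 - 0xDC00 = 0x202, and 0x10000 + 0xF602 = 0x1F602.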

fvogel added on 2018-01-01 19:23:08:
> I've used a terminal and the subwindow which allows to pick an emoji
> to be inserted into the terminal

Can't find this. On macOS I opened a terminal (this is actually a bash window), but then where is that subwindow full of emojis?

Anyway, I trust you, so I'll merge.

chw added on 2018-01-01 16:01:39:
And BTW, "\u1F600" is 0x1F60 0x0030 as UCS-2, i.e. U+1F60 followed by the digit "0", so you'd need "\U1F600" for a character beyond the BMP. But that won't work either with a BMP-only Tcl core.
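The "\u1F600" pitfall comes from the escape reading a bounded number of hex digits. This hypothetical `read_hex` helper sketches that rule in C, assuming a limit of 4 digits for \u and 8 for \U as in Tcl's backslash-substitution rules; it is not Tcl's actual parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Read at most max_digits hexadecimal digits from s into *value and
 * return how many were consumed; scanning stops at the limit or at the
 * first non-hex character, whichever comes first. */
static size_t read_hex(const char *s, size_t max_digits, unsigned long *value)
{
    size_t n = 0;

    assert(s != NULL && value != NULL);
    *value = 0;
    while (n < max_digits && isxdigit((unsigned char)s[n])) {
        int c = (unsigned char)s[n];
        int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        *value = (*value << 4) | (unsigned long)digit;
        n++;
    }
    return n;
}
```

With a limit of 4, "1F600" yields 0x1F60 (U+1F60, a Greek small omega with psili) and leaves "0" as a literal character, which matches the two characters reported in the earlier comment; with a limit of 8, all five digits are consumed and the result is 0x1F600.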

chw added on 2018-01-01 15:53:36:
Sorry for not being precise about how I got to this problem. I used a terminal and its subwindow for picking an emoji to insert into the terminal, then copied the emoji from the terminal with the mouse and pasted it with the middle mouse button into a text or entry widget.

I doubt that you can produce a 4-byte UTF-8 sequence using "event generate", so a test case cannot be created without also adding a test function that can produce 4-byte UTF-8 sequences.
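A test support function of the kind mentioned here needs to build raw UTF-8 byte sequences. As a sketch (the name `encode_utf8` is hypothetical and this is not an existing Tk test command), the encoder below produces the 4-byte form of a non-BMP code point. It deliberately does not reject surrogate code points, so it can also produce the ill-formed 3-byte encoding of a lone surrogate for negative tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point (up to U+10FFFF) as UTF-8 into buf, which must
 * hold at least 4 bytes. Returns the number of bytes written. Surrogate
 * code points are NOT rejected, so ill-formed test data can be built. */
static size_t encode_utf8(uint32_t cp, unsigned char *buf)
{
    assert(buf != NULL && cp <= 0x10FFFF);
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* Beyond the BMP: four bytes. */
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+1F600 encodes as F0 9F 98 80, and the lone surrogate U+D83D as ED A0 BD.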

Maybe these pieces are helpful, too:

fvogel added on 2018-01-01 14:34:31:

I can't reproduce the text widget freezing (without the patch, I mean).

I tried on macOS with:

 package require Tk
 pack [text .t]
 clipboard clear
 clipboard append \u1F600  ; # smiling face emoji
 event generate .t <<Paste>>
 # typing in the text widget is working at this point

Can you be more specific about how you produce the problem?

Also, the result of the paste operation has nothing in common with the smiling face emoji I copied into the clipboard. It is in fact two characters: the first looks like a lowercase Greek omega with a kind of accent on top, and the second looks like a standard zero. Perhaps a font issue.

chw added on 2017-12-31 13:58:00:
I have no idea how a test case could exercise the changes. The idea of the changes is to enforce a proper UTF-8 representation of both input and output, since the selection/clipboard is an interface to the outside world.

But now that we enforce a proper outgoing format, we can't read back invalid data by producing our own dog food. Thus, it would require a specific test support function to produce errors programmatically.

And before the patch, I did not grasp the full extent of the problem. It just seemed to freeze the text or entry widget forever, without further consequences.

fvogel added on 2017-12-31 09:09:29:

Thanks for the report and even more thanks for the fix. I have applied it to branch bug-00a27923ee.

I have run the clipboard.test test file on macOS and see four failures: clipboard-4.1, -4.2, -4.4, and -6.2. However, these are also present in core-8-6-branch and look unrelated.

Could a new test case be created for this bug?