TIP 621: Glyph clustering for Tk

    Author:        Jan Nijtmans <[email protected]>
    State:         Draft
    Type:          Project
    Vote:          Pending
    Created:       16-Mar-2022
    Tcl-Version:   8.7
    Keywords:      Tk, ICU
    Tk-Branch:     glyph_indexing_2


At present, Tk knows nothing about Unicode glyphs. This TIP implements an interface to ICU (or to the glyph API on macOS), so that Tk knows what a glyph is and can handle it.


The following new functions are implemented in Tk. Each takes two required arguments and one optional argument.

    ::tk::startOfCluster string start ?locale?
    ::tk::endOfCluster string start ?locale?
    ::tk::startOfNextWord string start ?locale?
    ::tk::startOfPreviousWord string start ?locale?
    ::tk::endOfWord string start ?locale?
    ::tk::wordBreakAfter string start ?locale?
    ::tk::wordBreakBefore string start ?locale?

Here, string is the string being handled and start is the current position in that string. Each function returns a new position in the string. If there is no such position (e.g. the end of the string is reached), the empty string is returned.
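The calling convention described above can be sketched as a loop that walks a string one cluster at a time. This is only an illustration under stated assumptions: clusterStarts and oneIndexPerCluster are hypothetical helpers, not part of Tk, and the sketch assumes that ::tk::endOfCluster returns the index just past the cluster containing start. The boundary command is passed as a parameter so that the loop can also be tried in an interpreter where the functions from this TIP are not available.

```tcl
# Hypothetical helper, not part of Tk: collect the start index of every
# cluster in $str by repeatedly asking $endCmd for the next boundary.
# Assumption: $endCmd returns the index just past the current cluster,
# or the empty string when there is no such position.
proc clusterStarts {str {endCmd ::tk::endOfCluster}} {
    set starts {}
    set pos 0
    set len [string length $str]
    while {$pos < $len} {
        lappend starts $pos
        set next [{*}$endCmd $str $pos]
        # Stop on "no such position", and also if the command fails to
        # advance, so that the loop can never run forever.
        if {$next eq "" || $next <= $pos} break
        set pos $next
    }
    return $starts
}

# Stand-in boundary command, for demonstration only: it treats every
# string index as its own cluster.
proc oneIndexPerCluster {str start} {
    if {$start >= [string length $str]} { return "" }
    expr {$start + 1}
}
```

With a Tk that implements this TIP, calling clusterStarts with only the string argument would use ::tk::endOfCluster; passing oneIndexPerCluster as the second argument exercises the same loop without Tk.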

locale can be something like en or en_GB. For most locales it has no effect (Tk doesn't even use it in the current bindings). The ICU implementation handles the locale parameter; the macOS implementation currently doesn't.

These functions are used in the Tk bindings of entries, spinboxes and text widgets, with the result that Delete deletes the full glyph instead of only a single Unicode character. Likewise, the <<NextChar>> and <<PrevChar>> virtual events (Cursor right/left) jump to the next/previous glyph instead of the next/previous Unicode code point.
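The effect on Delete can be illustrated at the string level. The following is a minimal sketch, not the code of the actual bindings (which operate on widget contents): deleteClusterAt is a hypothetical helper, and it assumes that ::tk::endOfCluster returns the index just past the cluster that begins at the given position.

```tcl
# Hypothetical helper, not part of Tk: return $str with the whole
# cluster starting at index $pos removed.
# Assumption: $endCmd returns the index just past that cluster, or the
# empty string when there is nothing left to delete.
proc deleteClusterAt {str pos {endCmd ::tk::endOfCluster}} {
    set end [{*}$endCmd $str $pos]
    if {$end eq ""} {
        # No boundary found (e.g. $pos is at the end): nothing to delete.
        return $str
    }
    # string replace without a replacement removes the range first..last.
    string replace $str $pos [expr {$end - 1}]
}
```

The boundary command is a parameter only so that the helper can be tried with a substitute command where the functions from this TIP are unavailable.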

In addition, mainly for macOS, there are two new key modifiers, "Fn" and "Num". "Fn" is equivalent to "Mod4" and can be used to make the "Fn+e" key combination usable to access Emoji. "Num" is equivalent to "Mod3" and can be used to access the extended numerical keys.

If ICU is not available, minimal implementations of these functions are provided. These minimal functions only know about Unicode surrogates, nothing else.
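To give an idea of what "only know about Unicode surrogates" can mean in practice, here is a minimal sketch in plain Tcl. It is not Tk's actual fallback code: the names sketchEndOfCluster and sketchStartOfCluster are hypothetical, only integer indices are handled, and the sketch assumes a build in which a character outside the Basic Multilingual Plane occupies two string indices (a high surrogate followed by a low surrogate).

```tcl
# Return 1 if $ch is a single high (leading) surrogate, U+D800..U+DBFF.
proc isHighSurrogate {ch} {
    if {$ch eq ""} { return 0 }
    scan $ch %c code
    expr {$code >= 0xD800 && $code <= 0xDBFF}
}

# Return 1 if $ch is a single low (trailing) surrogate, U+DC00..U+DFFF.
proc isLowSurrogate {ch} {
    if {$ch eq ""} { return 0 }
    scan $ch %c code
    expr {$code >= 0xDC00 && $code <= 0xDFFF}
}

# Hypothetical stand-in: index just past the "cluster" at $start, where
# a cluster is either one index or one surrogate pair.
proc sketchEndOfCluster {str start} {
    set len [string length $str]
    if {$start < 0 || $start >= $len} { return "" }
    set next [expr {$start + 1}]
    if {$next < $len
            && [isHighSurrogate [string index $str $start]]
            && [isLowSurrogate [string index $str $next]]} {
        incr next
    }
    return $next
}

# Hypothetical stand-in: first index of the "cluster" containing $start.
proc sketchStartOfCluster {str start} {
    set len [string length $str]
    if {$start < 0 || $start >= $len} { return "" }
    if {$start > 0
            && [isLowSurrogate [string index $str $start]]
            && [isHighSurrogate [string index $str [expr {$start - 1}]]]} {
        incr start -1
    }
    return $start
}
```

Combining marks, emoji sequences and similar multi-code-point clusters are deliberately ignored here; handling them is what ICU or the macOS API is for.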

On macOS, only ::tk::startOfCluster and ::tk::endOfCluster are implemented using the macOS API. The other functions fall back to tcl_startOfNextWord, tcl_startOfPreviousWord, tcl_endOfWord, tcl_wordBreakAfter and tcl_wordBreakBefore.

ICU fully supports the first five functions in the list. Only ::tk::wordBreakAfter and ::tk::wordBreakBefore have a minimal implementation, which simply calls tcl_wordBreakAfter and tcl_wordBreakBefore.

Since X11, the macOS API and ICU all use UTF-16, while Tcl 9.0 uses UTF-32 internally, it turns out that Tk is best compiled with -DTCL_UTF_MAX=3, even against Tcl 9.0, which itself uses -DTCL_UTF_MAX=4.
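The mismatch between the two representations comes down to simple arithmetic: a code point above U+FFFF is one unit in UTF-32 but a pair of surrogates in UTF-16, so indices into the same text differ between the two. The following sketch shows that arithmetic; the procedure names are hypothetical and only serve as illustration.

```tcl
# Number of UTF-16 code units needed for one Unicode code point.
proc utf16Units {codePoint} {
    expr {$codePoint > 0xFFFF ? 2 : 1}
}

# Split a code point into its UTF-16 code units (as integers).
# Code points up to U+FFFF are returned unchanged as a one-element list.
proc toUtf16Units {codePoint} {
    if {$codePoint <= 0xFFFF} {
        return [list $codePoint]
    }
    set v [expr {$codePoint - 0x10000}]
    # The top 10 bits go into the high surrogate, the low 10 bits into
    # the low surrogate.
    list [expr {0xD800 + ($v >> 10)}] [expr {0xDC00 + ($v & 0x3FF)}]
}

# Example: U+1F600 needs two UTF-16 code units, 0xD83D and 0xDE00.
puts [lmap u [toUtf16Units 0x1F600] {format 0x%04X $u}]
```

A string holding "a", U+1F600 and "b" therefore has three positions when counted in UTF-32 units and four when counted in UTF-16 units, which is the kind of index translation that a common UTF-16 view avoids.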


The implementation is available on branch glyph_indexing_2.


This document has been placed in the public domain.