Author: Jan Nijtmans
State: Final
Type: Project
Vote: Done
Created: 16-Mar-2022
Tcl-Version: 8.7
Keywords: Tk, ICU
Tk-Branch: glyph_indexing_2
Vote-Summary: Accepted 3/0/1
Votes-For: FV, JN, KW
Votes-Against: none
Votes-Present: KBK
Abstract
Currently, Tk doesn't know anything about Unicode glyphs (grapheme clusters). This TIP implements an interface to ICU (or to the glyph API on MacOSX), so that Tk knows what a glyph is and can handle it.
Specification
The following new commands are implemented in Tk. Each takes two required arguments and one optional argument.
::tk::startOfCluster string start ?locale?
::tk::endOfCluster string start ?locale?
::tk::startOfNextWord string start ?locale?
::tk::startOfPreviousWord string start ?locale?
::tk::endOfWord string start ?locale?
::tk::wordBreakAfter string start ?locale?
::tk::wordBreakBefore string start ?locale?
Here string is the string being handled and start is the current position in this string. Each command returns a new position in the string. If there is no such position (e.g. the end of the string has been reached), the empty string is returned.
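The calling convention (a position in, a new position or the empty string out) can be sketched in Python. The `end_of_cluster` function below is a hypothetical stand-in for ::tk::endOfCluster that treats every code point as its own cluster. Its result is a string because every Tcl value is a string. The `split_clusters` helper, also hypothetical, shows how a caller can walk a string until the empty string signals the end.

```python
from typing import List, Optional


def end_of_cluster(string: str, start: int, locale: Optional[str] = None) -> str:
    """Hypothetical stand-in for ::tk::endOfCluster.

    Treats every code point as its own cluster and returns the next position
    as a Tcl-style string, or "" when there is no such position.
    """
    # `locale` is accepted but ignored; for most locales it has no effect anyway.
    if start < 0 or start >= len(string):
        return ""
    return str(start + 1)


def split_clusters(string: str) -> List[str]:
    """Walk the string cluster by cluster until the command returns ""."""
    pieces: List[str] = []
    pos = 0
    while True:
        nxt = end_of_cluster(string, pos)
        if nxt == "":
            break  # end of string reached
        pieces.append(string[pos:int(nxt)])
        pos = int(nxt)
    return pieces
```

A caller never needs to know the string length in advance. Checking for the empty result is enough to stop.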
locale can be something like en or en_GB. For most locales it has no effect (the current Tk bindings do not even use it). The ICU implementation handles the locale argument; the MacOSX implementation currently does not.
These commands are used in the Tk bindings of the entry, spinbox and text widgets. As a result, Delete deletes the full glyph instead of only a single Unicode character. Likewise, the <<NextChar>> and <<PrevChar>> virtual events (cursor right/left) jump to the next/previous glyph instead of the next/previous Unicode code point.
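The difference between moving by code point and moving by glyph shows up with a base letter followed by a combining accent. The Python sketch below assumes a deliberately simplified cluster rule: a character plus any following characters with a nonzero `unicodedata.combining` class form one cluster. This rule is not ICU's algorithm and ignores emoji ZWJ sequences, Hangul syllables, regional-indicator flags and spacing marks. The function names are hypothetical.

```python
import unicodedata
from typing import Optional


def next_cluster(s: str, pos: int) -> Optional[int]:
    """Position after the (simplified) cluster starting at pos, or None at the end."""
    if pos >= len(s):
        return None
    pos += 1
    # Absorb combining marks that belong to the preceding base character.
    while pos < len(s) and unicodedata.combining(s[pos]):
        pos += 1
    return pos


def prev_cluster(s: str, pos: int) -> Optional[int]:
    """Start of the (simplified) cluster before pos, or None at the start."""
    if pos <= 0:
        return None
    pos -= 1
    # Step back over combining marks to reach their base character.
    while pos > 0 and unicodedata.combining(s[pos]):
        pos -= 1
    return pos


def delete_cluster(s: str, pos: int) -> str:
    """Delete the whole cluster at pos, as Delete does after this TIP."""
    end = next_cluster(s, pos)
    if end is None:
        return s
    return s[:pos] + s[end:]
```

For "e\u0301x" (e, combining acute accent, x), moving right from position 0 lands at 2 rather than 1, and deleting at 0 removes both the e and its accent.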
Also, mainly for MacOS, there are new key modifiers "Fn" and "Num". "Fn" is equivalent to "Mod4" and can be used to make the "Fn+e" key combination available for accessing Emoji. "Num" is equivalent to "Mod3" and can be used to access the extended numeric keys.
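The aliasing can be made concrete with a small lookup table. In the Python sketch below, the mask values are the standard X11 modifier masks. The table, `modifier_mask` and `parse_sequence` are hypothetical illustrations, not Tk's event-sequence parser, which also handles event types, Double/Triple and other details this toy ignores.

```python
from typing import Dict, Iterable, Tuple

# Standard X11 modifier masks (ShiftMask ... Mod5Mask).
MODIFIER_MASKS: Dict[str, int] = {
    "Shift": 1 << 0,
    "Lock": 1 << 1,
    "Control": 1 << 2,
    "Mod1": 1 << 3,
    "Mod2": 1 << 4,
    "Mod3": 1 << 5,
    "Mod4": 1 << 6,
    "Mod5": 1 << 7,
}
# The aliases described in this TIP.
MODIFIER_MASKS["Fn"] = MODIFIER_MASKS["Mod4"]
MODIFIER_MASKS["Num"] = MODIFIER_MASKS["Mod3"]


def modifier_mask(names: Iterable[str]) -> int:
    """OR together the masks of the given modifier names."""
    mask = 0
    for name in names:
        mask |= MODIFIER_MASKS[name]
    return mask


def parse_sequence(seq: str) -> Tuple[int, str]:
    """Toy parser: split "<Mod-...-detail>" into (modifier mask, key detail)."""
    parts = seq.strip("<>").split("-")
    return modifier_mask(parts[:-1]), parts[-1]
```

With this table, <Fn-e> and <Mod4-e> resolve to the same mask (64) and key, which is what "equivalent" means here.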
If ICU is not available, minimal implementations of these commands are used instead. These minimal implementations only know about Unicode surrogate pairs, nothing else.
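A surrogate-only fallback of this kind can be sketched in Python over a list of UTF-16 code units. The sketch assumes the following meanings:

- startOfCluster returns the start of the cluster containing `start`.
- endOfCluster returns the position just past it.
- None stands in for Tcl's empty string.

The only "cluster" it recognizes is a high surrogate followed by a low surrogate; every other code unit stands alone. The function names are hypothetical and this is not Tk's actual code.

```python
from typing import List, Optional


def to_utf16_units(s: str) -> List[int]:
    """Encode a Python string as a list of UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def start_of_cluster(units: List[int], start: int) -> Optional[int]:
    """Start of the cluster containing `start`; steps back over a surrogate pair."""
    if start < 0 or start >= len(units):
        return None
    if start > 0 and is_low_surrogate(units[start]) and is_high_surrogate(units[start - 1]):
        return start - 1
    return start


def end_of_cluster(units: List[int], start: int) -> Optional[int]:
    """Position just past the cluster containing `start`; a pair counts as one."""
    if start < 0 or start >= len(units):
        return None
    if (is_high_surrogate(units[start]) and start + 1 < len(units)
            and is_low_surrogate(units[start + 1])):
        return start + 2
    return start + 1
```

For "a\U0001F600b", the emoji occupies code units 1 and 2 (0xD83D, 0xDE00). Both functions therefore treat positions 1 and 2 as one cluster spanning [1, 3). A lone surrogate is simply treated as a one-unit cluster.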
On MacOSX, only ::tk::startOfCluster and ::tk::endOfCluster are implemented using the MacOSX API. The other commands fall back to tcl_startOfNextWord, tcl_startOfPreviousWord, tcl_endOfWord, tcl_wordBreakAfter and tcl_wordBreakBefore, respectively.
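To show what the Tcl fallbacks compute, here is a rough Python model of tcl_startOfNextWord's documented definition. It returns the first start-of-word location after `start`, where a start-of-word is a word character preceded by a non-word character. Two assumptions apply:

- Word characters are approximated as alphanumerics plus underscore. Tcl's real set is controlled by tcl_wordchars and differs by platform.
- None stands in for the empty string. Tcl's own procedure returns -1 when there is no such location.

```python
from typing import Optional


def is_word_char(ch: str) -> bool:
    # Assumption: \w-like word characters (alphanumerics and underscore).
    return ch.isalnum() or ch == "_"


def start_of_next_word(s: str, start: int) -> Optional[int]:
    """Index of the first word character after `start` that follows a
    non-word character, or None if there is none."""
    for i in range(max(start + 1, 1), len(s)):
        if is_word_char(s[i]) and not is_word_char(s[i - 1]):
            return i
    return None
```

From position 0 in "hello world" this yields 6, the start of "world". From 6 onward there is no further word start, so the result is None.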
With ICU, the first five commands in the list are fully implemented. Only ::tk::wordBreakAfter and ::tk::wordBreakBefore have a minimal implementation that simply calls tcl_wordBreakAfter and tcl_wordBreakBefore.
X11, the MacOSX API and ICU all use UTF-16, while Tcl 9.0 uses UTF-32 internally. As a result, it turns out that Tk is best compiled with -DTCL_UTF_MAX=3, even with Tcl 9.0, which uses -DTCL_UTF_MAX=4.
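One way to see why the code-unit width matters: a character outside the Basic Multilingual Plane counts as one position when indexing by code point (UTF-32), but as two positions in UTF-16. Positions reported by a UTF-16 library therefore need converting whenever the two disagree. The Python helpers below are hypothetical and only illustrate the arithmetic; they are not part of Tk.

```python
def utf16_offset(s: str, cp_index: int) -> int:
    """UTF-16 code-unit offset of the code point at index cp_index in s."""
    # Non-BMP code points take two UTF-16 units (a surrogate pair).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s[:cp_index])


def codepoint_index(s: str, utf16_off: int) -> int:
    """Index of the first code point starting at or after UTF-16 offset utf16_off."""
    units = 0
    for i, ch in enumerate(s):
        if units >= utf16_off:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(s)
```

In "a\U0001F600b", the letter b is code point 2 but UTF-16 offset 3. When Tk's own indices are UTF-16 units, as with TCL_UTF_MAX=3, this conversion step is not needed.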
Implementation
On branch glyph_indexing_2
Copyright
This document has been placed in the public domain.