TIP 687: locale support for word handling in text and entry

Author:		Jan Nijtmans <[email protected]>
State:		Draft
Type:		Project
Vote:		Pending
Created:	30-1-2024
Tcl-Version:	9.1
Tk-Branch:	tip-687
Vote-Summary:	
Votes-For:	
Votes-Against:	
Votes-Present:	

Abstract

This TIP proposes adding a -locale option to the text and entry widgets and their derivatives (so the ttk widgets and spinbox as well). The option selects the algorithm used by the virtual events <<NextChar>>, <<PrevChar>>, <<NextWord>> and <<PrevWord>> (and their variants, such as <<SelectNextChar>>). The locale is handed to the ICU library, so that ICU can adapt its character- or word-division algorithm to that specific locale.

If the locale is "", the ICU default locale is used. The special locale "regexp" can be used to fall back to a locale-independent algorithm based on a regular expression.

Text tags have the new -locale option too. There's also a new method [$text locale index] which can be used to determine which locale is set for a specific text index.
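
A minimal sketch of how the proposed option might be used, assuming a Tk build from the tip-687 branch. The -locale option, the "regexp" and "" values, the tag option and the locale method are taken from the text above. The locale value "fr", the widget and tag names, and the exact "tag configure" spelling are assumptions made for illustration.

```tcl
package require Tk

# Assumption: this only works with Tk built from the tip-687 branch;
# released Tk versions do not know the -locale option.

# Text widget whose char/word events use ICU rules for French
# ("fr" is an assumed ICU-style locale identifier).
text .t -locale fr

# Entry widget using the locale-independent regular-expression algorithm.
entry .e -locale regexp

# Empty string: use the ICU default locale.
spinbox .s -locale ""

# A tag (hypothetical name "code") that overrides the widget locale
# for the ranges it covers, e.g. embedded source code.
.t tag configure code -locale regexp

.t insert end "aujourd'hui " {} "foo'bar" code

# Ask which locale applies at a given index.
puts [.t locale 1.0]

pack .t .e .s
```

With a setup like this, <<NextWord>> and <<PrevWord>> would be expected to follow ICU rules in the untagged text and the regular-expression rules inside ranges tagged "code".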

Rationale

This TIP started with a ticket requesting that the word-break algorithm be made switchable: the ICU method of character and word division is designed for natural languages rather than for programming languages.

The difference can be seen, for example, in the word aujourd'hui. With the ICU algorithm, this is a single word. With the "regexp" locale, it is split into two parts.
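
The split at the apostrophe can be illustrated in plain Tcl, without Tk. This is only a sketch: the TIP does not state which regular expression the "regexp" locale uses, so the pattern \w+ below is an assumed stand-in for a typical word-character pattern.

```tcl
# Illustration only: \w+ is an assumed stand-in for the pattern
# behind the "regexp" locale, which this TIP does not specify.
set word "aujourd'hui"

# Collect every maximal run of word characters. The apostrophe is
# not a word character, so it separates the two runs.
set parts [regexp -all -inline {\w+} $word]

puts $parts
```

Here parts holds two elements, aujourd and hui, whereas a natural-language word breaker such as ICU's treats the whole string as one word.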

Implementation

Implementation is in Tk branch "tip-687".

Copyright

This document has been placed in the public domain.