Tk Source Code

Ticket UUID: 168f3ef13091b8b2dbc1b8081e22e41df60cc244
Title: Going on words with Ctrl+arrow in text widget
Type: Bug Version: 9.0
Submitter: anonymous Created on: 2024-01-25 13:39:57
Subsystem: 18. [text] Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Important
Status: Open Last Modified: 2024-04-13 18:51:45
Resolution: None Closed By: nobody
    Closed on:
Description:
Tk 9.0. Linux.

Try with the code snippet below:
  Ctrl+RightArrow
  Ctrl+LeftArrow
to move to the starts/ends of words.

In Tk 8.6 it works properly.

In Tk 9.0 it doesn't. The caret is placed in a very odd way.

  #_______________________

  package require Tk

  pack [text .txt -height 10 -width 50]

  .txt insert 1.0 {

    set ::var1 value!!!!!

    set ::var2 "$value + $value2"

  }
  focus .txt

  #_______________________
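
To record the caret stops without pressing keys, the following minimal sketch may help. It is not part of the original report. It assumes that the text widget's class bindings respond to the `<<NextWord>>` virtual event (the event Ctrl+Right is mapped to in Tk 8.6 and later), and it needs a display to run. The variable names `stops` and `pos` are only illustrative.

```tcl
package require Tk

pack [text .txt -height 10 -width 50]
.txt insert 1.0 "set ::var1 value!!!!!\nset ::var2 \"\$value + \$value2\"\n"
.txt mark set insert 1.0
update

# Fire <<NextWord>> repeatedly and collect each new caret position.
# The loop stops as soon as the caret no longer moves.
set stops {}
for {set i 0} {$i < 50} {incr i} {
    event generate .txt <<NextWord>>
    set pos [.txt index insert]
    if {$pos eq [lindex $stops end]} break
    lappend stops $pos
}
puts "Ctrl+Right stops: $stops"
```

Replacing `<<NextWord>>` with `<<PrevWord>>` (and starting from `end`) should give the Ctrl+Left stops in the same way, so the two lists can be compared between Tk versions.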
User Comments: jan.nijtmans added on 2024-04-13 18:51:45:
> I think it's a good idea to make it switchable. How about "-locale" instead?

Unfortunately, there is not enough support in the TCT to get this into 9.0. I'll try again for 9.1.

jan.nijtmans added on 2024-01-27 00:31:01:

Extra ticket [done], thanks!

> And these are caret stops in Tk 9.0...

I think the solution [b4a14597563ed016|here] works quite well for this ticket as well. Please try it. The stops for Ctrl+Right are now the same as for Ctrl+Left on all platforms.


oehhar added on 2024-01-26 11:17:40:

Could you file an extra ticket precisely about the Tk 8.6 platform differences of Ctrl+right ? Thanks, Harald


anonymous added on 2024-01-26 11:10:07:
Hi Jan,

The "-locale" option looks a lot more comprehensible and flexible. Maybe with the system locale as the default?

Also, I note that there is no standard Ctrl+arrow behavior among various editors.
Some stop at all starts and ends of sequences of (non)word chars, others only at starts.
Some stop at empty lines too, others ignore them.
The same goes for line ends.
Some stop differently for Ctrl+Left and Ctrl+Right, while others stop uniformly.

That's probably the authors' choice, i.e. a matter of taste.

But as for Tk 8.6, it is strange that it behaves differently in Windows and Linux:

  - in Windows: all caret stops are only at spaces before a non-space char,
    and special chars are treated as word chars;
    i.e. in Windows Tk considers only spaces as word dividers

  - in Linux: spaces and special chars are treated as word dividers

This inconsistency is a bit annoying, if one uses an app on both platforms.

I don't know how it is on macOS.

Hopefully, in Tk 9.0 this discrepancy between platforms will go away.
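
The platform difference described above matches what the Tcl library documentation says about `tcl_wordchars` and `tcl_nonwordchars`: in Tcl 8.6 their defaults differ between Windows and Unix. The following is a minimal sketch for an 8.6 application that wants one word definition everywhere. It assumes the word-movement bindings go through the `tcl_*` word procedures, which appears to hold for Tk 8.6 but not for the ICU-based code discussed in this ticket.

```tcl
# Load the word procedures first, so that the library's platform-specific
# defaults are in place before they are overridden.
auto_load tcl_endOfWord

# Use the Unix-style definition everywhere: letters, digits and underscore
# are word characters, everything else is a divider.
set ::tcl_wordchars {\w}
set ::tcl_nonwordchars {\W}

# Inspect where the library now places the next word start after index 0.
puts [tcl_startOfNextWord "set ::var1 value!!!!!" 0]
```

Setting the two variables to `{\S}` and `{\s}` instead would select the Windows-style rule, where only whitespace divides words.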

jan.nijtmans added on 2024-01-26 09:48:39:
Harald wrote:
> I would love, if this would exclude the separating space.

I'm also puzzled about this extra space ... There might be a bug here.

jan.nijtmans added on 2024-01-26 08:41:44:

I think it's a good idea to make it switchable. How about "-locale" instead? If set to a special value (like "legacy" or "simple"), ICU can be switched off for those bindings. If "-locale" is set to a real locale, ICU can use it to improve the algorithm depending on the locale used.


anonymous added on 2024-01-26 04:49:34:
Hi Harald,

You rightly noted: "It is clear, that the old method is more programmer like".

Maybe it would make sense to have an -ICU option for the text widget (default "-ICU 1"), so that texts meant to be edited by programmers could keep the good old Tk 8.6 functionality?

oehhar added on 2024-01-25 17:37:08:

Hi, thanks for the contribution. I think, your contribution is a constructive proposal. Perhaps, we have to look into ICU and why this happens. We are in Beta and any discussion is appreciated !

Thank you ! Harald


anonymous added on 2024-01-25 17:30:25:
Well, if you just compare how this works in some editors, you'll see this behavior:

(using | as stops of the caret)


Ctrl+Right
  |set |::|var1 |value|!!!!!|

  |set |::|var2 |"$|value |+ |$|value2|"|

Ctrl+Left
  |set |::|var1 |value|!!!!!|
    
  |set |::|var2 |"$|value |+ |$|value2|"

These stops are different from Tk 8.6's, but anyhow they are coherent and predictable.
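
The editor behavior sketched above follows a simple rule: a stop at the start of every run of word characters or of special characters, plus one at the end of the line. Below is a minimal pure-Tcl sketch of that rule. The proc name `classStops` is made up for illustration, and this is neither Tk's nor ICU's algorithm.

```tcl
# Return the character indices in $str where a run of word characters or a
# run of special (non-space, non-word) characters begins, followed by the
# index of the end of the string.
proc classStops {str} {
    set stops {}
    set prev space
    set i 0
    foreach ch [split $str ""] {
        if {[string is space $ch]} {
            set cls space
        } elseif {[string is wordchar $ch]} {
            set cls word
        } else {
            set cls punct
        }
        # A stop is any change of class that does not lead into whitespace.
        if {$cls ne "space" && $cls ne $prev} {
            lappend stops $i
        }
        set prev $cls
        incr i
    }
    lappend stops [string length $str]
    return $stops
}

puts [classStops "set ::var1 value!!!!!"]
```

For the first sample line this prints `0 4 6 11 16 21`, which are the positions marked with | in the Ctrl+Right row.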

#_______________________

And these are caret stops in Tk 9.0:

Ctrl+Right
  set| :|:var1| value|!!|!!|!|

  set| :|:var2| "|$value| +| $|value2"|


Ctrl+Left
  |set |::|var1 |value|!!|!!|!

  |set |::|var2 |"$|value |+ $|value2|"

which are not only incoherent, but also quite weird in both cases.

See e.g. those |!!|!!|!| and :|:.

The ICU algorithm is great, but its implementation in Tk 9.0 does not seem to be.

oehhar added on 2024-01-25 14:59:04:

Hi ! Thanks for the contribution. This is a new exciting feature and we should discuss what is the best solution! So, I take your contribution as a "start of discussion". I suppose the power of the new method will be visible for foreign character sets etc., where the former solution is just poor. In addition, it should be remarked that the string and regexp view of a word boundary is now different from Tk's. In the longer term, ICU should be in Tcl, and Tcl methods should be used for this.

For me, the new stop points are marked with "|" in the following sketch. The old ones are marked with "'".


    |set '|::var1 '|value!!|!!!
|
    '|set '|::var2 |"$value '|+ '|$value2"
|
'

So, before it was basically "non-space after a space".
The new rule finds more points.
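
The old "non-space after a space" rule can be sketched in a few lines of pure Tcl. The proc name `legacyStops` is hypothetical. This only illustrates the rule as worded above and is not the code Tk 8.6 actually runs.

```tcl
# Return the indices in $str of every non-space character that either
# starts the string or directly follows a whitespace character.
proc legacyStops {str} {
    set stops {}
    # Each match covers an optional leading whitespace char plus the
    # non-space char; the end index of the match is the stop position.
    foreach pair [regexp -all -inline -indices {(?:^|\s)\S} $str] {
        lappend stops [lindex $pair 1]
    }
    return $stops
}

puts [legacyStops "set ::var1 value!!!!!"]
```

For the first sample line (without indentation) this prints `0 4 11`, i.e. the starts of `set`, `::var1` and `value!!!!!`.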

I would personally also love to have the "mark whole word" functions to be elaborated:

By pressing and holding shift, and then right or left, the next word plus the separating space is marked. I would love, if this would exclude the separating space.

Then, you can make a double-click on a word to mark it.
The old method marks the word between spaces.
The new one does include characters and numbers, but does exclude '":$'.

It is clear, that the old method is more programmer like. The new one is closer to natural language. It recognizes multiple word-ending characters "!" as such.

And I suppose the version and parameters of the ICU library used may have an influence.
I would love comments from speakers of languages with a very different sense of what a word is, such as Chinese, Japanese or Korean.
I suppose just "separate on space" will not help there.

I think, this is a step in the right direction.

Thank you and take care,
Harald


jan.nijtmans added on 2024-01-25 14:18:54:

Starting with Tk 8.7, the ICU library is used to do word boundary search (which is what Ctrl-(Left|Right)Arrow does). ICU implements the algorithm described here, which should be more intuitive than (but different from) the simple algorithm used in Tk 8.6.

Hope this helps, Jan Nijtmans