Tk Source Code

Ticket UUID: 1096580
Title: soft-hyphen in (legacy) text widget
Type: Bug
Version: obsolete: 8.4.9
Submitter: nobody
Created on: 2005-01-05 17:26:06
Subsystem: 18. [text]
Assigned To: fvogel
Priority: 4
Severity: Minor
Status: Open
Last Modified: 2017-08-09 19:10:39
Resolution: None
Closed By: nobody
Closed on: 2017-08-09 16:39:43
Description:
Soft-hyphens '\u00AD' are not properly handled in text
widgets.
They are always shown as '-' (hyphen) but a soft-hyphen
should only appear if it is the last character in a line.

I.e.,
"abc\u00AD123"  should be displayed as

abc123

or

abc-
123

but is always displayed as
abc-123

Tested on win/linux
8.4.6 / 8.4.7 / 8.5a2
User Comments:
fvogel added on 2017-08-09 19:10:39:
> This is resolved with the new hyphen handling in 'revised_text'.

Yes it is. But please let's keep this ticket open until the revised_text branch gets merged into trunk. When that happens, I promise to close this ticket. If for any reason the merge does not happen, we would still have a ticket for this missing feature of the legacy text widget.

bll added on 2017-08-09 16:39:43:
This is resolved with the new hyphen handling in 'revised_text'.

bll added on 2017-02-06 01:29:25:
A thought for the future.

As a *future* project, whenever the hyphenation breaks or unbreaks a line, a procedure should be called that would allow the user to modify the display.

This would allow the user to change the display characteristics of the trailing end and the leading end (next line) and also to change the spelling of the words if necessary.

This would provide the support needed for soft hyphen breaks for all languages.

fvogel added on 2017-02-05 21:54:26:

It's easy to replace soft hyphens located at the end of display lines with hard hyphens so that they are rendered; this is [64d631fe] (for OS X only; on other platforms this is not needed because there is a glyph corresponding to the soft hyphen).

Looks better this way, don't you think so?


fvogel added on 2017-02-05 17:04:51:

I believe I have fixed through [4ab62e40] the rendering issue that happened on OS X only, and that can be seen with the provided snapshots, or more simply with the following simple script:

package require Tk
pack [text .t -wrap word] -fill both -expand true
.t insert end "abc\u00ad123\n"

That is, there should no longer be strange characters appearing on screen near the places where soft hyphens are present. There was a buffer overrun happening, on OS X only.

Nevertheless, soft hyphens are still not displayed on screen on the OS X platform (whereas on Linux and Win they are), when they are the last character of the display line. This is because the fonts used on the Mac apparently do not have a glyph corresponding to the soft hyphen.

Literature (the pointers that follow were provided by Brad Lanam, thanks!) explains that several different behaviours would be correct regarding whether or not the soft hyphen displays as a hard hyphen when it's at the end of the display line:

https://github.com/jquast/wcwidth/issues/8

https://www.cs.tut.fi/~jkorpela/shy.html

http://www.unicode.org/L2/L2002/02279-muller.htm (mainly section 4)

What is currently in the source code is that the soft hyphen will render as a hard hyphen if and only if it appears at the end of a display line AND, of course, if the font used has a glyph for this character (code 173, i.e. 0x00AD). Otherwise the soft hyphen will not render. The same behavior is coded for all platforms and I'm not aware of any bugs as far as the text widget is concerned.
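
The rule in the paragraph above can be sketched in C as follows. This is only an illustration, not the code in tkTextDisp.c: the function name render_display_line and its two flags are hypothetical, the input is assumed to be valid UTF-8 (where U+00AD is the byte pair 0xC2 0xAD), and "renders as a hard hyphen" is modelled by emitting an ASCII '-'.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Tk): returns the text of one display
 * line as it would appear on screen.  Soft hyphens are dropped, except
 * that a soft hyphen ending the line is shown as '-' when the line
 * continues on the next display line and the font has a glyph for it.
 * Uses a static buffer, so it is not reentrant; fine for a sketch.
 */
static const char *render_display_line(const char *src, int lineContinues,
                                       int fontHasGlyph)
{
    static char buf[256];
    size_t len, i = 0, out = 0;

    assert(src != NULL);
    len = strlen(src);
    assert(len < sizeof buf);   /* output is never longer than input */

    while (i < len) {
        int isShy = i + 1 < len
                && (unsigned char)src[i] == 0xC2
                && (unsigned char)src[i + 1] == 0xAD;
        if (!isShy) {
            buf[out++] = src[i++];
            continue;
        }
        if (i + 2 == len && lineContinues && fontHasGlyph) {
            buf[out++] = '-';   /* visible only at the end of a wrapped line */
        }
        i += 2;                 /* skip both bytes of U+00AD */
    }
    buf[out] = '\0';
    return buf;
}
```

With this sketch, "abc\u00AD123" on a single unwrapped line comes out as "abc123", while a wrapped line ending in "abc\u00AD" comes out as "abc-" only when the glyph flag is set.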

I must now look at what needs to be done for widgets other than the text widget.


anonymous (claiming to be [email protected]) added on 2016-11-08 23:17:16:
Ok.  New images.

8.6.6 release
https://gentoo.com/tcl-1096580/macosx-8.6.6-2016-11-8.png
core-8.6 branch
https://gentoo.com/tcl-1096580/macosx-3cdcb21b-2016-11-8.png
late softhyphen branch
https://gentoo.com/tcl-1096580/macosx-40525043-2016-11-8.png
early softhyphen branch
https://gentoo.com/tcl-1096580/macosx-d698107f-2016-11-8.png

fvogel added on 2016-11-03 22:24:42:

Indeed there are problems on OSX, thanks for the snapshots.

But even in core-8-6-branch the behavior on OSX is not the same as on linux or Windows:

- On Win and Linux, soft hyphens are not distinguishable from normal hyphens. They are displayed the same as normal hyphens.
- On OS X, soft hyphens are apparently simply not displayed.

How strange! The reasons for this difference are kind of a mystery to me...


anonymous (claiming to be [email protected]) added on 2016-11-03 16:17:03:
Images for macosx:

https://gentoo.com/tcl-1096580/macosx-40525043-a.png
https://gentoo.com/tcl-1096580/macosx-40525043-b.png
https://gentoo.com/tcl-1096580/macosx-8.6.6-a.png
https://gentoo.com/tcl-1096580/macosx-8.6.6-b.png
https://gentoo.com/tcl-1096580/linux-40525043-a.png

anonymous (claiming to be [email protected]) added on 2016-11-03 16:10:09:
Demo looks great on linux.

I don't think it is working properly on Mac OS X.
I have images.  I will see if I can attach to the ticket.  If not, I will post some URLs.

fvogel added on 2016-11-03 07:24:12:
OK let's focus on OS X.

I guess your results mean my changes for the present ticket are OK for you on OS X, which is the main feedback I expected from you.

Did you also try the demo script I have provided below on 2016-10-16 (section IV)?

I'll let others comment / test for a few days (on any platforms) and then I'll merge.

Aside from this, could you please nevertheless:
  - add your test results for Mac OS X to the following wiki page for the record: http://wiki.tcl.tk/37529
  - send me (private email) the complete output of the test suite results for you on Mac OS X. I didn't expect so many failures, I'm intrigued...

anonymous (claiming to be [email protected]) added on 2016-11-02 20:18:10:
(1) 2.31-2.36 have no failures.
(2) My linux failures are different and do not relate to 2.31-2.36.
(3) Many test failures on mac os x similar to this:
---- Result was:
1.0 {5.0 4.0 3.0 2.0 1.0} {borders 1.0 2.0 3.0 4.0 5.0 8.0 eof}
---- Result should have been (exact matching):
1.0 {5.0 4.0 3.0 2.0 1.0} {1.0 2.0 3.0 4.0 5.0 eof}
(4) These failures on mac os x do not seem to be of much importance, excepting possibly 11.13.

fvogel added on 2016-11-02 19:20:52:

Hmmm... let's try to sort all this out.

1. The tests specifically targeted at checking the new feature of the present ticket are textDisp-2.31 to 2.36. Do these tests fail for you on OS X?

2. If none of the tests textDisp-2.31 through 2.36 fails, and there is no new failure in the other tests compared to the situation you have when running the test suite in the core-8-6-branch, then it means my changes are OK.

a. It is the case on Windows (Vista), I have checked it. No test fails at all.
b. It is the case on Linux Debian 8 as well, I have checked it. For me only textDisp-16.25 fails (in both branches: core-8-6-branch and bug-1096580fff).
c. From what you report I cannot tell whether it is the case for you or not on OS X?

3. "Many of the early tests are missing the word "borders"": I don't understand this statement.

4. If item 2.c above is OK for you, then the 40 failing tests on OS X are likely unrelated to the changes I made to fix the present ticket. In this case I'm still interested in the test failures, but these should be recorded in a new ticket if they are deemed important. Perhaps start by directly emailing me the complete test suite results on OS X and I can have a quick look.


anonymous (claiming to be [email protected]) added on 2016-11-02 17:55:43:
Ok.  Sorry, used to running my own application's tests directly. 

Linux fails textDisp:13.11 and textDisp:19.18.  I tried with tk scaling at 1.25 in a .wishrc.  (13.11, 19.17 and 19.18 fail with tk scaling set correctly to 1.4).
13.11: fails with result 1 instead of 0
19.18: 280 280 vs 70 70
You have a Linux VM, right?  You can probably check this.

Mac OS X: 40 failures...let me go hook up a real monitor (old! 72dpi).
Still 40 failures.  
Let me try with X11. Built with --disable-aqua. Still 40 failures.

Many of the early tests are missing the word "borders", so they are probably ok.
11.13 fails: 1.0 {4.0 5.0} vs 1.0 5.0.
14.13/14.14 off by .01
15.8 off by 1
18.6 off by .5
19.11.17/20/21/23/24 off by .1

So actually looks pretty good.

fvogel added on 2016-11-02 07:23:02:

Sorry for the compile error, that was my copy/paste mistake.

However the change you made is not the right one. Please fossil update again in the same branch. Commit [40525043] should fix the compile error for Mac OS X.

Regarding the test suite, it does look like you are doing something wrong.

1. In the Tk source tree, tests/README points to tests/README in the Tcl directory. That is not a circular reference. Did you mean something else circular?

2. "Error in startup script: invalid command name "deleteWindows"" makes me wonder how you launch the test suite. The correct way is:

a. compile and install by make ; make install in the unix directory of Tk
b. run tests by make test in the same directory

This will launch the complete test suite for Tk. However, for the specific purpose of the present ticket, it is enough, at least as a starting point, to run the tests from the "textDisp.test" test file. This can be done through: make test TESTFLAGS="-file textDisp.test"

Thanks again for your help!


anonymous (claiming to be [email protected]) added on 2016-11-02 00:26:15:
Linux does better than Mac OS X:

So is there a particular setup I need to run for Tk?
tests/README has a circular reference :).

==== textDisp-13.11 TkTextSeeCmd procedure FAILED
==== textDisp-13.11 FAILED
==== textDisp-19.11.17 TextWidgetCmd procedure, "index +displaylines" FAILED
==== textDisp-19.11.17 FAILED
==== textDisp-19.11.18 TextWidgetCmd procedure, "index +displaylines" FAILED
==== textDisp-19.11.18 FAILED
==== textDisp-19.11.20 TextWidgetCmd procedure, "index +displaylines" FAILED
==== textDisp-19.11.20 FAILED
==== textDisp-19.11.21 TextWidgetCmd procedure, "index +displaylines" FAILED
==== textDisp-19.11.21 FAILED
==== textDisp-19.11.22 TextWidgetCmd procedure, "index +displaylines" FAILED
==== textDisp-19.11.22 FAILED
==== textDisp-19.11.23 TextWidgetCmd procedure, "index +displaylines" FAILED
==== textDisp-19.11.23 FAILED
==== textDisp-19.17 count -ypixels with indices in elided lines FAILED
==== textDisp-19.17 FAILED
==== textDisp-19.18 count -ypixels with indices in elided lines FAILED
==== textDisp-19.18 FAILED

And both Mac OS X and Linux have this (am I not starting the tests correctly?):

Error in startup script: invalid command name "deleteWindows"
    while executing
"deleteWindows"
    (file "textDisp.test" line 4303)

anonymous (claiming to be [email protected]) added on 2016-11-02 00:17:27:
Mac OS X:

.../Tk_Source_Code-2e6a42fb/unix/../generic/tkTextDisp.c
/Users/bll/tcl/Tk_Source_Code-2e6a42fb/unix/../generic/tkTextDisp.c:7705:49: error:
  use of undeclared identifier 'numBytes'; did you mean 'nBytes'?
  nBytes = TkUtfToUniChar(Tcl_UtfPrev(p + numBytes, p), &ch);

I changed s/numBytes/nBytes/ and it compiles (though I don't know if that's a correct change).

Question is, how do I set up for testing Tk on Mac OS X without generating many test failures.  Many failures for textDisp.test.  

I'll try it out on Linux.

fvogel added on 2016-11-01 21:44:00:

I have now committed in [ea75c23f] a revision of tkTextDisp.c in branch bug-1096580fff that I think fixes the compilation issues reported below by Brad on OS X. Brad, would you please be kind enough to confirm?

Besides, I have finally decided in commit [2e6a42fb] that soft hyphens always have zero width in their bounding boxes even if they are accidentally displayed (i.e. when they are located at the end of a display line). This seems Good Enough (TM) to me. The test suite now runs 100% OK for me. Please confirm the test suite results are also OK for you.


fvogel added on 2016-10-22 18:32:57:
Thanks for your check Brad, I see what's wrong and will fix this. Stay tuned!

anonymous (claiming to be [email protected]) added on 2016-10-19 20:09:55:
Mac OS X:
(stupid makefile ignores --prefix on Mac OS X (now I have to figure out how to clean up its install), but I checked and it is including the 8.6.6 tcl.h).

[...] -std=gnu99 -x objective-c -DTK_FRAMEWORK_VERSION=\"8.6\" -DUSE_TCL_STUBS /Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c
/Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c:1489:31: error: character too large for enclosing character literal type
                    if (*p == '\u00AD') {
                              ^
/Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c:7705:65: error: character too large for enclosing character literal type
    } else if (ciPtr->numBytes > 1 && p[ciPtr->numBytes - 1] == '\u00AD') {
                                                                ^
/Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c:7731:36: error: character too large for enclosing character literal type
                    case '-': case '\u00AD':
                                   ^
/Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c:7964:58: error: character too large for enclosing character literal type
            if ((len > 1) && (string[start + len - 1] == '\u00AD')) {
                                                         ^
/Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c:8636:27: error: character too large for enclosing character literal type
                if (ch == '\u00AD') {
                          ^
/Users/bll/tcl/Tk_Source_Code-1516a46a/unix/../generic/tkTextDisp.c:8672:23: error: character too large for enclosing character literal type
            if (ch == '\u00AD') {
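
For context on the errors above: a C character literal has a small integer type that clang will not let hold U+00AD written as '\u00AD', and in UTF-8 text the soft hyphen is not a single byte anyway but the pair 0xC2 0xAD. A minimal sketch of a portable comparison, assuming valid UTF-8 input, is shown below. The names SOFT_HYPHEN and ends_with_soft_hyphen are hypothetical and are not taken from the Tk sources.

```c
#include <assert.h>
#include <stddef.h>

/* Code point of SOFT HYPHEN as a plain integer constant, usable for
 * comparison with an already decoded character (e.g. ch == SOFT_HYPHEN). */
#define SOFT_HYPHEN 0x00AD

/*
 * Hypothetical helper (not part of Tk): returns 1 if the last character
 * of the UTF-8 string s[0..len) is a soft hyphen, i.e. if the string
 * ends with the two bytes 0xC2 0xAD, and 0 otherwise.
 */
static int ends_with_soft_hyphen(const char *s, size_t len)
{
    assert(s != NULL);
    if (len < 2) {
        return 0;
    }
    return (unsigned char)s[len - 2] == 0xC2
        && (unsigned char)s[len - 1] == 0xAD;
}
```

The alternative is to decode the last character to a code point first and compare the decoded value with the integer 0x00AD rather than with a character literal.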

fvogel added on 2016-10-16 20:57:47:

I have proposed a fix for this bug, see [1516a46a81].

I. Approach I used for solving this bug, and status

1.
In LayoutDLine(), see if there is a soft hyphen in the current segment; if so, lay out only the characters up to (and including) this character. That means that if there is a soft hyphen in a text segment then it will always be the last character of a chunk (chunks are produced from segments in TkTextCharLayoutProc()).
This is the same approach as the one already used in the text widget for tabs (a tab is always the last character of a chunk).
2.
TkTextCharLayoutProc() computes chunk data from segment information. In doing that, it includes the soft hyphen in the number of "bytes that fit" (in some allotted horizontal space on screen). However the pixel width on screen of the soft hyphen is always zero (this is done in MeasureChars()), regardless of the position of the chunk, i.e. regardless of whether the chunk is the last of the display line or not.
3.
Possible break locations include soft hyphens (\u00AD) in TkTextCharLayoutProc()
4.
When rendering a display line on screen, CharDisplayProc() actually draws the soft hyphen only if it is the last character of a continuing display line.
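
Steps 1 and 2 above can be illustrated with a small self-contained sketch. It is not the code from the commit: next_chunk_len and measure_chunk are hypothetical names, the input is assumed to be valid UTF-8, and a fixed-width font of charWidth pixels per character stands in for real font measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a soft hyphen (UTF-8 bytes 0xC2 0xAD) starts at s[i]. */
static int is_shy_at(const char *s, size_t i, size_t len)
{
    return i + 1 < len
        && (unsigned char)s[i] == 0xC2
        && (unsigned char)s[i + 1] == 0xAD;
}

/*
 * Step 1 (hypothetical helper): number of bytes of s[0..len) that go
 * into the next chunk.  A chunk stops right after the first soft
 * hyphen, so a soft hyphen is always the last character of its chunk.
 */
static size_t next_chunk_len(const char *s, size_t len)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (is_shy_at(s, i, len)) {
            return i + 2;       /* include both bytes of U+00AD */
        }
    }
    return len;
}

/*
 * Step 2 (hypothetical helper): pixel width of a chunk, assuming every
 * character is charWidth pixels wide except the soft hyphen, which
 * always measures zero.
 */
static int measure_chunk(const char *s, size_t len, int charWidth)
{
    int width = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (is_shy_at(s, i, len)) {
            i++;                /* skip the second byte as well */
            continue;
        }
        /* Count lead bytes only, so multi-byte characters count once. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            width += charWidth;
        }
    }
    return width;
}
```

For "abc\u00AD123" the first chunk is the five bytes of "abc\u00AD" and measures the same as "abc", which is why the byte count and the pixel width of a chunk can disagree.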

II. Testing status with today's commit

A.
No previously existing test fails.
Several new tests have been added to the test suite to check for the behaviour of the text widget with soft hyphens. All these tests pass (for me on Windows), except textDisp-2.34.
This latter one checks the width (in its bounding box) of a soft hyphen when it is visible on screen, i.e. when it is actually displayed because it's the last character of a continuing display line. Currently the width of a soft hyphen is always zero, as explained above in 2., which makes this test fail. To fix this is not trivial at all. One needs to tell chunks containing soft hyphens to increase their width field if and only if these chunks terminate a continuing display line.

--> I'm wondering if one could not just leave things as they are now and decide that it's good enough, in other words that soft hyphens always have zero width in their bounding boxes even if they are accidentally displayed. Your thoughts?

B.
My fix is currently uncompiled and untested in the case where TK_LAYOUT_WITH_BASE_CHUNKS and TK_DRAW_IN_CONTEXT are defined. This seems to happen on OSX only.

--> Could someone having an OSX platform please run the test suite (textDisp.test file is enough) in branch [bug-1096580fff] and report results here?

III. Peer review

I'm not sure I haven't mixed up bytes and chars somewhere.

--> Peer review of the fix requested, thanks...!

IV. Demo script

package require Tk
pack [text .t -width 60 -height 5 -wrap word -font fixed \
        -insertofftime 0] -fill both -expand true
.t insert end "Now this is a sample soft-hyphen abc\u00AD123 text.\n"
.t insert end "Now this is a sample soft-hyphen abc\t123 text.\n"
.t insert end "Very good large line Test\twith soft\u00ADhyphen and\ttabs.\n"
.t insert end "Very good large line Test\u00ADwith soft\thyphen and\u00ADtabs.\n"
# Change window width at will with the mouse
# Notice how soft hyphens show up and disappear from the screen
# depending on the window width


fvogel added on 2016-10-01 12:33:15:
Second improvement now committed in branch [bug-1096580fff]: in '-wrap word' mode the text widget wraps on soft hyphens as well.

fvogel added on 2016-10-01 10:04:07:

Started a branch for this bug, see [bug-1096580fff].

First improvement is that now in '-wrap word' mode the text widget wraps on ordinary hyphens in addition to whitespace.


vincentdarley added on 2005-02-08 20:39:32:
Logged In: YES 
user_id=32170

Note, even beyond this bug, shouldn't "word wrapping"
actually wrap at ordinary hyphens as well?  Currently all of
Tk's word wrapping wraps only at whitespace.

vincentdarley added on 2005-01-11 18:21:26:
Logged In: YES 
user_id=32170

I should add that my previous comment refers to where the
fix for word-wrapping at a soft-hyphen should go.

To make soft-hyphens not display when they're not at the end
of a line will require more changes both earlier on in the
same TkTextCharLayoutProc (or more likely in the helper
routine MeasureChars), and in CharDisplayProc. These
additional changes are non-trivial.

vincentdarley added on 2005-01-11 18:16:11:
Logged In: YES 
user_id=32170

The bug in question is around here:

    if (wrapMode != TEXT_WRAPMODE_WORD) {
        chunkPtr->breakIndex = chunkPtr->numBytes;
    } else {
        for (count = bytesThatFit, p += bytesThatFit - 1; count > 0;
                count--, p--) {
            if (isspace(UCHAR(*p))) {
                chunkPtr->breakIndex = count;
                break;
            }
        }
    }

in tkTextDisp.c (line 6668 onwards).  That 'isspace(UCHAR)'
needs to be made utf aware and look for special characters
like \u00AD.

Code contributions (and new tests) appreciated.  This is not
on my immediate to-do list.
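
As a rough illustration of what a UTF-aware version of that backward scan could look like, here is a self-contained sketch. It is not a patch for tkTextDisp.c: find_break_index is a hypothetical name, the input is assumed to be valid UTF-8, only ASCII whitespace is recognised, and treating the ordinary '-' as a break opportunity is an extra assumption on top of the soft hyphen.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Tk): scans s[0..bytesThatFit)
 * backwards and returns the number of leading bytes to keep on the
 * current line, i.e. the index just past the last break opportunity.
 * Returns 0 if there is no break opportunity at all.
 */
static size_t find_break_index(const char *s, size_t bytesThatFit)
{
    size_t count;

    assert(s != NULL);
    for (count = bytesThatFit; count > 0; count--) {
        unsigned char c = (unsigned char)s[count - 1];

        /* ASCII whitespace or an ordinary hyphen: break after it. */
        if (c == ' ' || c == '\t' || c == '\n' || c == '-') {
            return count;
        }
        /* Soft hyphen U+00AD is the byte pair 0xC2 0xAD in UTF-8;
         * 0xC2 is a lead byte, so the pair cannot be the tail of a
         * longer sequence in valid UTF-8. */
        if (c == 0xAD && count >= 2
                && (unsigned char)s[count - 2] == 0xC2) {
            return count;
        }
    }
    return 0;
}
```

As in the excerpt above, the returned index counts bytes, not characters, so the break falls after the complete two-byte soft hyphen and never between its bytes.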

dkf added on 2005-01-06 16:03:16:
Logged In: YES 
user_id=79902

Issue originally raised on comp.lang.tcl

The word-break algorithm is also wrong ('-wrap word'
considers \u00AD to not be a potential break point)

This is probably also broken in multiline labels/buttons.