Tk Source Code

View Ticket
Login
Ticket UUID: 4b555aca347a0729a10f9b8ba5350049ba2d10e8
Title: text: search -all hangs and eats all memory
Type: Bug Version: revised_text, core-8-6-branch, trunk
Submitter: fvogel Created on: 2018-10-11 18:50:40
Subsystem: 18. [text] Assigned To: fvogel
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2018-10-17 20:23:49
Resolution: Fixed Closed By: fvogel
    Closed on: 2018-10-17 20:23:49
Description:

Reported on comp.lang.tcl by Dave today:

Searching a text widget for all instances of the empty string will lock up tclsh or wish, and apparently try to eat all memory.

A very minimal example:

  package require Tk
  text .t
  .t search -all "" 1.0

It also happens with nonempty text widgets, and ones that are displayed. An error or empty list seems like a better response.

Tcl/Tk 8.6.6 on slackware (linux 4.4.14)

User Comments: fvogel added on 2018-10-17 20:23:49:
Merged to core-8-6-branch, trunk and revised_text.

fvogel added on 2018-10-16 21:27:47:
Thank you for having reviewed my patch for forward searches.

I have applied your patch regarding backward searches, this looks OK to me, and moreover all tests do now pass (including an added one which exercises the hang case you found with backwards search with -all and matching at start of line).

Looks like we have fixed together all known text widget search issues, that's great!

danckaert added on 2018-10-16 10:28:44:

BTW, I found that backwards search can hang even for non-empty string:

% text .t; .t insert end abc 
% .t search -all -backwards b end  ;# OK
1.1
% .t search -all -backwards a end  ;# hangs
If the search matches at the beginning of a line, it hangs.

Here is a patch which I think fixes backwards search:

diff --git a/generic/tkText.c b/generic/tkText.c
index 3fa56d2..cfb2336 100644
--- a/generic/tkText.c
+++ b/generic/tkText.c
@@ -5813,7 +5813,7 @@ SearchCore(
            firstOffset = 0;
        }

- if (alreadySearchOffset != -1) { + if (alreadySearchOffset >= 0) { if (searchSpecPtr->backwards) { if (alreadySearchOffset < lastOffset) { lastOffset = alreadySearchOffset; @@ -5902,17 +5902,17 @@ SearchCore( * match. */

- const char c = pattern[0]; + const char c = matchLength ? pattern[0] : '\0';

- if (alreadySearchOffset != -1) { + if (alreadySearchOffset >= 0) { p = startOfLine + alreadySearchOffset; alreadySearchOffset = -1; } else { p = startOfLine + lastOffset -1; } while (p >= startOfLine + firstOffset) { - if (p[0] == c && !strncmp(p, pattern, - (size_t) matchLength)) { + if (matchLength == 0 || (p[0] == c && !strncmp( + p, pattern, (size_t) matchLength))) { goto backwardsMatch; } p--; @@ -6075,7 +6075,10 @@ SearchCore( if (firstNewLine != -1) { break; } else { - alreadySearchOffset -= matchLength; + alreadySearchOffset -= (matchLength ? matchLength : 1); + if (alreadySearchOffset < 0) { + break; + } } } else { firstOffset = p - startOfLine + matchLength;


danckaert added on 2018-10-16 09:19:03:

Yes, that fix looks correct to me.


fvogel added on 2018-10-15 19:02:18:

3rd fix proposal, for forwards searches only at the moment: [490b0a1b09]

Koen, if you can have a look I would appreciate your review. Always better to have several pairs of eyes on this. The algorithm for searching is not so straightforward with all the options it features.


fvogel added on 2018-10-15 18:23:59:

Oh yes, I agree of course, my second proposal is not the correct fix.

Not even talking about the code there would be quite something to say about backwards searching for the empty string in a text widget. Look at the following:

  package require Tk
  text .t
  # empty widget
  .t search "" 1.0                           ; # returns 1.0 (correct)
  .t search -backwards "" 1.0                ; # returns "" (WRONG)
  .t search -all "" 1.0                      ; # hangs (VERY WRONG)
  .t search -backwards -all "" 1.0           ; # no hang, returns "" (WRONG)
  .t search -backwards -all -overlap "" 1.0  ; # no hang, returns "" (WRONG)
  .t search -backwards "" 1.0                ; # returns "" (WRONG)
  .t search -backwards -regexp "" 1.0        ; # returns 1.0 (correct)
  .t search -backwards -all -regexp "" 1.0   ; # returns 1.0 (correct)
  # non-empty widget
  .t insert end abc
  .t search "" 1.0                           ; # returns 1.0 (correct)
  .t search -backwards "" 1.0                ; # returns "" (WRONG)
  .t search -all "" 1.0                      ; # hangs (VERY WRONG)
  .t search -backwards -all "" 1.0           ; # no hang, returns "" (WRONG)
  .t search -backwards -all -overlap "" 1.0  ; # no hang, returns "" (WRONG)
  .t search -backwards "" 1.0                ; # returns "" (WRONG)
  .t search -backwards -regexp "" 1.0        ; # returns 1.3 (correct)
  .t search -backwards -all -regexp "" 1.0   ; # returns {1.3 1.2 1.1 1.0} (correct)

So it appears that the backwards exact search for the empty string is broken as well (differently than the forward search since at least it does not hang). Regexp searches (forwards and backwards) for the empty string look correct to me.

As you have probably seen in the source code, exact and regexp search engines are completely different cases in function SearchCore().

I'm more and more inclined to follow your suggestion after all and to return an error for the case of exact searches for the empty string. Regarding forward searches this would be an acceptable thing to do (since current code hangs, nobody can rely on it). But regarding backwards searches, even if the returned answer is currently wrong, returning an error would be a potential incompatibility. Perhaps we can decide this is a a non-issue since there is no point in an exact search for an empty string (except perhaps as a trick to return all indices in the widget?), and we anyway need to plug the potential crash that pattern[0] currently offers.


danckaert added on 2018-10-15 15:29:03:

That cannot be a correct fix, since the "strncmp()" functions should use the original matchlength. Furthermore, it will report incorrect "counts" now:

  % text .t; .t insert end abc
  % .t search -all -count C "" 1.0 end
  1.0 1.1 1.2 1.3
  % set C
  1 1 1 1
The latter should obviously be "0 0 0 0"

BTW, I see that for backwards searches, the code contains:

     const char c = pattern[0];
which may even cause crashes when searching backwards for an empty string.

Maybe it's not so bad to report an error when doing an exact search for the empty string, after all. That's better than an infinite loop, and it is also better than reporting just a single match (because that would be wrong).


fvogel added on 2018-10-15 14:35:49:

That is a very good point, thanks for having shared.

I have reverted the originally proposed fix (that was [80286abf]), have added more tests [4f4b509cab], and have proposed another fix [e1002c8556].

Despite this second fix passes all tests, including the newly added ones, I'm not so comfortable with merging this, because there are comments like this one, which become false with this fix... I should probably look closer at the issue.


danckaert added on 2018-10-15 09:19:50:

Note that it worked correctly with the -overlap or -regexp switches:

  % text .t ; .t insert end abc
  % .t search -all -overlap "" 1.0    
  1.0 1.1 1.2 1.3
  % .t search -all -regexp "" 1.0
  1.0 1.1 1.2 1.3

I'd say that this is the correct result, since there is an empty string at every index. I'd prefer to get this result also without -overlap/-regexp.

Note that the proposed patch modified the current behavior with -overlap.


fvogel added on 2018-10-11 20:16:38:

I think I have found what was wrong in the search algorithm, see my proposed fix [80286abf]. This fix passes all tests from the test suite.

The problem only happened with -all (and -exact, the default). It did not happen without -all, or with -all -regexp.

Regarding the expected result of .t search -all "" 1.0, I decided to return the same result as .t search "" 1.0. This seemed more logical to me than returning an error or the empty string. Indeed search results with the -all switch should contain search results without this switch, isn't it?

I have added four test cases in the test suite, that will be testing some variations of the present ticket, see [be75dbc5].


fvogel added on 2018-10-11 18:51:21:
Reproduced on Windows, with both the core-8-6-branch (legacy text widget) and revised_text branch (revised version).