# TIP 113: Multi-Line Searches in the Text Widget
Author: Vince Darley <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 11-Oct-2002
Post-History:
Tcl-Version: 8.5
-----
# Abstract
This TIP proposes enhancing the implementation of the _$textwidget
search_ subcommand to allow matching of both exact strings and regexp
patterns which span multiple lines, and to allow reporting on all matches.
# Proposal
If the string/pattern given to the _search_ subcommand contains
sub-strings/patterns which match newlines, then it will be possible
for that command to return a match which spans multiple lines. Where
a match could occur both within a single line and across multiple
lines, the first such match will be found, and the length of the match
will follow the usual regexp rules, as documented in the regexp man
page. Since the text widget is inherently a line-based container of text, regexp searches will implicitly use the regexp _-line_ functionality, so that _^_ and _$_ match the beginning and end of any line, and _._ and _[^_ will not match a newline \(but see the _-nolinestop_ flag to turn off the latter behaviour\).
This can be implemented very efficiently, given
the _TCL\_REG\_CANMATCH_ flag supported by the regexp library, with no
impact at all on the speed of matching single lines.
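
As an illustration of the proposed behaviour, the following minimal sketch searches a three-line widget for patterns that cross a line boundary. The widget name `.multi` and the sample text are hypothetical, and the values in the comments are what this proposal implies rather than output taken from the reference implementation.

```tcl
package require Tk

# Hypothetical widget and sample text, chosen only for illustration.
text .multi
.multi insert end "foo\nbar\nbaz"

# Exact search for a string containing a newline.
# Under this proposal: exactStart is 1.0 and exactLen is 7
# (three characters, the newline, three characters).
set exactStart [.multi search -exact -count exactLen "foo\nbar" 1.0 end]

# Regexp search spanning the line break; \n is interpreted by the
# regexp engine. Under this proposal: reStart is 1.2 and reLen is 5.
set reStart [.multi search -regexp -count reLen {o\nba[rz]} 1.0 end]

# ^ and $ still anchor at the start and end of each line.
# anchoredStart is 2.0 and anchoredLen is 3.
set anchoredStart [.multi search -regexp -count anchoredLen {^bar$} 1.0 end]
```

The match length reported through _-count_ counts the newline as one character, so a match can be tagged with `"$start + $len chars"` regardless of how many lines it covers.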
In addition, two new options to the _search_ subcommand are available:
If the new _-all_ option is given to the _search_ subcommand, then all matches within the given range will be reported. The command then returns a list of indices, one per match, and, if a _-count var_ option was given, _var_ will be set to a list of the corresponding match lengths.
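
A short sketch of how the _-all_ result might be consumed, pairing each returned index with its length. The widget name `.all`, the tag name `hit`, and the sample text are assumptions made for the example.

```tcl
package require Tk

# Hypothetical widget and sample text.
text .all
.all insert end "foo\nbar\nbaz"

# With -all, the result is a list of start indices and the -count
# variable holds a parallel list of match lengths.
# Here: starts is {2.0 3.0} and lengths is {3 3}.
set starts [.all search -all -regexp -count lengths {ba.} 1.0 end]

# Walk both lists together and tag every match.
foreach start $starts len $lengths {
    .all tag add hit $start "$start + $len chars"
}
```

Because the two lists are parallel, a two-variable `foreach` is a natural way to iterate over them.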
If the new _-nolinestop_ option is given then regexp searches will allow _._ and _[^_ sequences to match newline characters \(which is normally not the case\). This is equivalent to _not_ providing the _-linestop_ flag to Tcl's _regexp_ command.
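
The difference can be sketched as follows, first with the widget and then with the analogous flags of Tcl's _regexp_ command. The widget name `.nl` and the sample text are hypothetical; the comments describe the behaviour this proposal specifies.

```tcl
package require Tk

# Hypothetical widget and sample text.
text .nl
.nl insert end "foo\nbar"

# Default: "." does not match the newline, so nothing is found
# and the result is the empty string.
set strict [.nl search -regexp {foo.bar} 1.0 end]

# With -nolinestop, "." may match the newline:
# relaxed is 1.0 and relaxedLen is 7.
set relaxed [.nl search -regexp -nolinestop -count relaxedLen {foo.bar} 1.0 end]

# The same distinction using the regexp command directly:
# -line implies both -lineanchor and -linestop, while -lineanchor
# alone leaves "." and "[^" free to match a newline.
set withLinestop    [regexp -line       {foo.bar} "foo\nbar"]   ;# 0
set withoutLinestop [regexp -lineanchor {foo.bar} "foo\nbar"]   ;# 1
```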
The text widget man page will be updated to reflect the new _-all_ and _-nolinestop_ options, and to remove the "single line" caveat.
# Reference implementation
This is available from:
<http://sourceforge.net/tracker/?func=detail&aid=621901&group\_id=12997&atid=312997>
The patch includes objectification of the entire Text widget, so the multi-line search changes are not easy to isolate. In fact the changes required amount to fewer than 100 lines of code \(given that the rest has been objectified, that is\). One nice side-effect of objectification is that regexp objects used in searches are now cached, which was not previously possible.
Note: this patch has to work around a crashing bug in Tcl's unicode string manipulation. It would be best if that bug were fixed before applying this patch.
# Issues
On the implementation side, it might be interesting to abstract the search interface away from the text widget, so that it could in principle be applied to any line-based textual source.
As in the single-line matching implementation in Tcl 8.x, the lack of support for backwards matching in Tcl's regexp library means that backwards matching can only be implemented as repeated forward matches, with a commensurate performance penalty \(the solution to which is outside the scope of this TIP\).
Tk has a curious misfeature that _$text search -regexp "\\n" $pos_ will always fail to match anything. This behaviour will change as a result of this TIP \(it will match the first newline after _$pos_\), and any code which somehow depended on that peculiarity will therefore break.
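
A minimal sketch of the changed behaviour, assuming a hypothetical widget `.eol` holding two lines of sample text. Code that needs to run on both old and new versions of Tk should not rely on either result.

```tcl
package require Tk

# Hypothetical widget and sample text.
text .eol
.eol insert end "first\nsecond"

# Before this TIP the search below found nothing. With the TIP it
# matches the first newline after the start index: newlinePos is 1.5
# (the end of "first") and newlineLen is 1.
set newlinePos [.eol search -regexp -count newlineLen {\n} 1.0 end]
```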
# Copyright
This document has been placed in the public domain.