Tk Library Source Code

Ticket UUID: 640932
Title: htmlparse not parsing incomplete tags
Type: Bug
Version: None
Submitter: scottg
Created on: 2002-11-19 21:36:51
Subsystem: None
Assigned To: andreas_kupries
Priority: 6
Severity:
Status: Closed
Last Modified: 2003-03-25 12:18:49
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2003-03-25 05:18:49
Description:
htmlparse is supposed to be able to parse incomplete
tags by stuffing the incomplete part into a global var and
then, apparently, fishing it back and prepending it to its
next string input. This works fine only if the rest of the
incomplete tag is in the same string, but not if the rest
of the tag arrives in another string entirely, e.g. if
separate lines are being processed in a foreach loop.
I'm using tcllib 1.3.

Here's a script that shows the bad behavior:

#!/bin/tclsh

package require Tcl 8.4
package require struct 1.2.1
package require cmdline 1.2
package require htmlparse 0.3

lappend lines {<root>}
lappend lines {<tag>Hi there</tag>}  
lappend lines {<tag}
lappend lines {>Hi there</tag>}
lappend lines {</root>}

proc tag_handler { tag slash attr text } {
        if {[string equal $tag hmstart]} {
                return
        }
        puts "TAG=$tag SLASH=$slash"
}

foreach localline $lines {
        ::htmlparse::parse -cmd tag_handler -incvar partial_xml $localline
}

Here's the output:

axis-goodwins> ./bad-htmlparse.tcl 
TAG=root SLASH=
TAG=tag SLASH=
TAG=tag SLASH=/
TAG=tag SLASH=/
TAG=root SLASH=/

You can see that the third 'tag' should be an opening 
tag but is being marked as a closing tag. The opening 
tag never gets processed, and no errors are thrown.

If I use htmlparse to parse this string:

set line {
<root>
<tag
>Hi there</tag>
</root>
}

it works fine.

/s.
User Comments:

andreas_kupries added on 2003-03-25 12:18:49:

Analyzing the problem ...
The detection of incomplete data at the end of a string is
ok.

However, a stored incomplete tag is never used in the next
iteration, so the data saved by the previous call is lost.

Fixed in head.
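
For illustration only, here is a minimal sketch of the carry-over
behaviour described above, written in plain Tcl. It is not the
htmlparse code and not the actual fix; the proc name feed_chunk and
the variable name partial are hypothetical. It assumes an unfinished
tag is a "<" with no ">" after it, saves that tail in the caller's
variable, and prepends it to the next chunk.

```tcl
# Hypothetical helper, NOT part of htmlparse: scan one chunk of markup and
# carry an unfinished tag over to the next call through the caller's
# variable named by carryVar (the role -incvar plays in the report).
proc feed_chunk {carryVar chunk} {
    upvar 1 $carryVar carry
    if {![info exists carry]} { set carry "" }

    # Prepend whatever was left over from the previous chunk.
    set data $carry$chunk
    set carry ""

    # Assumption: a "<" with no ">" after it is an unfinished tag.
    set lt [string last "<" $data]
    set gt [string last ">" $data]
    if {$lt > $gt} {
        set carry [string range $data $lt end]
        set data  [string range $data 0 [expr {$lt - 1}]]
    }

    # Return a flat list: name slash name slash ... for each complete tag.
    set tags {}
    foreach {match slash name} \
            [regexp -all -inline {<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>} $data] {
        lappend tags $name $slash
    }
    return $tags
}

# Same input as the script in the report, fed one line at a time.
set lines [list {<root>} {<tag>Hi there</tag>} {<tag} {>Hi there</tag>} {</root>}]
foreach line $lines {
    foreach {name slash} [feed_chunk partial $line] {
        puts "TAG=$name SLASH=$slash"
    }
}
```

With this sketch the second <tag is reported as an opening tag once
its closing ">" arrives in the following chunk, which is the
behaviour the report expected from -incvar.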

scottg added on 2002-11-20 04:36:52:

File Added - 35746: bad-htmlparse.tcl
