Ticket UUID: 640932
Title: htmlparse not parsing incomplete tags
Type: Bug | Version: None
Submitter: scottg | Created on: 2002-11-19 21:36:51
Subsystem: None | Assigned To: andreas_kupries
Priority: 6 | Severity: (none)
Status: Closed | Last Modified: 2003-03-25 12:18:49
Resolution: Fixed | Closed By: andreas_kupries
Closed on: 2003-03-25 05:18:49
Description:
htmlparse is supposed to be able to parse incomplete tags: it stores the incomplete part in a global variable and then, apparently, retrieves it and prepends it to its next string input. This works only if the rest of the incomplete tag is in the same string. It fails if the rest is in another string entirely, e.g. when separate lines are processed in a foreach loop. I'm using tcllib 1.3. Here's a script that shows the bad behavior:

    #!/bin/tclsh
    package require Tcl 8.4
    package require struct 1.2.1
    package require cmdline 1.2
    package require htmlparse 0.3

    lappend lines {<root>}
    lappend lines {<tag>Hi there</tag>}
    lappend lines {<tag}
    lappend lines {>Hi there</tag>}
    lappend lines {</root>}

    proc tag_handler { tag slash attr text } {
        if {[string equal $tag hmstart]} { return }
        puts "TAG=$tag SLASH=$slash"
    }

    foreach localline $lines {
        ::htmlparse::parse -cmd tag_handler -incvar partial_xml $localline
    }

Here's the output:

    axis-goodwins> ./bad-htmlparse.tcl
    TAG=root SLASH=
    TAG=tag SLASH=
    TAG=tag SLASH=/
    TAG=tag SLASH=/
    TAG=root SLASH=/

The third TAG=tag line should be an opening tag (empty SLASH), but it is reported as a closing tag. The opening tag that is split across the third and fourth strings never gets processed, and no error is thrown. If I use htmlparse to parse this as a single string, it works fine:

    set line {
    <root>
    <tag
    >Hi there</tag>
    </root>
    }

/s.
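The carry-over behavior the report expects (stash the unfinished tag, then prepend it to the next input) can be sketched in plain Tcl. This is a minimal illustration, not htmlparse's actual code: the proc name `splitIncomplete` and the variable `carry` are hypothetical, and the sketch assumes an unfinished tag is simply a trailing `<` with no `>` after it (it ignores comments, CDATA, and `>` inside attribute values).

```tcl
# Hypothetical helper (not part of htmlparse): returns the part of the
# input that can be parsed now and keeps any unfinished trailing tag in
# the caller's variable named by carryVar.
proc splitIncomplete {chunk carryVar} {
    upvar 1 $carryVar carry
    # Prepend whatever was left over from the previous chunk.
    set data "$carry$chunk"
    set carry ""
    set lt [string last "<" $data]
    # If the last '<' has no '>' after it, the tag is still open:
    # hold it back for the next call instead of parsing it now.
    if {$lt >= 0 && [string first ">" $data $lt] < 0} {
        set carry [string range $data $lt end]
        set data [string range $data 0 [expr {$lt - 1}]]
    }
    return $data
}

# Feed the same five strings as the bug report, one at a time.
set carry ""
set out {}
foreach line {{<root>} {<tag>Hi there</tag>} {<tag} {>Hi there</tag>} {</root>}} {
    lappend out [splitIncomplete $line carry]
}
foreach piece $out {
    puts $piece
}
```

With these inputs the third call returns an empty string (the `<tag` fragment is held in `carry`), and the fourth call returns the rejoined `<tag>Hi there</tag>`, so a parser fed each returned piece would see the opening tag that the report says goes missing.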
User Comments:
andreas_kupries added on 2003-03-25 12:18:49:
Analyzing the problem ... The detection of incomplete data at the end of a string is OK. However, a stored incomplete tag is never used in the next iteration. That is bad. Fixed in head.

scottg added on 2002-11-20 04:36:52:
File Added - 35746: bad-htmlparse.tcl
Attachments:
- bad-htmlparse.tcl added by scottg on 2002-11-20 04:36:52.