Ticket UUID: 953854
Title: Errors when parsing HTML to a tree
Type: Patch
Version: None
Submitter: nobody
Created on: 2004-05-14 10:11:51
Subsystem: htmlparse
Assigned To: andreas_kupries
Priority: 1 Zero
Severity:
Status: Closed
Last Modified: 2006-01-18 13:15:37
Resolution: Accepted
Closed By: andreas_kupries
Closed on: 2006-01-18 06:15:37

In func

proc ::htmlparse::mapEscapes

line: return [subst $new]
should change to:
   return [subst -nobackslashes -novariables $new]

Otherwise, if new contains a backslash (\), subst interprets it as an
escape sequence and breaks the string (especially noticeable in
Windows paths).
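
A minimal sketch of the effect, assuming a hypothetical input string
that holds a Windows path (the path itself is made up for
illustration). Note that both forms of subst still perform command
substitution on anything in square brackets.

```tcl
# Hypothetical input: text containing a Windows path.
set new {C:\temp\notes.txt}

# Plain subst interprets \t and \n as tab and newline.
set broken [subst $new]

# With backslash and variable substitution disabled, the text survives.
set intact [subst -nobackslashes -novariables $new]

puts [string equal $new $broken]   ;# 0
puts [string equal $new $intact]   ;# 1
```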

In func


    if {
       $sibling == {} ||
       (![string compare $tp [$tree get $sibling type]])
                 } {
Should change to:
       if { $sibling == "" } { break }
       if { [lsearch "h1 h2 h3 h4 h5 h6 p li" [$tree get $sibling type]] != -1 } {

  The second option is less aggressive when reordering tags.
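
  The membership test in the proposed condition can be tried in
  isolation. This is only a sketch: isBlockTag is a hypothetical
  helper, not part of htmlparse, and -exact is added here so the tag
  name is not treated as a glob pattern (for these plain tag names the
  default matching gives the same result).

```tcl
# Tag types taken from the proposed condition.
set blockTags {h1 h2 h3 h4 h5 h6 p li}

# isBlockTag is a hypothetical helper, not part of htmlparse.
proc isBlockTag {type} {
    global blockTags
    # lsearch returns the index of the first match, or -1 if absent.
    expr {[lsearch -exact $blockTags $type] != -1}
}

puts [isBlockTag p]      ;# 1
puts [isBlockTag span]   ;# 0
```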


  Ramon Ribó
User Comments: andreas_kupries added on 2006-01-18 13:15:36:

Mostly accepted. The changes to mapEscapes are outdated;
this was fixed in a different way, by an additional quoting
step protecting Tcl's special characters.

Reordering advice taken.

Examples are in the testsuite, actually. The relevant
testcases have been updated.

andreas_kupries added on 2004-09-30 04:46:10:

Do you have small examples that demonstrate the bad
behaviour? They would also become test cases.

nobody added on 2004-05-14 17:11:52:

