Tk Library Source Code

Ticket UUID: 953854
Title: Errors when parsing HTML to a tree
Type: Patch
Version: None
Submitter: nobody
Created on: 2004-05-14 10:11:51
Subsystem: htmlparse
Assigned To: andreas_kupries
Priority: 1 Zero
Severity:
Status: Closed
Last Modified: 2006-01-18 13:15:37
Resolution: Accepted
Closed By: andreas_kupries
Closed on: 2006-01-18 06:15:37
Description:
   Hello,

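In ::htmlparse::mapEscapes, the line

    return [subst $new]

should be changed to:

    return [subst -nobackslashes -novariables $new]

Otherwise, if $new contains a backslash (\), subst treats it as the start of an escape sequence and corrupts the string; this is especially noticeable with Windows paths. Likewise, a $ in the text would trigger variable substitution, which -novariables prevents.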
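A minimal Tcl illustration of the backslash problem, assuming a hypothetical Windows path stored in the variable new (it is not taken from the ticket's attachment). Plain subst rewrites \t and \n as a tab and a newline, while the two flags leave only command substitution active:

```tcl
# Braces keep the backslashes literal when this script is parsed.
set new {C:\temp\new.html}

# Plain [subst] treats \t as a tab and \n as a newline.
set broken [subst $new]

# With -nobackslashes -novariables only [command] substitution remains,
# so the backslashes survive unchanged.
set kept [subst -nobackslashes -novariables $new]

puts [string equal $broken $new]   ;# 0: the path was altered
puts [string equal $kept $new]     ;# 1: the path is intact
```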

-------
In ::htmlparse::Reorder, the lines:
    if {
        $sibling == {} ||
        (![string compare $tp [$tree get $sibling type]])
    } {
        break
    }
should be changed to:
    if {$sibling == ""} { break }
    if {[lsearch -exact {h1 h2 h3 h4 h5 h6 p li} [$tree get $sibling type]] != -1} {
        break
    }

  The second version is less aggressive when reordering tags.
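The proposed stop condition can be sketched as a stand-alone predicate. This is a hypothetical helper (stopReorder is not part of htmlparse): the sibling's type is passed in as an argument instead of being read with $tree get, so the sketch runs without struct::tree. It returns 1 when Reorder should stop, i.e. when there is no sibling or the sibling is one of the block-level tags named in the patch:

```tcl
# Hypothetical helper mirroring the proposed test in ::htmlparse::Reorder.
# sibling     - node id of the sibling, or "" when there is none
# siblingType - that sibling's tag type (what [$tree get $sibling type] returns)
proc stopReorder {sibling siblingType} {
    # No sibling left: nothing more to reorder.
    if {$sibling eq ""} {
        return 1
    }
    # Block-level tags from the patch act as reordering boundaries.
    set boundaries {h1 h2 h3 h4 h5 h6 p li}
    return [expr {[lsearch -exact $boundaries $siblingType] != -1}]
}

puts [stopReorder "" ""]      ;# 1
puts [stopReorder node5 p]    ;# 1
puts [stopReorder node6 b]    ;# 0
```

Using -exact avoids lsearch's default glob matching; for plain tag names the result is the same as the patch's version.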

  Regards,

  Ramon Ribó
User Comments:

andreas_kupries added on 2006-01-18 13:15:36:

Mostly accepted. The proposed change to mapEscapes is outdated:
the problem was fixed differently, by adding a quoting step
that protects Tcl's special characters.
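The comment does not show the fix itself. A minimal sketch of the idea, under stated assumptions: mapEscapesSketch and entityChar are hypothetical names, the entity table covers only amp, lt and gt, and this is not the tcllib source. Tcl's special characters are escaped first, then each &name; is rewritten into a command substitution, so the final plain subst evaluates only the commands inserted in that step:

```tcl
# Hypothetical lookup for a few named entities (not the tcllib table).
proc entityChar {name} {
    set table {amp & lt < gt >}
    if {[dict exists $table $name]} {
        return [dict get $table $name]
    }
    # Unknown entity: keep it verbatim.
    return "&$name;"
}

proc mapEscapesSketch {html} {
    # Step 1: quote \ $ [ ] so the later [subst] leaves them alone.
    set new [string map {\\ \\\\ $ \\$ [ \\[ ] \\]} $html]
    # Step 2: turn each &name; into a command substitution.
    regsub -all {&([a-zA-Z]+);} $new {[entityChar \1]} new
    # Step 3: only the commands inserted in step 2 are evaluated.
    return [subst $new]
}

puts [mapEscapesSketch {C:\temp\new &lt;b&gt;}]
# prints: C:\temp\new <b>
```

Because of step 1, input such as $x or [exit] comes back unchanged instead of being evaluated.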

Reordering advice taken.

Examples are actually in the test suite; the relevant
test cases have been updated.

andreas_kupries added on 2004-09-30 04:46:10:

Do you have small examples that demonstrate the bad
behaviour? They would also become test cases.

nobody added on 2004-05-14 17:11:52:

File Added - 87140: htmlparse.tcl
