Tk Library Source Code

Ticket UUID: 861277
Title: htmlparse.tcl: backslashes in content
Type: Bug
Version: None
Submitter: jenglish
Created on: 2003-12-16 21:15:56
Subsystem: htmlparse
Assigned To: jenglish
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2004-10-05 01:58:23
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2004-10-04 18:58:23
Description:
htmlparse::parse fails if backslashes appear in content:

htmlparse::parse "<p>\\</p>"
  ==> error "Missing close-brace"
User Comments:

andreas_kupries added on 2004-10-05 01:58:23:

OK. This has been fixed and committed to HEAD.

andreas_kupries added on 2004-10-05 01:34:59:

I will add test cases as well and when I am done both bugs
will be closed.

andreas_kupries added on 2004-10-05 01:34:29:

Yes, that is also what Joe proposed to me a few minutes ago
on the Tcler's chat. I am implementing it now.

davygrvy added on 2004-10-05 01:31:36:

It should use numeric character references instead. &ob;,
&cb;, and &bsl; are not defined as entities in HTML 4.01,
and htmlparse::mapEscapes already decodes the numeric forms:

(Desktop) 7 % htmlparse::mapEscapes &#[scan \{ %c]\;
{
(Desktop) 8 % htmlparse::mapEscapes &#[scan } %c]\;
}
(Desktop) 9 % htmlparse::mapEscapes &#[scan \\ %c]\;
\
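The numeric-reference approach can be sketched in plain Tcl. The procedure names numericEntity and protectSpecialsNumeric are hypothetical and not part of htmlparse; the sketch only assumes that `scan ... %c` returns a character's code point, so the three Tcl-special characters become &#123;, &#125;, and &#92;, the same forms mapEscapes decodes in the session above.

```tcl
# Hypothetical helpers for illustration; not part of tcllib's htmlparse.

# Return the HTML numeric character reference for a single character.
# scan with %c yields the character's code point (92 for a backslash).
proc numericEntity {char} {
    return "&#[scan $char %c];"
}

# Replace left brace, right brace and backslash with standard numeric
# references instead of the nonstandard &ob; / &cb; / &bsl; entities.
proc protectSpecialsNumeric {html} {
    set map {}
    foreach c [list \{ \} \\] {
        lappend map $c [numericEntity $c]
    }
    return [string map $map $html]
}

puts [protectSpecialsNumeric "<p>\\</p>"]   ;# <p>&#92;</p>
```

Because these are standard references, any HTML-aware decoder restores them, so no private entity names have to be kept in sync between the encoding and decoding steps.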

jenglish added on 2004-10-04 23:24:36:

Reopening: this bug is back again.

htmlparse::PrepareHtml replaces "{", "}", and "\" (left
brace, right brace, and backslash) with the nonstandard
entity references &ob;, &cb;, and &bsl;.
htmlparse::mapEscapes used to change these back into braces
and backslashes, respectively. This appears to have changed
in r1.16 (bug #1018574); backslashes in content now come out
as "&bsl;" (that's "ampersand, b, s, l, semicolon" in case
the bug tracker mangles it).

jenglish added on 2003-12-17 04:24:42:

Patch committed.

jenglish added on 2003-12-17 04:20:51:

File Added - 70792: htmlparse-backslash.patch

jenglish added on 2003-12-17 04:20:30:

Looks like this was introduced in r1.9:
    regsub -all -- \\\\ $html {\&bsl;} html
changed to 
    return [string map [list [...] "\\\\" "&bsl;"] $html]

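(I.e., the conversion kept one level of backslash escaping
too many: unlike regsub, string map matches literal strings,
so the key "\\\\" only matches a doubled backslash.)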

The attached patch fixes the problem and adds a test case.
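The regression can be reproduced in a few lines of plain Tcl, independent of htmlparse; the variable names are illustrative only. The word \\\\ is substituted by Tcl to two backslashes in both calls, but regsub reads them as a regular expression matching one literal backslash, whereas string map treats them as a literal two-character key.

```tcl
# Illustrative demo of the extra escaping level; not tcllib source.
set html "<p>\\</p>"   ;# Tcl parses this to <p>\</p>: one backslash

# regsub: the pattern word \\\\ becomes \\ after Tcl substitution,
# which as a regular expression matches ONE literal backslash.
# In the replacement, \& produces a literal ampersand.
regsub -all -- \\\\ $html {\&bsl;} viaRegsub

# string map: keys are literal strings, so "\\\\" (two backslashes)
# never matches content containing a single backslash.
set buggy [string map [list "\\\\" "&bsl;"] $html]

# Correct literal key: "\\" is a single backslash.
set fixed [string map [list "\\" "&bsl;"] $html]

puts $viaRegsub  ;# <p>&bsl;</p>
puts $buggy      ;# <p>\</p>
puts $fixed      ;# <p>&bsl;</p>
```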
