Ticket UUID: | 861277 | |||
Title: | htmlparse.tcl: backslashes in content | |||
Type: | Bug | Version: | None | |
Submitter: | jenglish | Created on: | 2003-12-16 21:15:56 | |
Subsystem: | htmlparse | Assigned To: | jenglish | |
Priority: | 5 Medium | Severity: | ||
Status: | Closed | Last Modified: | 2004-10-05 01:58:23 | |
Resolution: | Fixed | Closed By: | andreas_kupries | |
Closed on: | 2004-10-04 18:58:23 | |||
Description: |
htmlparse::parse fails if backslashes appear in content: htmlparse::parse "<p>\\</p>" ==> error "Missing close-brace" | |||
User Comments: |
andreas_kupries added on 2004-10-05 01:58:23:
Logged In: YES user_id=75003
Ok. This has been fixed and committed to head.

andreas_kupries added on 2004-10-05 01:34:59:
Logged In: YES user_id=75003
I will add test cases as well and when I am done both bugs will be closed.

andreas_kupries added on 2004-10-05 01:34:29:
Logged In: YES user_id=75003
Yes, that is what Joe proposed to me a few minutes ago as well, on the tcler's chat. I am currently implementing that.

davygrvy added on 2004-10-05 01:31:36:
Logged In: YES user_id=7549
Should be using the numeric entities. &ob;, &cb;, and &bsl; aren't documented as entities in HTML 4.01.

    (Desktop) 7 % htmlparse::mapEscapes &#[scan \{ %c]\;
    {
    (Desktop) 8 % htmlparse::mapEscapes &#[scan } %c]\;
    }
    (Desktop) 9 % htmlparse::mapEscapes &#[scan \\ %c]\;
    \

jenglish added on 2004-10-04 23:24:36:
Logged In: YES user_id=68433
Reopening -- this is back again.
htmlparse::PrepareHtml replaces "{", "}", and "\" (left brace, right brace, and backslash) with the nonstandard entity references &ob;, &cb;, and &bsl;. htmlparse::mapEscapes used to change these back into braces and backslashes, respectively.
It looks like this was changed in r1.16 (bug #1018574); now backslashes in content come out as "&bsl;" (that's "ampersand, b, s, l, semicolon" in case the bug tracker mangles it).

jenglish added on 2003-12-17 04:24:42:
Logged In: YES user_id=68433
Patch committed.

jenglish added on 2003-12-17 04:20:51:
File Added - 70792: htmlparse-backslash.patch

jenglish added on 2003-12-17 04:20:30:
Logged In: YES user_id=68433
Looks like this was introduced in r1.9:

    regsub -all -- \\\\ $html {\&bsl;} html

changed to

    return [string map [list [...] "\\\\" "&bsl;"] $html]

(i.e., it retained one too many levels of \-escaping). Attached patch fixes the problem, and adds a test case. |
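
The escaping difference described in the r1.9 comment can be reproduced outside of htmlparse. The following is only a minimal sketch, not the committed patch: the variable names (html, viaRegsub, viaBrokenMap, viaFixedMap) are made up for illustration, and the mapping list is reduced to the single backslash entry.

```tcl
# Sample content containing exactly one backslash character.
set html "<p>\\</p>"

# regsub: the Tcl parser reduces \\\\ to the two-character regular
# expression \\, which matches ONE literal backslash.
regsub -all -- \\\\ $html {\&bsl;} viaRegsub

# string map does no regular-expression processing, so "\\\\" (two
# backslashes after Tcl substitution) only matches two consecutive
# backslashes; a single backslash passes through unchanged.
set viaBrokenMap [string map [list "\\\\" "&bsl;"] $html]

# For string map, one level of escaping is enough.
set viaFixedMap [string map [list "\\" "&bsl;"] $html]

puts $viaRegsub      ;# <p>&bsl;</p>
puts $viaBrokenMap   ;# <p>\</p>
puts $viaFixedMap    ;# <p>&bsl;</p>
```

The second result shows why the raw backslash survived into the later parsing stage, which is presumably where the "Missing close-brace" error was raised.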
Attachments:
- htmlparse-backslash.patch added by jenglish on 2003-12-17 04:20:50.