Tcl Library Source Code

Ticket Change Details
Overview

Artifact ID: c958563433e968019cc3590c82402fe0d15f0ea5ce6b0d20f9fc1f56a5bd3857
Ticket: 1f900bdf6bb6369987598e8c5f36a1e0f62b9798
improve ncgi's decode procedure
User & Date: aku 2019-06-13 05:39:27
Changes

  1. icomment:
    Pulled in the relevant commits from the closed PR.
    
    
    use the -nocase switch in regsub commands to make the regexes shorter
    
    ---- modules/ncgi/ncgi.tcl
    @@ -271,11 +271,11 @@ proc ::ncgi::decode {str} {
    
        set str [string map [list + { } "\\" "\\\\" \[ \\\[ \] \\\]] $str]
    
        # prepare to process all %-escapes
    -   regsub -all -- {%([Ee][A-Fa-f0-9])%([89ABab][A-Fa-f0-9])%([89ABab][A-Fa-f0-9])} \
    +   regsub -all -nocase -- {%([E][A-F0-9])%([89AB][A-F0-9])%([89AB][A-F0-9])} \
    	$str {[encoding convertfrom utf-8 [DecodeHex \1\2\3]]} str
    -   regsub -all -- {%([CDcd][A-Fa-f0-9])%([89ABab][A-Fa-f0-9])}                     \
    +   regsub -all -nocase -- {%([CD][A-F0-9])%([89AB][A-F0-9])}                     \
    	$str {[encoding convertfrom utf-8 [DecodeHex \1\2]]} str
    -   regsub -all -- {%([0-7][A-Fa-f0-9])} $str {\\u00\1} str
    +   regsub -all -nocase -- {%([0-7][A-F0-9])} $str {\\u00\1} str
    
        # process \u unicode mapped chars
        return [subst -novar $str]
    ----
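
The equivalence this first commit relies on can be checked with a small sketch. The two patterns are copied from the diff above; `sameResult` and the `<...>` replacement marker are hypothetical names used only for illustration and are not part of ncgi.

```tcl
# Patterns copied from the diff: explicit upper/lower case classes (before)
# and the shorter form intended for use with -nocase (after).
set before {%([CDcd][A-Fa-f0-9])%([89ABab][A-Fa-f0-9])}
set after  {%([CD][A-F0-9])%([89AB][A-F0-9])}

# Hypothetical helper: run both variants on the same input and report
# whether they produce the same rewritten string.
proc sameResult {before after input} {
    regsub -all -- $before $input {<\1\2>} old
    regsub -all -nocase -- $after $input {<\1\2>} new
    return [string equal $old $new]
}

# Mixed-case escapes, a non-matching second byte, and plain text.
foreach input {%C3%9C %c3%9c %C3%9c %d0%bf %C3%7F plain} {
    puts [format "%-8s %s" $input [sameResult $before $after $input]]
}
```

For these inputs both variants should agree on every line, which is what makes the change a pure shortening of the patterns.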
    improved handling of one byte encodings
    
    In practice, URL-encoded (application/x-www-form-urlencoded) POST parameters
    can use encodings other than UTF-8 (think of legacy Tcl applications that use
    one of the ISO-8859-x charsets). In this case, URL parameters can contain
    single percent-encoded bytes (of the form `%[A-F0-9][A-F0-9]`) in the range
    0x80-0xFF that are not valid UTF-8 on their own.
    
    For example, `%DC` can be used as the percent encoding of the German umlaut
    `Ü` (if a Tcl application is based on ISO-8859-1). Currently, the `decode`
    procedure does not decode `%DC`, because its pattern for single-byte escapes
    only matches a first hex digit in `[0-7]`, i.e. the ASCII range.
    
    This commit improves the handling of one-byte percent-encoded non-ASCII
    characters. It allows ncgi to be used in application contexts that do not use
    UTF-8 as the base encoding.
    
    ---- modules/ncgi/ncgi.tcl
    @@ -275,7 +275,7 @@ proc ::ncgi::decode {str} {
    
    	$str {[encoding convertfrom utf-8 [DecodeHex \1\2\3]]} str
        regsub -all -nocase -- {%([CD][A-F0-9])%([89AB][A-F0-9])}                     \
    	$str {[encoding convertfrom utf-8 [DecodeHex \1\2]]} str
    -   regsub -all -nocase -- {%([0-7][A-F0-9])} $str {\\u00\1} str
    +   regsub -all -nocase -- {%([A-F0-9][A-F0-9])} $str {\\u00\1} str
    
        # process \u unicode mapped chars
        return [subst -novar $str]
    ----
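
The effect of the second commit can be illustrated with a standalone sketch of the decode steps shown in the diffs. This is not the ncgi source: `decodeSketch` and its optional `oneByte` argument are hypothetical, and the body of `DecodeHex` is an assumption about what the helper referenced in the diff does (turning hex digits into raw bytes).

```tcl
# Assumed stand-in for the DecodeHex helper referenced in the diff:
# converts a string of hex digits into the corresponding raw bytes.
proc DecodeHex {hex} {
    return [binary format H* $hex]
}

# Sketch of the decode steps from the diffs. The one-byte pattern is a
# parameter so the old and new behaviour can be compared side by side.
proc decodeSketch {str {oneByte {%([A-F0-9][A-F0-9])}}} {
    # "+" becomes a space; backslashes and brackets are protected from subst
    set str [string map [list + { } "\\" "\\\\" \[ \\\[ \] \\\]] $str]

    # three-byte and two-byte UTF-8 sequences
    regsub -all -nocase -- {%([E][A-F0-9])%([89AB][A-F0-9])%([89AB][A-F0-9])} \
        $str {[encoding convertfrom utf-8 [DecodeHex \1\2\3]]} str
    regsub -all -nocase -- {%([CD][A-F0-9])%([89AB][A-F0-9])} \
        $str {[encoding convertfrom utf-8 [DecodeHex \1\2]]} str

    # remaining single-byte escapes are mapped to \u00XX
    regsub -all -nocase -- $oneByte $str {\\u00\1} str

    # evaluate the generated \u escapes and [encoding ...] commands
    return [subst -novar $str]
}

# New pattern: the ISO-8859-1 escape %DC decodes to U+00DC.
puts [string equal [decodeSketch "%DCber"] "\u00DCber"]
# UTF-8 input still decodes to the same character.
puts [string equal [decodeSketch "%C3%9Cber"] "\u00DCber"]
# Old pattern: %DC is left untouched.
puts [decodeSketch "%DCber" {%([0-7][A-F0-9])}]
```

With the old `[0-7]` pattern passed explicitly, the last call leaves `%DC` in the output unchanged, which is the behaviour the commit message describes.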
    
  2. login: "aku"
  3. mimetype: "text/plain"