Tcl Source Code

Ticket UUID: a13b9d0ce1083eb302a9ad1ce88e7351c928fdd6
Title: The http package uses wrong mapping for content-encoding
Type: Bug
Version: 2.9.1
Submitter: sbron
Created on: 2020-07-24 16:13:23
Subsystem: 29. http Package
Assigned To: nobody
Priority: 5 Medium
Severity: Minor
Status: Pending
Last Modified: 2022-09-11 11:02:50
Resolution: Fixed
Closed By: nobody
Closed on:
Description:

When a web site returns a resource with "Content-Encoding: deflate", the data should be compressed according to RFC 1951 and then wrapped according to RFC 1950. The proper way to decode such a resource in Tcl is with zlib decompress, not zlib inflate, because zlib inflate expects raw compressed data without the RFC 1950 wrapper. Apparently this was also implemented incorrectly in IE, with the result that some web sites, which were only tested with IE, deliver the wrong data format. To deal with this situation, most other browsers were changed to accept both formats. Tcl should either accept both as well, or accept only the correct data format; it should not accept only the misguided Microsoft interpretation.
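The difference between the two formats can be reproduced locally without a web server. The following is a minimal sketch, assuming Tcl 8.6 with its built-in zlib command; the payload and variable names are made up for illustration:

package require Tcl 8.6

# Illustrative payload (hypothetical test data).
set plain [string repeat "Hello, Tcl! " 20]

# RFC 1950 stream (deflate data inside a zlib wrapper):
# what "Content-Encoding: deflate" is supposed to carry.
set wrapped [zlib compress $plain]

# RFC 1951 stream (raw deflate, no wrapper):
# what servers following the IE interpretation send.
set raw [zlib deflate $plain]

# Each decoder handles its own format.
puts "decompress on wrapped: [expr {[zlib decompress $wrapped] eq $plain}]"
puts "inflate on raw:        [expr {[zlib inflate $raw] eq $plain}]"

# Using the raw decoder on the wrapped stream is expected to fail.
if {[catch {zlib inflate $wrapped} msg]} {
    puts "inflate on wrapped failed: $msg"
}

The last step is the same mismatch described above: a raw-format decoder applied to an RFC-compliant stream.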

A Content-Encoding value of "compress" specifies that the data was encoded the way the Unix "compress" command does. The zlib library does not support this format. As it is an outdated format anyway, I think the http package should simply not advertise its support in the Accept-Encoding header.

Unfortunately I have been unsuccessful in finding a web site that returns deflated data. Probably due to the Microsoft blunder, the deflate method is considered tainted and sites don't use it much anymore. I have created a PHP script on my own web site that uses either gzip or deflate encoding depending on the received Accept-Encoding header (using the PHP function ob_gzhandler, which presumably produces the correct output). Forcing the Accept-Encoding header to "deflate" causes http::geturl to throw a "data error":

package require http 2.9.1
set url http://www.tclcode.com/bugs/deflate.php
set tok [http::geturl $url -headers {Accept-Encoding deflate}]
puts [http::data $tok]
http::cleanup $tok

User Comments:
kjnash added on 2022-09-11 11:02:50:
The existing http package accepts the Content-Encoding values gzip, deflate and compress.
Of these options, gzip is by far the most commonly used.

Commit [d482da7d8c] implements the suggestions made in the bug report (above):
(a) it removes option "compress" which appears to use the wrong algorithm;
(b) it allows option "deflate" to use either the RFC-compliant format, or the Microsoft-inspired non-compliant "raw" format (except that a -channel uses only the compliant format).
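
The approach in (b) can be sketched as a try-then-fall-back decoder. This is only an illustration under stated assumptions, not the code from the commit; the proc name decodeDeflate is hypothetical:

package require Tcl 8.6

# Hypothetical helper: decode a "Content-Encoding: deflate" body.
# Try the RFC-compliant zlib-wrapped format first; if that fails,
# fall back to the non-compliant raw deflate format.
proc decodeDeflate {data} {
    if {[catch {zlib decompress $data} result]} {
        # Not a valid RFC 1950 stream: assume raw RFC 1951 data.
        # An error from this call propagates to the caller.
        set result [zlib inflate $data]
    }
    return $result
}

# Usage: both encodings of the same payload decode to the original.
set plain [string repeat "Hello, Tcl! " 20]
puts [expr {[decodeDeflate [zlib compress $plain]] eq $plain}]
puts [expr {[decodeDeflate [zlib deflate $plain]] eq $plain}]

Trying the compliant format first keeps the standard case on the normal path and treats the raw format as the exception.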

Re "compress":

RFC 9110 Sec 8.4.1.1 states

   The "compress" coding is an adaptive Lempel-Ziv-Welch (LZW) coding
   [Welch] that is commonly produced by the UNIX file compression
   program "compress".

The output of "zlib compress" appears not to be in "UNIX compress" format.  The command-line tools compress/decompress and the Tcl commands "zlib compress"/"zlib decompress" cannot interoperate.  It appears therefore that the Tcl implementation is incorrect.  I have removed the "compress" option from:
(1) the default value of "Accept-Encoding"
(2) the options implemented in http::ContentEncoding
(3) http(n)
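
One way to see the mismatch is to look at the leading bytes of the output. The sketch below assumes Tcl 8.6 and relies on two facts outside this ticket: files written by UNIX compress begin with the magic bytes 1f 9d, and a zlib (RFC 1950) stream using a 32K window begins with the byte 78. The variable names are illustrative:

package require Tcl 8.6

# Output of "zlib compress" for some sample data.
set z [zlib compress "example data for the compress check"]

# First two bytes as hex digits, and the first byte on its own.
binary scan $z H4 magic
binary scan $z H2 first

puts "leading bytes: $magic"

# A UNIX compress (LZW) stream would start with 1f9d.
puts "looks like UNIX compress: [expr {$magic eq "1f9d"}]"
# A zlib-wrapped stream with a 32K window starts with 78.
puts "looks like zlib format:   [expr {$first eq "78"}]"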

Tests have been modified accordingly.

In view of the unpopularity of "compress" on Web servers, it does not seem worthwhile to attempt to fix the format in http.