TclTLS: History Of Ticket 94c6a431fee47acd

Artifacts Associated With Ticket 94c6a431fee47acd

Ticket change [9218f65776] (rid 1345) by anonymous on 2018-04-01 14:06:53:

foundin initialized to: "1.7.16"

icomment:

This bug might not be in Tcltls: it might be in Tcl stacked channels, the Tcl
[read] command, fileevent generation or buffering.

The bug occurs when using Tcltls 1.7.16 with the stock Tcl 8.6.8 and
LibreSSL 2.6.4, but has existed for a long time: it also occurs in older
versions at least as far back as Tcltls 1.6.7, Tcl 8.5.13 and OpenSSL 1.02d.

When a channel is opened with tls::socket, a non-blocking [read] does not
always return the available data, sometimes (if -blocksize is 4096) even if it
exceeds the size requested; instead it waits until the remote server closes
the connection.

When using http::geturl with tls::socket, a HTTPS request for an entity that
is larger than the -blocksize hangs if the server response uses a persistent
connection and does not use chunked encoding (but does supply an accurate
Content-Length header).  The client has some successful [reads], but then
does nothing until the server times out, although the full response has been
received and is buffered somewhere.

The relevant part of http is in the command http::Event, for unchunked data
with known size $state(totalsize) obtained from a Content-Length header.  The
relevant part of the code can be abstracted as:

fconfigure $sock -blocking off -translation binary -buffersize $state(-blocksize)
fileevent $sock readable [list http::Event $sock $token]

proc http::Event {sock token} {
    variable $token
    upvar 0 $token state

    # 1. How many bytes to ask for.
    set reqSize $state(-blocksize)

    # 2. Read.
    set block [read $sock $reqSize]
    set n [string length $block]
    if {$n >= 0} {
        append state(body) $block
        incr state(currentsize) $n
    }

    # 3. Check for end of data.
    if {$state(currentsize) >= $state(totalsize)} {
        Eof $token
    }
    return
}

The bug can be worked around by precisely specifying the size of the
untransferred response body.

    # 1. How many bytes to ask for.
    set reqSize [expr {$state(totalsize) - $state(currentsize)}]

This workaround fails if reqSize is capped at $state(-blocksize):

    # 1. How many bytes to ask for.
    set reqSize [expr {min($reqSize, $state(-blocksize))}]

When the hanging occurs, more than $state(-blocksize) bytes can be available,
(if -blocksize is 4096) but they are not returned by [read] until timeout.

The properties read by [fconfigure $sock] (when added to http::Event) are:
  -blocking 0
  -buffering full
  -buffersize 8192
  -encoding binary
  -eofchar {{} {}}
  -translation {lf crlf}
The last three properties correspond (for reading) to the results of the command
  fconfigure $sock -translation binary

read(n) describes non-blocking mode with two arguments:

      read channelId numChars

  If  channelId is in nonblocking mode, the command may not read as many
  characters as requested: once all available input has been read, the command
  will return the data that is available rather than blocking for more input.
  If the channel is configured to use a multi-byte encoding, then there may
  actually be some bytes remaining in the internal buffers  that do not form
  a  complete character.  These bytes will not be returned until a complete
  character is available or end-of-file is reached.  The -nonewline switch is
  ignored if the command returns before reaching the end of the file.

The hanging occurs even with a binary resource, fetched with -encoding binary,
so the multi-byte issue does not arise.  There is no problem with http, only
with https where [tls::socket] is used in place of [socket]. read(n) makes
clear that in nonblocking mode, the command will not wait to receive the number
of characters requested.

Example code that demonstrates the delay:

# The http package sets both the -buffersize for the socket, and the number of
# bytes requested with [read], to the http::geturl option -blocksize, which has
# default value 8192.

package require http

set start [clock milliseconds]

proc http::Log {args} {
    set time [expr {[clock milliseconds] - $::start}]
    puts stderr [list $time {*}$args]
    flush stderr
    return
}

package require tls
http::register https 443 [list ::tls::socket -ssl2 0 -ssl3 0 -tls1 1 -tls1.1 1 -tls1.2 1]

proc GetFromWeb {blocksize url} {
    set ::prevSize 0
    set token    [::http::geturl $url     \
                   -blocksize $blocksize  \
                   -keepalive 1           \
                   -progress httpProgress \
                   -headers {accept-encoding unsupported} \
    ]
    array get $token
    # N.B. Implicit Return
}

proc httpProgress {token total current} {
    upvar 0 $token state
    set msg {, and therefore will not demonstrate the bug.}

    if {    [info exists state(transfer)]
         && ($state(transfer) eq "chunked")
    } {
	puts "The response uses chunked transfer encoding, $msg"
    } elseif {$total == 0} {
	puts "The server has not reported the size of the entity, $msg"
    }
    if {$state(connection) eq {close}} {
	puts "The server will close the connection, $msg"
    }
    set time [expr {[clock milliseconds] - $::start}]
    set extra [expr {$current - $::prevSize}]
    set ::prevSize $current
    puts "$time Fetched $extra bytes - accumulated $current bytes of total $total"
    flush stdout
}


set blocksize 8192

# This URL serves (at least in the UK) a resource of about 300kB, unchunked, and
# times out after approx 13s.

set url https://www.bbc.co.uk/

# These URLs demonstrate that http sites (using [socket]) do not
# trigger the bug.

# set url http://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_slideshow/public/images/2018/MIT-Debris-Disks_0.jpg?itok=kGim01Xz
# set url http://static01.nyt.com/images/2018/03/30/theater/30threetallwomen2/merlin_135911256_e2911f33-8b35-445d-90ee-d15457c19fae-superJumbo.jpg

# The corresponding https URLs do trigger the bug, but are inconvenient to use
# because the server does not time out for about 10 minutes.

# set url https://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_slideshow/public/images/2018/MIT-Debris-Disks_0.jpg?itok=kGim01Xz
# set url https://static01.nyt.com/images/2018/03/30/theater/30threetallwomen2/merlin_135911256_e2911f33-8b35-445d-90ee-d15457c19fae-superJumbo.jpg

set start [clock milliseconds]
set img [GetFromWeb $blocksize $url]
set NUL {}

login: "anonymous"
mimetype: "text/plain"
private_contact initialized to: "a774b1f5653dc3360e29ea7d635a64e94c9f1dab"
severity initialized to: "Important"
status initialized to: "Open"
title initialized to: "Buffering until timeout"
type initialized to: "Incident"

Ticket change [36ac572937] (rid 1346) by anonymous on 2018-04-01 14:17:03:
1. icomment:
```
The workaround has been applied to Tcl's http library in branch bug-46b6edad51-concurrent-http
```
2. login: "anonymous"
3. mimetype: "text/x-fossil-plain"
4. priority changed to: "Immediate"
5. resolution changed to: "Open"

Ticket change [7b64ec9322] (rid 1347) by anonymous on 2018-04-01 19:53:58:

icomment:

This bug shows many similarities with the bug reported here:

https://sourceforge.net/p/tls/bugs/38/

and (cross-filed) here:

http://core.tcl.tk/tcl/tktview/1945538fffffffffffff

Regards,
Erik Leunissen
--

login: "anonymous"
mimetype: "text/x-fossil-plain"

Ticket change [861701fd96] (rid 1349) by anonymous on 2018-04-02 02:21:24:
1. icomment:
```
Thanks, Erik.  Yes, it does seem to be the same bug.
```
2. login: "anonymous"
3. mimetype: "text/plain"

Ticket change [5eac592725] (rid 1358) by anonymous on 2018-10-22 21:01:25:

icomment:

Hi,

We have the same problem and I was able to pinpoint it after learning tcl channel frame.

1. In "tlsWatchProc" of tlsIO.c, the condition to create timer is not covering all cases.  It should also check the BIO buffer.
The line should read:
if ((mask & TCL_READABLE) &&
    (Tcl_InputBuffered(statePtr->self) > 0 || BIO_ctrl_pending(statePtr->bio) > 0)) {
...............
}

2.  In "BioCtrl" of tlsBIO.c, the result of BIO_CTRL_PENDING should be:
ret = ((chan) ? Tcl_InputBuffered(chan) : 0);
NOT ret = ((chan) ? 1: 0);

Second error caused BIO_ctrl_pending always return 1 even no data available.

Please let me know after you push our the patch.

Best,

Jinhu
[email protected]

login: "anonymous"
mimetype: "text/plain"
status changed to: "Verified"
type changed to: "Code Defect"
username: "[email protected]"

Ticket change [4d8c983355] (rid 1392) by rkeene on 2019-04-09 18:02:25:
1. icomment: "Fixed in TclTLS v1.7.17+"
2. login: "rkeene"
3. mimetype: "text/x-fossil-plain"
4. priority changed to: "High"
5. resolution changed to: "Fixed"
6. status changed to: "Closed"