File tip/287.tip — part of check-in [15974cf55f] at 2006-12-01 16:16:04 on branch trunk — Implementation committed for 8.5b1 (user: dgp size: 5582)
TIP:            287
Title:          Add a Command for Determining Size of Buffered Data
Version:        $Revision: 1.12 $
Author:         Michael A. Cleverly <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        26-Oct-2006
Post-History:   
Keywords:       Tcl,channel,chan,pendinginput,pendingoutput
Tcl-Version:    8.5

~ Abstract

Many network servers programmed in Tcl (including the venerable tclhttpd) are
vulnerable to DoS (denial of service) attacks because they lack any way to
introspect the amount of buffered data on a non-blocking socket that is in
line buffering mode. This TIP proposes a new subcommand to '''chan''' to
allow the amount of buffered input and buffered output (for symmetry) to be
inspected from Tcl.

~ Rationale

Many network protocols are inherently line-oriented (HTTP, SMTP, etc.) and the
natural approach to implementing servers for these protocols in Tcl is to
configure the incoming client sockets to use non-blocking I/O and to have
''line'' buffering and then define a readable fileevent callback.

|    proc accept {sock addr port} {
|        fconfigure $sock -buffering line -blocking 0
|        fileevent $sock readable [list callback $sock ...]
|    }
|    socket -server accept $listenPort

Recall that a readable fileevent will be called even when there is an
incomplete line buffered. As the '''fileevent''' manual page states:

 > A channel is considered to be readable if there is unread data available on
   the underlying device. A channel is also considered to be readable if there
   is unread data in an input buffer, except in the special case where the
   most recent attempt to read from the channel was a gets call that could not
   find a complete line in the input buffer.

The '''fblocked''' (and in 8.5 '''chan blocked''') command provides the Tcl
programmer a means to test whether:

 > the most recent input operation ...  returned less information than
   requested because all available input was exhausted.

There is currently no way at the Tcl level to see how much data is buffered
and could be read safely (via '''read''' instead of '''gets''').

There is also no way to specify any kind of upper limit on the length of a
line; when in line-buffering mode all input is buffered until an end-of-line
sequence is encountered or the EOF on the channel is reached.

The practical result is that all network daemons written in Tcl using
line-oriented I/O ('''gets''') can be fed repeated input lacking an
end-of-line sequence until all physical memory is exhausted.
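
The effect can be reproduced in a few lines. The following is a minimal
sketch, not taken from any real server: it assumes Tcl 8.6 or later for
'''chan pipe''' and uses a pipe within one process to stand in for a hostile
client. The variable names are illustrative only.

|    lassign [chan pipe] rd wr
|    chan configure $rd -blocking 0
|    chan configure $wr -buffering none
|
|    set seen {}
|    for {set i 0} {$i < 8} {incr i} {
|        # The "client" sends 1024 bytes and never an end-of-line.
|        puts -nonewline $wr [string repeat A 1024]
|        # This is what a readable callback built on gets would do.
|        set n [gets $rd line]
|        lappend seen [list $n [chan blocked $rd]]
|    }
|    puts $seen
|    close $wr
|    close $rd

Each iteration reports that '''gets''' returned -1 and that the channel is
blocked, while the unterminated input keeps accumulating in the channel's
buffer, out of sight of the script.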

This vulnerability has been recognized since at least 2001. See, for example,
the discussion between George Peter Staplin and Donald Porter on the ''gets''
page on the Tcl'ers Wiki [http://wiki.tcl.tk/gets].

~ Proposed Change

At the C level Tcl already has a function, ''Tcl_InputBuffered'', which returns
the number of unread bytes buffered for a channel, and a corresponding
''Tcl_OutputBuffered'', which returns the number of bytes buffered for output
that have not yet been flushed out.

This TIP proposes to implement a new ''chan pending'' command which
will take two arguments: a ''mode'' and a ''channelId''. The mode argument
can be either ''input'' or ''output''.

When the mode is ''input'', the command returns the value of
''Tcl_InputBuffered()'' if the channel is open for input, or -1 otherwise.

When the mode is ''output'', the command returns the value of
''Tcl_OutputBuffered()'' if the channel is open for output, or -1 otherwise.
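
As an illustration of these return values, here is a small sketch. It assumes
the command behaves as described above and, purely for convenience, uses
'''chan pipe''' (Tcl 8.6 or later) to obtain one read-only and one write-only
channel. The variable names are illustrative.

|    lassign [chan pipe] rd wr
|    chan configure $rd -blocking 0
|    chan configure $wr -buffering full
|
|    # Asking about the wrong direction yields -1.
|    set wrongIn  [chan pending input $wr]
|    set wrongOut [chan pending output $rd]
|
|    # Data written but not yet flushed counts as pending output.
|    puts -nonewline $wr "no newline yet"
|    set beforeFlush [chan pending output $wr]
|    flush $wr
|    set afterFlush [chan pending output $wr]
|
|    # A gets that finds no complete line leaves the bytes it read
|    # in the input buffer, where they count as pending input.
|    gets $rd line
|    set unread [chan pending input $rd]
|
|    puts [list $wrongIn $wrongOut $beforeFlush $afterFlush $unread]
|    close $wr
|    close $rd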

This allows programmers developing network daemons at the Tcl level to
implement their own policy decisions based on the size of the unread line.
Potential DoS situations can be avoided, in an application-specific manner,
long before all memory is exhausted.

| if {[chan blocked $sock] && [chan pending input $sock] > $limit} {
|     # Take application-specific steps (e.g., [close $sock], or
|     # [read $sock] to process a partial line and drain the buffer)
| }
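
A slightly fuller sketch of such a policy might look like the following. The
procedure names ''readLine'', ''callback'' and ''handleLine'', the result
codes, and the choice to close the socket on overflow are all hypothetical;
they show one possible policy, not a recommended or official one.

|    # Hypothetical helper: try to read one line from $chan. Returns one of
|    #   line     - a complete line was stored in the caller's variable
|    #   eof      - the channel reached end of file
|    #   overflow - an incomplete line of more than $limit bytes is buffered
|    #   partial  - an incomplete line within the limit is buffered
|    proc readLine {chan limit lineVar} {
|        upvar 1 $lineVar line
|        if {[gets $chan line] >= 0} {
|            return line
|        }
|        if {[eof $chan]} {
|            return eof
|        }
|        if {[chan blocked $chan] && [chan pending input $chan] > $limit} {
|            return overflow
|        }
|        return partial
|    }
|
|    # Hypothetical readable callback built on the helper above.
|    proc callback {sock limit} {
|        switch -- [readLine $sock $limit line] {
|            line {
|                # handleLine stands for the application's own processing.
|                handleLine $sock $line
|            }
|            eof - overflow {
|                # One possible policy: drop the connection.
|                close $sock
|            }
|            partial {
|                # Nothing to do; wait for the next readable event.
|            }
|        }
|    }

It would be registered in the usual way, for example with
'''fileevent $sock readable [[list callback $sock 8192]]''', where 8192 is an
arbitrary limit chosen for illustration.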

~ Rejected Alternatives

 * Adding a flag to '''fblocked''' to return the number of unread bytes
   instead of just 0 or 1 (since '''fblocked''' is now considered deprecated
   as per [208]).

 * Polluting the global namespace with a new '''favailable''', '''fpending'''
   or '''fqueued''' command.

 * A '''chan unread''' because of potential confusion as to whether it
   performed ''ungetch()'' type functionality (''un-reed'' vs ''un-red'').

 * Any sort of '''-maxchars''' or '''-maxbytes''' flag to '''gets''', in order
   to not complicate the semantics of '''gets'''. Additionally, without
   complicating '''gets''' semantics even further, one could not distinguish
   input of exactly $limit characters from the case where only $limit
   characters were returned (with some input remaining unread).

 * The initial version of this TIP called for a ''chan available'' command.
   This was changed to ''pendinginput'' (and ''pendingoutput'' added for
   symmetry's sake) following suggestions on news:comp.lang.tcl from Donald
   Arseneau and Donal Fellows, and later to ''chan pending'' that takes a
   ''mode'' argument (''input'' or ''output'') based on suggestions from
   Donald Porter and Joe English.

~ Reference Implementation

[[RFE 1586860]] at SourceForge now contains a patch implementing '''chan
pendinginput''' and '''chan pendingoutput''' (including updated '''chan''' man
page and corresponding test cases)
[http://sourceforge.net/tracker/index.php?func=detail&aid=1586860&group_id=10894&atid=360894].

~ Copyright

This document is in the public domain.