View Ticket
Ticket Hash: 6dd5588df68a5afdb421d44cb73b94326efffd3a
Title: tcltls spins forever calling sendto() on flush of a closed connecton
Status: Open Type: Code Defect
Severity: Severe Priority: Immediate
Subsystem: Resolution: Open
Last Modified: 2017-09-21 13:50:51
Version Found In: 1.7.12
User Comments:
anonymous added on 2017-09-20 12:47:49:
FreeBSD 11, tcl 8.6.7, tcltls 1.7.12, openssl 1.0.2k.

We have a process that handles a TLS socket and on EOF or error, it exits.

On exit, tcl tries to flush the socket. Sometimes, this will end up hanging forever consuming 100% CPU. truss says:

sendto(10,"\^W\^C\^A\0\240\^V\^E\M-Y\^[\^N"...,165,0,NULL,0) ERR#32 'Broken pipe'
sendto(10,"\^W\^C\^A\0\240\^V\^E\M-Y\^[\^N"...,165,0,NULL,0) ERR#32 'Broken pipe'
sendto(10,"\^W\^C\^A\0\240\^V\^E\M-Y\^[\^N"...,165,0,NULL,0) ERR#32 'Broken pipe'
sendto(10,"\^W\^C\^A\0\240\^V\^E\M-Y\^[\^N"...,165,0,NULL,0) ERR#32 'Broken pipe'

forever; FD 10 is the underlying TCP socket.

gdb says it is hanging here:

(gdb) bt
#0  0x00000008011d4dfa in _sendto () from /lib/libc.so.7
#1  0x0000000800e76a6f in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x0000000800bf2f71 in TcpOutputProc (instanceData=0x8024fd810, buf=0x802452483 "\027\003\001", toWrite=165, errorCodePtr=0x7fffffffdb7c) at tclUnixSock.c:560
#3  0x0000000800b63b63 in ChanWrite (chanPtr=0x80c094890, src=0x802452483 "\027\003\001", srcLen=165, errnoPtr=0x7fffffffdb7c) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:517
#4  0x0000000800b63af7 in Tcl_WriteRaw (chan=0x80c094890, src=0x802452483 "\027\003\001", srcLen=165) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:4039
#5  0x0000000802cb56ff in BIO_new_tcl () from /usr/local/lib/tls1.7.12/tcltls.so
#6  0x00000008033b56f8 in BIO_write () from /lib/libcrypto.so.8
#7  0x0000000802ef6c61 in ssl3_write_pending () from /usr/lib/libssl.so.8
#8  0x0000000802ef6652 in ssl3_write_bytes () from /usr/lib/libssl.so.8
#9  0x0000000802edbcf0 in BIO_ssl_shutdown () from /usr/lib/libssl.so.8
#10 0x00000008033b56f8 in BIO_write () from /lib/libcrypto.so.8
#11 0x0000000802cb5e1b in Tls_ChannelType () from /usr/local/lib/tls1.7.12/tcltls.so
#12 0x0000000800b63b63 in ChanWrite (chanPtr=0x80c097f90, 
    src=0x80c38f9f8 "alt\t3943\tclock\t1505832747\tewvel\t153.8\tfpm\t157\thexid\tA0EC3D\tlat\t39.9306\tlon\t-86.1884\tnsvel\t198.0\torigin\t1172\ttype\tmlat_result\n", srcLen=125, 
    errnoPtr=0x7fffffffde38) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:517
#13 0x0000000800b62c67 in FlushChannel (interp=0x0, chanPtr=0x80c097f90, calledFromAsyncFlush=0) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:2747
#14 0x0000000800b5f7b4 in Tcl_Close (interp=0x0, chan=0x80c097f90) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:3464
#15 0x0000000800b5e646 in TclFinalizeIOSubsystem () at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:657
#16 0x0000000800b2dd7b in FinalizeThread (quick=1) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclEvent.c:1305
#17 0x0000000800b2dae2 in Tcl_Exit (status=5) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclEvent.c:987
#18 0x0000000800a62b9d in Tcl_ExitObjCmd (dummy=0x0, interp=0x801c37950, objc=2, objv=0x801c4a868) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclCmdAH.c:1114
#19 0x0000000800a536aa in Dispatch (data=0x80c319758, interp=0x801c37950, result=0) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclBasic.c:4358
#20 0x0000000800a4e7f6 in TclNRRunCallbacks (interp=0x801c37950, result=0, rootPtr=0x80c01a540) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclBasic.c:4391
#21 0x0000000800a50db0 in TclEvalObjEx (interp=0x801c37950, objPtr=0x80c08a0b0, flags=131072, invoker=0x0, word=0) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclBasic.c:5957
#22 0x0000000800a4df1a in Tcl_EvalObjEx (interp=0x801c37950, objPtr=0x80c08a0b0, flags=131072) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclBasic.c:5938
#23 0x0000000800b68f40 in TclChannelEventScriptInvoker (clientData=0x80c219cd0, mask=2) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:8866
#24 0x0000000800b68af9 in Tcl_NotifyChannel (channel=0x80c097f90, mask=2) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclIO.c:8360
#25 0x0000000800bf2e47 in WrapNotify (clientData=0x8024fd810, mask=2) at tclUnixSock.c:982
#26 0x0000000800bf7730 in FileHandlerEventProc (evPtr=0x80c2e4a50, flags=-3) at tclUnixNotfy.c:809
#27 0x0000000800b92d6c in Tcl_ServiceEvent (flags=-3) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclNotify.c:670
#28 0x0000000800b9326f in Tcl_DoOneEvent (flags=-3) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclNotify.c:967
#29 0x0000000800b2df6b in Tcl_VwaitObjCmd (clientData=0x0, interp=0x801c37950, objc=2, objv=0x801c4a6b8) at /usr/ports/lang/tcl86/work/tcl8.6.7/generic/tclEvent.c:1413

Some digging with gdb seems to indicate that it is spinning forever in FlushChannel(). TlsOutputProc ("Tls_ChannelType" in the stacktrace above,
because static function) calls BIO_write() which eventually generates the sendto(). BIO_write() returns with SSL_ERROR_ZERO_RETURN. TlsOutputProc()
returns 0. FlushChannel() thinks there is more work to do and loops.

rkeene added on 2017-09-21 13:50:51: (text/x-fossil-wiki)
Thanks for the detailed bug report.  It should be fixed soon.  In the meantime using the configure option "--enable-ssl-fastpath" should also workaround the issue  since it removes the intermediate code and ties the file descriptor of the socket directly to the SSL channel.