TclTLS: Ticket Change Details

Overview

Artifact ID:	639511f397f343ef62f1e6ef897748f31bb73d48a854f515b3a960fd27801d38
Ticket:	88c0c8496999c48f513eb4f97aaa0ac9829b35d3 EOF handling potentially broken with OpenSSL 1.1.1e or newer
User & Date:	gustafn3 on 2023-10-22 11:55:59

Changes

foundin changed to: "tcltls-1.7.22"

icomment:

OpenSSL 1.1.1e changed how an unexpected EOF is reported: instead of SSL_ERROR_SYSCALL with errno 0, it now returns SSL_ERROR_SSL with the reason code SSL_R_UNEXPECTED_EOF_WHILE_READING [1]. Applications using OpenSSL, including tcltls, also need to be adjusted for this change (see, e.g., [2]).
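
The two reporting styles can be sketched as a small classification function. This is a self-contained illustration, not tcltls code: the constants are local stand-ins that are assumed to mirror the values in `<openssl/ssl.h>` and `<openssl/sslerr.h>`, and `classify_read` and `read_outcome` are hypothetical names. Real code would take its inputs from `SSL_get_error()`, `errno`, and `ERR_GET_REASON(ERR_peek_error())`.

```c
/* Local stand-ins for OpenSSL constants (assumed values; real code should
 * include <openssl/ssl.h> and <openssl/sslerr.h> instead of redefining them). */
enum {
    X_SSL_ERROR_NONE        = 0,
    X_SSL_ERROR_SSL         = 1,
    X_SSL_ERROR_SYSCALL     = 5,
    X_SSL_ERROR_ZERO_RETURN = 6,
    X_SSL_R_UNEXPECTED_EOF_WHILE_READING = 294
};

/* Hypothetical result type for this sketch. */
typedef enum {
    READ_DATA,            /* payload bytes were returned */
    READ_CLEAN_EOF,       /* peer sent close_notify */
    READ_UNEXPECTED_EOF,  /* transport closed without close_notify */
    READ_ERROR            /* any other failure */
} read_outcome;

/*
 * Classify the result of a read.
 *   ret       - return value of the read call
 *   ssl_err   - what SSL_get_error() would report
 *   sys_errno - errno at the time of the call
 *   reason    - reason code of the top error queue entry (0 if none)
 */
read_outcome classify_read(int ret, int ssl_err, int sys_errno, int reason)
{
    if (ret > 0) {
        return READ_DATA;
    }
    if (ssl_err == X_SSL_ERROR_ZERO_RETURN) {
        return READ_CLEAN_EOF;
    }
    /* Old style: SSL_ERROR_SYSCALL with errno 0. */
    if (ssl_err == X_SSL_ERROR_SYSCALL && sys_errno == 0) {
        return READ_UNEXPECTED_EOF;
    }
    /* New style: SSL_ERROR_SSL with a dedicated reason code. */
    if (ssl_err == X_SSL_ERROR_SSL
            && reason == X_SSL_R_UNEXPECTED_EOF_WHILE_READING) {
        return READ_UNEXPECTED_EOF;
    }
    return READ_ERROR;
}
```

Code that only checks for the old style falls through to `READ_ERROR` when the new style is reported, which matches the symptom described in this ticket.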

We noticed the problem when upgrading a machine from CentOS 7 to Rocky
Linux 9, where after the upgrade a script like the following stopped
working:

````
$ /usr/local/ns/bin/tclsh8.6
% package require tls                 
% set f [tls::socket localhost 8443]
% puts $f "GET / HTTP/1.0\n"
% flush $f
% set content [read $f]
% close $f
````

The problem shows up in the "read" operation: it first transfers the
full content and then reports "software caused connection abort". The
output from the tcltls debug macros is shown below.

````
./tlsIO.c:385:TlsInputProc():BIO_read(4096)
...
./tlsIO.c:422:TlsInputProc():BIO_read -> 465
./tlsIO.c:425:TlsInputProc():BIO_read returned err 0
...
./tlsIO.c:422:TlsInputProc():BIO_read -> 207
./tlsIO.c:425:TlsInputProc():BIO_read returned err 0
...
./tlsBIO.c:262:BioCtrl():Got BIO_CTRL_EOF
./tlsBIO.c:127:BioWrite():[chan=0x1438a7990] BioWrite(24) -> 24 [tclEof=1; tclErrno=0]
./tlsBIO.c:148:BioWrite():Successfully wrote some data
...
./tls.c:180:InfoCallback():Called
./tlsIO.c:422:TlsInputProc():BIO_read -> 0
./tlsIO.c:425:TlsInputProc():BIO_read returned err 1
./tlsIO.c:460:TlsInputProc():SSL negotiation error, indicating that the connection has been aborted
./tls.c:367:Tls_Error():Called
./tlsIO.c:502:TlsInputProc():Input(4096) -> -1 [53]
./tlsIO.c:719:TlsWatchProc():TlsWatchProc(0x0)
./tlsIO.c:728:TlsWatchProc():statePtr->flags=0
./tlsIO.c:992:Tls_GetParent():Requested to get parent of channel 0x1438a0790
./tlsIO.c:754:TlsWatchProc():Registering our interest in the lower channel (chan=0x1438a7990)
error reading "sock144076990": software caused connection abort
````

The problem exists not only on Linux but also on macOS (13.6).
Below is a patch that fixes the problem without relying on OpenSSL's
(version-dependent) error codes and error reasons, which makes the handling more transparent. The patch was tested with Tcl 8.6.13, tcltls-1.7.22 and OpenSSL 3.1.3 (19 Sep 2023).

````
$ diff -wu tlsIO.c-orig tlsIO.c
--- tlsIO.c-orig	2020-10-12 22:39:22
+++ tlsIO.c	2023-10-22 12:33:11
@@ -420,6 +420,18 @@
 	ERR_clear_error();
 	bytesRead = BIO_read(statePtr->bio, buf, bufSize);
 	dprintf("BIO_read -> %d", bytesRead);
+
+	if (bytesRead == 0 && Tcl_Eof(statePtr->self)) {
+            /* 
+             * We know through BIO_CTRL_EOF that we are already at
+             * EOF (determined during BIO_read()). There is no need to
+             * try to handle this situation via error and reason codes
+             * from OpenSSL.
+             */
+             dprintf("tried to read while channel is already at EOF");
+             *errorCodePtr = 0;
+             return(bytesRead);
+	}
````
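
The check added by the patch can also be modelled in isolation. The sketch below is only an illustration under stated assumptions: `at_eof` stands in for the result of `Tcl_Eof(statePtr->self)`, and `read_decision` and `decide_after_bio_read` are hypothetical names that do not exist in tcltls.

```c
#include <stdbool.h>

/* Hypothetical summary of what the input procedure does after BIO_read(). */
typedef struct {
    bool handled;     /* true: return immediately, skip OpenSSL error handling */
    int  result;      /* value returned to the caller when handled */
    int  error_code;  /* value stored in *errorCodePtr when handled */
} read_decision;

/*
 * Model of the added check: a zero-byte read on a channel that is already
 * at EOF is reported as a plain EOF (0 bytes, no error). In every other
 * case the existing error-code handling still runs (handled == false).
 */
read_decision decide_after_bio_read(int bytes_read, bool at_eof)
{
    read_decision d = { false, bytes_read, 0 };

    if (bytes_read == 0 && at_eof) {
        d.handled = true;
        d.result = 0;
        d.error_code = 0;
    }
    return d;
}
```

With the values from the debug log above, the final `BIO_read -> 0` after `Got BIO_CTRL_EOF` corresponds to `decide_after_bio_read(0, true)`, which is handled as EOF instead of reaching the connection-abort path.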


[1] https://mta.openssl.org/pipermail/openssl-project/2020-May/001975.html   
[2] https://groups.google.com/g/mailing.openssl.users/c/9C2rT9WVqW8/m/1F-8JWnzAQAJ

login: "gustafn3"
mimetype: "text/x-markdown"
private_contact changed to: "ef1993a98c1daa778fe0b246a7af12b3076f2240"
severity changed to: "Critical"
status changed to: "Open"

title changed to:

EOF handling potentially broken with OpenSSL 1.1.1e or newer

type changed to: "Code Defect"