View Ticket
2024-05-28
00:30 Ticket [88c0c84969] EOF handling potentially broken with OpenSSL 1.1.1e or newer status still Open with 4 other changes artifact: 28d32ca3e2 user: bohagan
2024-01-01
12:32 Ticket [88c0c84969]: 3 changes artifact: 9bf50203b7 user: Avalanchee
2023-11-14
09:58 Ticket [88c0c84969]: 3 changes artifact: 2f9a729234 user: gustafn3
2023-11-10
21:15 Ticket [88c0c84969]: 5 changes artifact: 515aa59d81 user: azazel
2023-10-22
11:55 New ticket [88c0c84969]. artifact: 639511f397 user: gustafn3

Ticket Hash: 88c0c8496999c48f513eb4f97aaa0ac9829b35d3
Title: EOF handling potentially broken with OpenSSL 1.1.1e or newer
Status: Open Type: Code Defect
Severity: Critical Priority: Immediate
Subsystem: Resolution: Open
Last Modified: 2024-05-28 00:30:50
Version Found In: 1.8
User Comments:
gustafn3 added on 2023-10-22 11:55:59:

The EOF handling has changed in OpenSSL 1.1.1e, where it changed from SSL_ERROR_SYSCALL with errno 0 to SSL_ERROR_SSL with reason code SSL_R_UNEXPECTED_EOF_WHILE_READING [1]. This change in OpenSSL requires also adjustments in applications using OpenSSL (see, e.g., [2]), including tcltls.

We noticed the problem when upgrading a machine from CentOS 7 to Rocky Linux 9, where after the upgrade a script like the following stopped working:

$ /usr/local/ns/bin/tclsh8.6
% package require tls                 
% set f [tls::socket localhost 8443]
% puts $f "GET / HTTP/1.0\n"
% flush $f
% set content [read $f]
% close $f

The problem manifests itself in the "read" operation, where first, it transfers the full content, and then it reports "software caused connection abort". See below the output from the debug macros of tcltls.

./tlsIO.c:385:TlsInputProc():BIO_read(4096)
...
./tlsIO.c:422:TlsInputProc():BIO_read -> 465
./tlsIO.c:425:TlsInputProc():BIO_read returned err 0
...
./tlsIO.c:422:TlsInputProc():BIO_read -> 207
./tlsIO.c:425:TlsInputProc():BIO_read returned err 0
...
./tlsBIO.c:262:BioCtrl():Got BIO_CTRL_EOF
./tlsBIO.c:127:BioWrite():[chan=0x1438a7990] BioWrite(24) -> 24 [tclEof=1; tclErrno=0]
./tlsBIO.c:148:BioWrite():Successfully wrote some data
...
./tls.c:180:InfoCallback():Called
./tlsIO.c:422:TlsInputProc():BIO_read -> 0
./tlsIO.c:425:TlsInputProc():BIO_read returned err 1
./tlsIO.c:460:TlsInputProc():SSL negotiation error, indicating that the connection has been aborted
./tls.c:367:Tls_Error():Called
./tlsIO.c:502:TlsInputProc():Input(4096) -> -1 [53]
./tlsIO.c:719:TlsWatchProc():TlsWatchProc(0x0)
./tlsIO.c:728:TlsWatchProc():statePtr->flags=0
./tlsIO.c:992:Tls_GetParent():Requested to get parent of channel 0x1438a0790
./tlsIO.c:754:TlsWatchProc():Registering our interest in the lower channel (chan=0x1438a7990)
error reading "sock144076990": software caused connection abort

The problem exists not only on Linux, but as well on macOS (13.6) Below is a patch that fixes the problem without going into the (version dependent) error code / error reason handling of OpenSSL, since this approach makes the issue more transparent. This patch below was tested with Tcl 8.6.13, tcltls-1.7.22 and OpenSSL 3.1.3 (19 Sep 2023).

$ diff -wu tlsIO.c-orig tlsIO.c
--- tlsIO.c-orig	2020-10-12 22:39:22
+++ tlsIO.c	2023-10-22 12:33:11
@@ -420,6 +420,18 @@
 	ERR_clear_error();
 	bytesRead = BIO_read(statePtr->bio, buf, bufSize);
 	dprintf("BIO_read -> %d", bytesRead);
+
+	if (bytesRead == 0 && Tcl_Eof(statePtr->self)) {
+            /* 
+             * We know through BIO_CTRL_EOF that we are already at
+             * EOF (determined during BIO_read()). There is no need to
+             * try to handle this situation via error and reason codes
+             * from OpenSSL.
+             */
+             dprintf("tried to read while channel is already at EOF");
+             *errorCodePtr = 0;
+             return(bytesRead);

[1] https://mta.openssl.org/pipermail/openssl-project/2020-May/001975.html
[2] https://groups.google.com/g/mailing.openssl.users/c/9C2rT9WVqW8/m/1F-8JWnzAQAJ


azazel added on 2023-11-10 21:15:10:

There is an SSL option governing this behaviour: SSL_OP_IGNORE_UNEXPECTED_EOF. There are reasons why OpenSSL made this change. See:

  • https://github.com/openssl/openssl/issues/10880
  • https://github.com/openssl/openssl/pull/10882

My initial thought was unconditionally to set the option:

--- tls.c
+++ tls.c
@@ -1212,10 +1212,13 @@
 #endif

     SSL_CTX_set_app_data( ctx, (VOID*)interp); /* remember the interpreter */
     SSL_CTX_set_options( ctx, SSL_OP_ALL);     /* all SSL bug workarounds */
     SSL_CTX_set_options( ctx, off);    /* all SSL bug workarounds */
+#ifdef SSL_OP_IGNORE_UNEXPECTED_EOF
+    SSL_CTX_set_options( ctx, SSL_OP_IGNORE_UNEXPECTED_EOF);
+#endif
     SSL_CTX_sess_set_cache_size( ctx, 128);

     if (ciphers != NULL)
        SSL_CTX_set_cipher_list(ctx, ciphers);

However, given that this change in behaviour was intended to reveal errors that were previously unreported, I think it may make sense to add an option (this is what openssl did for s_client): e.g., -ignoreunexpectedeof bool. The next question is whether to default it to true or false. To preserve backwards compatibility and be liberal in what we accept, my inclination would be to default to true.


gustafn3 added on 2023-11-14 09:58:13:

I am not arguing about the changed behavior in OpenSSL about of the unexpected EOF handling (which hit many projects). No matter how the default should be, the old EOF handling of tcltls (relying on error states) is not ok and lead to a breaking behavior for us when upgrading OS versions. My suggested fix makes EOF handling clear, more robust, and avoids the error situation.

The decision to set SSL_OP_IGNORE_UNEXPECTED_EOF or not (and maybe making it configurable) is independent of this.


Avalanchee added on 2024-01-01 12:32:52:

Is this project dead? I mean I'm very thankful for the active participants here, so I was able to compile the patch into my deployment. However this is a breaking bug that should be handled with utmost urgency.

From looking at the timeline it doesn't seem like anyone is maintaining this project, am I wrong?

I would like to kindly ask to promote this fix into release, and also the "feature-dump-keys" branch if possible. Thanks.


bohagan added on 2024-05-28 00:30:50:
This should be fixed by the changes in [5a33efb87bd3750f]. I tested with a few
files that were reported to have the issue, and they worked. Please test with
the trunk version and let me know if you you have any URLs that still fail due
to the unexpected EOF. Thanks.