Tcl Library Source Code

View Ticket
Ticket UUID: ced089d5fec86a1b4722ffbd93810820ccc06845
Title: Multiplexer test continues to fail on FreeBSD
Type: Bug
Version: 1.17
Submitter: mi
Created on: 2015-05-27 20:58:46
Subsystem: multiplexer
Assigned To: aku
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2015-06-08 20:31:27
Resolution: Fixed
Closed By: aku
Closed on: 2015-06-08 20:31:27
Description: (text/html)
<p align="justify">The problem was originally reported as a <A href="https://sourceforge.net/p/tcllib/bugs/1212/">SourceForge Bug #1212</A>, and closed as unreproducible on Ubuntu. Well, it remains a problem on FreeBSD today, five years after the original report.

<p>Please, do the needful.
User Comments: aku added on 2015-06-08 20:31:27:
Merged to trunk [debee3c876].
Pushed.

aku added on 2015-06-04 06:42:07:
Patch applied, into branch "tkt-ced089d5fe-multiplexer".
Commit [9bfb503d18].
Pushed.

Mikhail, can you confirm that this fixes the testsuite in your FreeBSD environment?

anonymous (claiming to be aspect) added on 2015-06-04 00:54:38: (text/html)
multiplexer-5.2 ensures that an access filter can deny (immediately close) inbound connections correctly, by checking that the first write from the client fails.

Adding a second write after 200ms seems to do the right thing:

<pre>
$ uname -r
10.1-RELEASE-p6
$ ~/bin/tclkit multiplexer.test
- tcllib::testutils 1.2
* logger 0.9.4
* multiplexer 0.2
multiplexer.test:       Total   9       Passed  9       Skipped 0       Failed  0
</pre>

Thanks for the suggestion - I wasn't thinking clearly about Nagle and thought a delay before the first write should be sufficient.

Patch inline below:

<pre>
Index: modules/multiplexer/multiplexer.test
==================================================================
--- modules/multiplexer/multiplexer.test
+++ modules/multiplexer/multiplexer.test
@@ -193,22 +193,26 @@
     set ::forever {}
     set mp [multiplexer::create]
     ${mp}::Init 37465
     ${mp}::AddAccessFilter DenyAccessFilter
     set sk1 [socket localhost 37465]
-    set sk2 [socket localhost 37465]
-    update
-    fconfigure $sk1 -buffering none
-    if { [catch {
-	puts $sk1 "boom"
-    } err] } {
-	set result "socket blocked"
-    } else {
-	set result "socket not blocked"
+    after idle {
+	update
+	fconfigure $sk1 -buffering none
+	if { [catch {
+	    puts $sk1 "boom"
+	    after 200	;# delay to overcome nagle - see ticket [ced089d5fe]
+	    puts $sk1 "tish"
+	} err] } {
+	    set ::forever "socket blocked"
+	} else {
+	    set ::forever "socket not blocked"
+	}
     }
+    vwait ::forever
     ${mp}::destroy
-    set result
+    set forever
 } {socket blocked}
 
 
 testsuiteCleanup
 return
</pre>

aku added on 2015-06-03 18:01:19:
  > multiplexer-5.2 such that it accurately tests what it claims to.

As a non-maintainer/non-author, what does multiplexer-5.2 claim to test?

For the record, having now read both the example and the reference, I agree with aspect that the test is apparently sensitive to OS differences in the TCP stack. I further agree with mi that two puts more than 200 ms apart might be enough to overcome Nagle. aspect, could you test this for us?

mi added on 2015-06-03 13:38:49: (text/html)
<blockquote>I don't know how best to alter multiplexer-5.2 such that it accurately tests what it claims to.</blockquote>

<p align="justify">How about writing a longer text and/or making two writes separated by an interval longer than 0.2 seconds? Even if the first write succeeds because of Nagle's algorithm, the second one ought to fail...

<p align="justify">Of course, it would have been best if Tcl allowed manipulating the socket's parameters (such as setting <tt>TCP_NODELAY</tt>).

anonymous (claiming to be aspect) added on 2015-06-03 08:53:29:
I've investigated this a little, and come to the conclusion that the test failure is benign:  as [http://paste.tclers.tk/3523] illustrates and [http://www.unixguide.net/network/socketfaq/2.11.shtml] explains, the assumption that puts will fail on a blocking, unbuffered socket whose remote has closed is not valid.  It seems to be mostly true on Linux, and often false on FreeBSD.

I don't know how best to alter multiplexer-5.2 such that it accurately tests what it claims to.
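
A minimal sketch of the behaviour described above, assuming a loopback listener that closes every connection as soon as it is accepted. The proc name denyAll, the variable names, and the 100 ms and 200 ms delays are illustrative assumptions, not taken from multiplexer.test. Whether the first puts fails depends on the OS; a second puts issued after a pause is expected to fail once the peer's reset has arrived.

```tcl
# Hypothetical stand-in for an access filter that denies everyone:
# the server closes each inbound connection immediately.
proc denyAll {chan addr port} {
    close $chan
}

# Listen on an OS-assigned loopback port and look the port up.
set server [socket -server denyAll -myaddr 127.0.0.1 0]
set port   [lindex [fconfigure $server -sockname] 2]

set client [socket 127.0.0.1 $port]
fconfigure $client -buffering none

# Run the event loop briefly so the accept callback fires and closes
# the server side of the connection.
after 100 {set ::accepted 1}
vwait ::accepted

# First write: may or may not fail, depending on the TCP stack.
set firstFailed [catch {puts $client "boom"} firstErr]

# Pause, then write again: by now the reset should have been seen.
after 200
set secondFailed [catch {puts $client "tish"} secondErr]

puts "first write failed:  $firstFailed"
puts "second write failed: $secondFailed"

catch {close $client}
close $server
```

Comparing the two flags on Linux and on FreeBSD should show which assumption the test can safely rely on.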

mi added on 2015-05-27 21:22:08:
"Needful" is whatever is needed to fix the problem. The current stance, that "it is not a problem because it works on Ubuntu", seems unsustainable.

aku added on 2015-05-27 21:18:04:
Forgot to ask: what do you consider to be "the needful"?
This is a term made to mean 3 different things to any 2 people.

aku added on 2015-05-27 21:16:27:
We do not seem to have a patch for this.
Could this be a core issue with (intra-process) sockets on FreeBSD, or with the Tcl core event loop?
It looks like the testsuite needs more instrumentation to see what is going on under Linux, so that we can then compare with FreeBSD.

aku added on 2015-05-27 21:10:11: (text/x-fossil-wiki)
The local ticket in question is [3053446fffffffffffff].