Ticket UUID: | 336441ed59c9f49fb2dc5414911f5c90c7acdec3 | |||
Title: | socket -async stall on windows | |||
Type: | Bug | Version: | 8.5.15 | |
Submitter: | oehhar | Created on: | 2014-03-11 10:27:03 | |
Subsystem: | 24. Channel Commands | Assigned To: | oehhar | |
Priority: | 5 Medium | Severity: | Critical | |
Status: | Closed | Last Modified: | 2014-05-30 10:42:35 | |
Resolution: | Fixed | Closed By: | oehhar | |
Closed on: | 2014-05-30 10:42:35 | |||
Description: |
Within branch win-sock-async-connect-race-fix, in checkin [521b7229c4], Andreas Kupries reports socket -async stalls caused by FD_CONNECT events that the operating system does not deliver. See the checkin comment for details. Issues:
Conclusion:
Action:
This ticket was created to discuss the issue. Reinhard Max and Harald Oehlmann are working on the 8.6.x socket code in the branch bug-13d3af3ad5, where the same issue was observed. The conclusion was:
Within the branch bug-13d3af3ad5, those issues are fixed. The workaround of using FD_WRITE instead of FD_CONNECT was removed in bug-13d3af3ad5. The drawback of this is that a possible connect failure is not detected. It should be reintroduced if the information "FD_CONNECT's are not delivered" is correct. This work is in accordance with what is done on the Unix side. | |||
User Comments: |
oehhar added on 2014-05-30 10:42:35:
Here is the message from Andreas about the test result, which I had not even dared to hope for, but it happened:

I have now completed the check and it seems to have done the trick. Using a stackato client wrapped with the basekit build from the specified revision. I ran my "testcase" (iterated 'stackato info' against a https target) and saw no hangs for 13 minutes at about 50 iterations/minute, i.e. circa 650 iterations. When the problem was active I could expect a hang within a minute and two at most. Thank you very much for the work on this. Having this in the Tcl 8.5 core branch will make me happy and willing to switch to it again, away from my "win-sock-async-connect-race-fix" branch (which I can/will then close).

So, merged by commit [6ecb583012], bug closed, thank you all,
Harald

oehhar added on 2014-05-29 14:58:41:
Another test version in bug branch: commit [a658836882]:
Andreas, I would appreciate it if you could test this.
Thank you,
Harald

oehhar added on 2014-04-29 20:00:01:
Andreas has tested the patch on 8.5 and it failed. Here is his message:

Running our stackato.exe in a loop, simply asking for information from the target, with https (TLS) active the application hangs after about 14-28 iterations, with about 14 iterations per minute, so within 1 to 2 minutes. Symptom of not accepting ^C is the same as before I should note. After activating the --debug-http-log it is not hanging itself within 10 minutes anymore. As that option only activates more output, i.e. introduces delays, this looks as if there is still a race condition present, old or new. This means that I will still have to use my fix and branch of Tcl 8.5 for the stackato client, instead of head. Sorry about the bad news.

The rough outline of operations done in the client is:
-1- register tls for https, with http
-2- open a https -async socket to a webserver
-3- read some data, via readable fileevent
-4- close the socket
-5- format and print data

Note that the iterations I speak of here are always new stackato processes, with each doing the above. The iteration does NOT happen in a single stackato process. The last time I had to investigate, the hang happened inside of TLS, during the open of the socket, i.e. step 2. The TLS transform does sync read/writes to perform the TLS handshake, without using fileevents. I suspect that this is true this time as well.

oehhar added on 2014-04-02 10:11:18:
Test added, which, at least, works on my machine. As the test is timing dependent, it may not show the error on other machines. Committed to core-8-5-branch by commit [1dfe1390d8]. Bug closed.
oehhar added on 2014-04-01 13:47:39:
Reinhard has created a test where this bug shows up on my machine:

    set sock [socket -async 169.254.0.0 42424]
    after 10000 {set x timeout}
    fileevent $sock writable {set x writable}
    vwait x
    close $sock
    puts $x

The bug shows up because the connect to 169.254.0.0 42424 fails very quickly on my machine with "network is unreachable". For me, the writable event never fires and the timeout fires instead. This is on Windows Vista 32 bit with tcl8.5.15.

oehhar added on 2014-03-22 16:18:53:
Proposed solution in checkin [2596fec7bd] in branch "bug-336441ed59", ready for review. Backported fix from commit [65b320b464] from branch "bug-[13d3af3ad5]". |
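The reproduction script in the 2014-04-01 comment only distinguishes "writable" from "timeout". Since the description notes that a connect failure must still be detected, a script can additionally query the socket's error status once the writable event fires. The following is a minimal sketch under stated assumptions, not the test that was committed: the proc name asyncConnectStatus and the array ::acs_state are hypothetical, and it assumes that `fconfigure $sock -error` returns an empty string after a successful connect.

```tcl
# Hypothetical helper (not part of Tcl): wait for an -async connect to
# settle and report "connected", "failed: <message>", or "timeout".
proc asyncConnectStatus {host port {timeoutMs 10000}} {
    set sock [socket -async $host $port]
    set ::acs_state($sock) pending

    # Whichever fires first (writable event or timer) ends the vwait.
    set timer [after $timeoutMs [list set ::acs_state($sock) timeout]]
    fileevent $sock writable [list set ::acs_state($sock) writable]
    vwait ::acs_state($sock)

    after cancel $timer
    fileevent $sock writable {}
    set state $::acs_state($sock)
    unset ::acs_state($sock)

    if {$state eq "writable"} {
        # Assumption: the writable event is delivered both when the
        # connect succeeds and when it fails; -error tells them apart.
        set err [fconfigure $sock -error]
        if {$err eq ""} {
            set state connected
        } else {
            set state "failed: $err"
        }
    }
    catch {close $sock}
    return $state
}
```

Called as `puts [asyncConnectStatus 169.254.0.0 42424]`, this would take the timeout branch on a build showing the stall described in this ticket; what a fixed build reports depends on how the platform signals the unreachable address.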