Tcl Source Code

2014-12-06
11:19
Potential fix for [c6ed4acfd8]. Simple typo in original fix for [336441ed59]. Was looping on stat... check-in: 0658e05c4a user: ashok tags: trunk
2014-05-30
10:42 Closed ticket [336441ed59]: socket -async stall on windows plus 7 other changes artifact: 7bcd66b050 user: oehhar
10:36
win socket -async: do not loose connect notification by temporarily stop connect monitoring. Bug [33... check-in: 6ecb583012 user: oehhar tags: core-8-5-branch
2014-05-29
14:58 Ticket [336441ed59] socket -async stall on windows status still Open with 4 other changes artifact: 0674c51a75 user: oehhar
2014-05-20
15:06 Closed ticket [13d3af3ad5]: IPV6 only used for IPV4/IPV6 sockets on windows plus 7 other changes artifact: 201dc14fca user: oehhar
2014-04-29
20:00 Open ticket [336441ed59]: socket -async stall on windows plus 6 other changes artifact: 9e5678f685 user: oehhar
2014-04-04
11:53
Add tests for bugs [336441ed59] and [581937ab1e] from core-8-5-branch. check-in: b602826a44 user: max tags: bug-13d3af3ad5
2014-04-02
10:11 Closed ticket [336441ed59]: socket -async stall on windows plus 8 other changes artifact: 8d771ff15d user: oehhar
10:02
Fix bug [336441ed59]: Win socket stall on quick termination of async socket connect check-in: 1dfe1390d8 user: oehhar tags: core-8-5-branch
09:54
Test to demonstrate bug [336441ed59]. Depends on timing and will not always fire but is better than ... check-in: 22a6175c07 user: oehhar tags: bug-336441ed59
2014-04-01
13:47 Ticket [336441ed59] socket -async stall on windows status still Open with 4 other changes artifact: ec8d18e903 user: oehhar
2014-03-22
16:18 Ticket [336441ed59]: 4 changes artifact: 41acffd092 user: oehhar
16:14
Bug [336441ed59]: Buffer infoPtr between socket creation and insertion into info structure in thread... check-in: 2596fec7bd user: oehhar tags: bug-336441ed59
2014-03-11
10:27 New ticket [336441ed59] socket -async stall on windows. artifact: 3510c5a213 user: oehhar
2014-03-08
00:21
socket -async and gets/puts stall on windows (Ticket [336441ed59])

This is a change for a problem... Closed-Leaf check-in: 521b7229c4 user: andreask tags: win-sock-async-connect-race-fix


Ticket UUID: 336441ed59c9f49fb2dc5414911f5c90c7acdec3
Title: socket -async stall on windows
Type: Bug Version: 8.5.15
Submitter: oehhar Created on: 2014-03-11 10:27:03
Subsystem: 24. Channel Commands Assigned To: oehhar
Priority: 5 Medium Severity: Critical
Status: Closed Last Modified: 2014-05-30 10:42:35
Resolution: Fixed Closed By: oehhar
    Closed on: 2014-05-30 10:42:35
Description:

In branch win-sock-async-connect-race-fix, check-in [521b7229c4], Andreas Kupries reports socket -async stalls caused by an FD_CONNECT event that the operating system does not deliver. See the check-in comment for details.

Issues:

  • gets or puts may never return on a socket with async connection
  • the connection notification FD_CONNECT is not observed

Conclusion:

  • FD_CONNECT is not sent by the OS

Action:

  • Use FD_WRITE as a fallback. There is already such code in the notifier proc. The proposed fix uses that to exit gets/puts.

This ticket is created to discuss the issue.

Reinhard Max and Harald Oehlmann are working on the 8.6.x socket code in the branch bug-13d3af3ad5.

There, the same issue was observed. The conclusion was:

  • FD_CONNECT is not delivered if the socket connect fails between the connect() call and the WSAAsyncSelect() call
  • FD_CONNECT is ignored if delivered between the WSAAsyncSelect() call and the insertion of the socket structure in TcpThreadActionProc()

Within the branch bug-13d3af3ad5, those issues are fixed.

The workaround of using FD_WRITE instead of FD_CONNECT was removed in bug-13d3af3ad5. Its drawback is that a connect failure would go undetected. It should be reintroduced if the information "FD_CONNECT events are not delivered" is correct.

This work is in accordance with what is done on the Unix side.

User Comments: oehhar added on 2014-05-30 10:42:35:

Here is the message from Andreas about the test result, which I had not even hoped for, but it happened:

I have now completed the check and it seems to have done the trick.
Using a stackato client wrapped with the basekit build from the specified revision.
I ran my "testcase" (iterated 'stackato info' against a https target) and saw no hangs for 13 minutes at about 50 iterations/minute, i.e. circa 650 iterations.

When the problem was active I could expect a hang within a minute, two at most.

Thank you very much for the work on this.

Having this in the Tcl 8.5 core branch will make me happy and willing to switch to it again, away from my "win-sock-async-connect-race-fix" branch (which I can/will then close).

So, merged by commit [6ecb583012], bug closed, thank you all, Harald


oehhar added on 2014-05-29 14:58:41:

Another test version in bug branch: commit [a658836882]:

  • Don't switch monitoring off while waiting for FD_CONNECT, so that it is not lost

Andreas, I would appreciate it if you could test this.

Thank you, Harald


oehhar added on 2014-04-29 20:00:01:
Andreas has tested the patch on 8.5 and it failed.
Here is his message:

Running our stackato.exe in a loop, simply asking for information from
the target, with https (TLS) active the application hangs after about
14-28 iterations, with about 14 iterations per minute, so within 1 to
2 minutes. Symptom of not accepting ^C is the same as before I should
note.

After activating the --debug-http-log option it no longer hangs within
10 minutes.
As that option only activates more output, i.e. introduces delays this
looks as if there is still a race condition present, old or new.

This means that I will still have to use my fix and branch of Tcl 8.5
for the stackato client, instead of head.
Sorry about the bad news.

The rough outline of operations done in the client is:

-1- register tls for https, with http
-2- open a https -async socket to a webserver
-3- read some data, via readable fileevent
-4- close the socket
-5- format and print data

Note that the iterations I speak of here are always new stackato
processes, with each doing the above. The iteration does NOT happen in
a single stackato process.

The last time I had to investigate, the hang happened inside of TLS,
during the open of the socket, i.e. step 2. The TLS transform does
sync read/writes to perform the TLS handshake, without using
fileevents.

I suspect that this is true this time as well.
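
Editorial sketch: the outline of client operations in Andreas's message can be approximated with the standard http package plus the tls extension. This is not the stackato client code. The proc names register_https and fetch_info, the ::fetch_done variable, and the 30 second timeout are hypothetical, and the sketch assumes that http::geturl opens its socket asynchronously when -timeout and -command are given.

```tcl
package require http

# -1- register tls for https, with http.
# Hypothetical helper; needs the tls extension package to be installed.
proc register_https {} {
    package require tls
    http::register https 443 ::tls::socket
}

# -2- to -4-: open the connection, read the reply from the event loop,
# then close the socket. Hypothetical helper, returns {status body}.
proc fetch_info {url {timeoutMs 30000}} {
    set ::fetch_done 0
    # With -command, geturl returns at once; the reply is read through
    # fileevent handlers while vwait runs the event loop.
    set token [http::geturl $url -timeout $timeoutMs \
        -command [list apply {{tok} {set ::fetch_done 1}}]]
    vwait ::fetch_done
    set status [http::status $token]
    set body   [http::data $token]
    # -4- release the token state (the socket is closed by the http package)
    http::cleanup $token
    return [list $status $body]
}
```

A caller would run register_https once and then format and print the result of fetch_info for an https URL (step -5-). Because the reported hang occurred in step -2-, inside the synchronous TLS handshake, a script like this can only exercise the code path and does not work around the bug.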

oehhar added on 2014-04-02 10:11:18:

Test added which, at least, works on my machine. As the test is timing dependent, it may not show the error on other machines.

Committed to core-8-5-branch by commit [1dfe1390d8].

Bug closed.


oehhar added on 2014-04-01 13:47:39:

Reinhard has created a test where this bug shows up on my machine:

set sock [socket -async 169.254.0.0 42424]
after 10000 {set x timeout}
fileevent $sock writable {set x writable}
vwait x
close $sock
puts $x

The bug shows up because socket 169.254.0.0 42424 fails so quickly on my machine with "network is unreachable". For me, the writable event never fires and the timeout fires instead.

This is on Windows Vista 32 bit with tcl8.5.15.
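
Editorial sketch: a variant of the test above wrapped in a proc, which also reads the pending connect error through fconfigure -error so that a failed connect can be told apart from a successful one. The proc name probe_async_connect and the ::probe_result array are hypothetical and not part of the Tcl test suite.

```tcl
# Hypothetical helper: wait for an async connect to finish or time out.
# Returns a two-element list {outcome error}, where outcome is
# "writable" or "timeout" and error is the pending socket error, if any.
proc probe_async_connect {host port {timeoutMs 10000}} {
    set sock [socket -async $host $port]
    # One array element per socket, so separate probes do not clash.
    set var ::probe_result($sock)
    set timer [after $timeoutMs [list set $var timeout]]
    fileevent $sock writable [list set $var writable]
    vwait $var
    after cancel $timer
    set outcome [set $var]
    unset $var
    # -error reports the current socket error, or an empty string if none.
    set err [fconfigure $sock -error]
    close $sock
    return [list $outcome $err]
}
```

Calling probe_async_connect 169.254.0.0 42424 corresponds to the original test. On a build affected by this bug the outcome would be expected to be "timeout", since the writable event never fires.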


oehhar added on 2014-03-22 16:18:53:

Proposed solution in checkin [2596fec7bd] in branch "bug-336441ed59" ready for check.

Backported fix from commit [65b320b464] from branch "bug-[13d3af3ad5]".