Ticket UUID: | ea69b0258a9833cb61ada42d1fc742d90aec04d0 | |||
Title: | Crash when using a channel transformation on TCP client socket | |||
Type: | Bug | Version: | 8.6.13 | |
Submitter: | petrokaz | Created on: | 2023-01-12 15:04:29 | |
Subsystem: | 26. Channel Transforms | Assigned To: | pooryorick | |
Priority: | 5 Medium | Severity: | Critical | |
Status: | Closed | Last Modified: | 2023-03-24 21:39:56 | |
Resolution: | Fixed | Closed By: | pooryorick | |
Closed on: | 2023-03-24 21:39:56 | |||
Description: |
(text/x-markdown)
Tcl crashes when I setup a channel transformation on a non-blocking TCP client socket, and then remove it. Here is the minimal reproducible example. A dummy echo server: ``` ncat -l -k -e /bin/cat --crlf 0.0.0.0 4400 ``` And my Tcl script: ``` package require tcl::transform::observe proc writable {sock} { chan event $sock writable {} chan configure $sock -translation {crlf binary} -blocking 0 -buffering full -encoding binary chan event $sock readable [list readable $sock] set ::status "connected" } proc readable {sock} { set readCount [chan gets $sock line] if {$readCount <= 0} { if {[eof $sock]} { puts "Connection aborted" close $sock set :status "closed" break } # we don't have a full line yet - wait for next chan event return } set ::msg $line } set sock [socket -async localhost 4400] chan event $sock writable [list writable $sock] vwait ::status puts "Status $status" tcl::transform::observe $sock stdout "" puts $sock "Hello\r\n" flush $sock vwait ::msg puts "Got $msg" # update ;#- avoids crash chan pop $sock after 100 {set ::forever 1} vwait forever ;# crashes here close $sock puts DONE ``` This produces the following output: ``` Status connected Hello Got Hello Segmentation fault (core dumped) ``` The lines "hello\\r\\n" are printed by the channel observer. Then Tcl crashes after I remove the channel transformation and enter the event loop. The crash does NOT occur if I uncomment the line update ;#- avoids crash The crash does NOT occur if I don't use the channel transformation. I can reproduce the crash with 100% chance on OpenSUSE 15.4/Tcl 8.6.12 (from repos) and Windows/Tcl 8.6.13 (from Magicsplat). Thanks in advance. | |||
User Comments: |
pooryorick added on 2023-03-15 11:48:22:
(text/x-fossil-wiki)
The fix includes a test that, prior to the fix, reproduced the problem on a Linux system. I would expect the following script to reproduce the issue on Windows as well. Could someone try it out and report the results? <code><verbatim> #! /usr/bin/env tclsh namespace eval rchan { proc initialize {chan mode} { namespace eval $chan { variable source "line one\nline two" } after 0 [list ::chan postevent $chan read] return {initialize finalize watch read} } proc finalize chan { namespace delete $chan } proc read {chan count} { namespace upvar $chan counter counter source source set res [string range $source 0 $count-1] set source [string range $source $count end] return $res } proc watch {chan events} { after 0 [list chan postevent $chan read] return read } namespace ensemble create namespace export * } namespace eval filter { proc initialize {chan mode} { return {initialize finalize read} } proc read {chan buffer} { return $buffer } namespace ensemble create namespace export * } proc read1 {chan args} { variable read catch { set res [gets $chan] } incr read } set chan [chan create read rchan] chan configure $chan -blocking 0 chan push $chan filter chan event $chan read [list read1 $chan] vwait [namespace current]::read chan pop $chan vwait [namespace current]::read return </verbatim></code> pooryorick added on 2023-03-15 11:43:05: (text/x-fossil-wiki) Proper cleanup added for 8.6 in [a719a1392ca50359], for 8.7 in [294e0130dcff79a3], and for 9.0 in [a76dee9eb447f3f9]. pooryorick added on 2023-03-13 21:28:23: (text/x-fossil-wiki) chw, I missed your previous remark today. I see what you mean. Looking into it now. chw added on 2023-03-13 21:09:16: Nathan, at least in core-8-6-branch there are two locations where Tcl_DeleteTimerHandler() is called for the timer. Which has incremented the refCount when created. Which does not get decremented when the timer is deleted. Which might result in the refCount never reaching zero. Which might be kind of a leak. Do you agree? pooryorick added on 2023-03-13 21:02:19: (text/x-fossil-wiki) Fixed for 8.6 in [a8b8ecbc039fb4e0], for 8.7 in [6d017aacac9dd19d], and for 9.0 in [62058cafe065cb35]. chw added on 2023-03-13 16:55:52: Nathan's fix in [052f54ddfb] looks plausible but IMO isn't complete yet. When the creation of the ChannelTimerProc timer increments the refCount per TclChannelPreserve() and it's expiry does the opposite with TclChannelRelease(), all locations where the (unexpired) timer is deleted must perform the TclChannelRelease(), too. apnadkarni added on 2023-03-13 16:09:05: Harald, Petro could reproduce on Windows as well though I could not despite many different attempts with configurations. Given it looks like an early release of memory, it could all be timing dependent for the bug to manifest itself depending on when the memory gets reused for example. Just guessing... /Ashok oehhar added on 2023-03-13 12:45:17: (text/x-fossil-wiki) Pooryorik, thanks for the fix in [052f54ddfb]. Can you imagine, why this only fails on Linux ? It is in the gereric code... Pietro, can you check, if the fix is ok for you ? Take care, Harald apnadkarni added on 2023-03-07 15:23:37: Well, the good news is that I tried it on my Ubuntu WSL and it crashed right away. The bad news is I have no clue about Linux crash dumps or debuggers. So this will take some time. But at least it is reproducible on Linux with 8.6.13. /Ashok apnadkarni added on 2023-03-07 15:00:43: (text/x-markdown) I've tried it on Windows using magicsplat, as well as the latest repository under debug and release modes, in a batch loop. Have not been able to get it to crash. (Using the Windows ncat from the nmap site). Output is always ``` Status=connected Hello1 Hello2 Connection aborted Got closed pending input before pop: 0 pending input after pop: 0 DONE ``` Note pending input is always 0. I just cannot get it to be non-0 which may be why my test does not crash. When I get time, I'll experiment different configs to try and get non-0 input. BTW what version of tcllib are you using? I presume same as what shipped with the 8.6.13 magicsplat distribution? oehhar added on 2023-03-07 13:10:45: (text/x-fossil-wiki) Yes, I fear the reproduction recipe is "Linux only". stdout is used, what is not available with Windows wish. Maybe, some Linux foks will jump in... Take care, Harald petrokaz added on 2023-03-07 12:48:35: (text/x-markdown) Hello Harald, you need some 'echo' TCP server running on the port 4400. On Linux the easiest way is to run netcat as I show in the bug report. I'm not sure if netcat with the same features is available on Windows. You could write it in Tcl as well. oehhar added on 2023-03-07 12:42:53: (text/x-fossil-wiki) Thanks ! What I have done: * start wish 8.6.13 32 bit on windows * take tcllib 1.21 <verbatim> lappend auto_path {C:\test\tcllib1.21} </verbatim> * Paste your script * it waits some time in the line <verbatim> vwait ::msg </verbatim> * It shows the background error <verbatim> error reading "sock03F5A828": socket is not connected while executing "chan gets $sock line" (procedure "readable" line 2) invoked from within "readable sock03F5A828" </verbatim> Would you have additional instructions to reach the line where it should crash ? Thank you and take care, Harald petrokaz added on 2023-03-07 10:32:52: Hello Ashok, sorry for the poor bug report... I made some mistakes while trying to extract a minimal reproducible example from my original code that is much bigger. I've taken your script and changed it so that it crashes again - attached crash3.tcl So, AFAICS [chan pop] crashes if there is pending input. apnadkarni added on 2023-03-07 06:53:28: (text/x-markdown) Although Tcl should never crash from anything you do at the scripting level, there are several problems with your code which I presume you have fixed in your script but updated the ticket. In the `readcount` proc, - there is a `break` outside of a loop which causes the event handler to abort. I presume that should be a return. - the `$readcount <= 0` check should be `$readcount < 0` since getting back an empty line is perfectly kosher. In particular note your `puts Hello\r\n` is sending **two** newlines, one explicit and one because of the `\r\n`. So you will get back an empty line from the server. - The `set :status "closed"` line is missing a `:` - should be `::status` - Moreover, strictly speaking you should also set ::msg in there in case the server closes without sending anything back (the vwait is waiting on ::msg) - In case of EOF without receiving any lines, the handler closes the socket and the chan pop is attempted on a closed socket. Again this is only in the case the server closes without sending anything. In any case, I could not reproduce the crash. With your script, I simply get Tcl exceptions, as expected. With my changes (see script below), I get a print out ``` Status connected Hello Connection aborted Got closed DONE ``` This is irrespective of whether the update is commented or not. My script is below, please try and see if the crash is reproduced on your system with my script: ``` lappend auto_path d:/tcl/lib package require tcl::transform::observe proc writable {sock} { chan event $sock writable {} chan configure $sock -translation {crlf binary} -blocking 0 -buffering full -encoding binary chan event $sock readable [list readable $sock] set ::status "connected" } proc readable {sock} { set readCount [chan gets $sock line] if {$readCount < 0} { if {[eof $sock]} { puts "Connection aborted" # close $sock chan event $sock readable {} set ::msg "closed" return } # we don't have a full line yet - wait for next chan event return } set ::msg $line } set sock [socket -async localhost 4400] chan event $sock writable [list writable $sock] vwait ::status puts "Status $status" tcl::transform::observe $sock stdout "" puts $sock "Hello" flush $sock vwait ::msg puts "Got $msg" # update ;#- avoids crash chan pop $sock after 100 {set ::forever 1} vwait forever ;# crashes here close $sock puts DONE ``` petrokaz (claiming to be Petro Kazmirchuk) added on 2023-01-12 16:33:47: replace "sleep" with a plain "update" for even smaller example |
Attachments:
- crash3.tcl [download] added by petrokaz on 2023-03-07 10:31:17. [details]