Tcl Source Code

View Ticket
Login
Ticket UUID: 97069ea11aca3783949c0379725eed90f5573c23
Title: close returns empty error message if failed non-blocking write flush pending
Type: Bug Version: 8.5.15
Submitter: oehhar Created on: 2014-03-25 08:50:53
Subsystem: 24. Channel Commands Assigned To: aku
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2014-04-04 11:30:12
Resolution: Fixed Closed By: oehhar
    Closed on: 2014-04-04 11:30:12
Description: (text/x-fossil-wiki)
<h1>Issue</h1>
The trailing close in the following sequence throws an error but no error message:
<verbatim>
% set sock [socket -async localhost 30001]
sock420
% fconfigure $sock -blocking 0
% puts $sock ok
% flush $sock
% fileevent $sock writable {set x 1}
% vwait x
% catch {close $sock} e d
1
% set e
% set d
-code 1 -level 0 -errorstack {INNER {invokeStk1 close sock420}} -errorcode NONE
-errorinfo {
    while executing
"close $sock"} -errorline 1
</verbatim>

There are two conditions:
   *   The socket connect must fail
   *   flush $sock is executed before the socket connect internally failed. For machines where this is to quick, one must use another host instead localhost which requires a long DNS query.
   *   There are also relations with multiple connect rounds within an IPV6 enabled system (TCL8.6.1 Linux).

This arises on the following platforms:
   *   TCL 8.5.15 on Windows Vista 32 bit 
   *   TCL 8.6.1 on Windows Vista 32 bit
   *   TCL 8.6.1 on Linux too (Reported bei Reinhard Max)

<h1>Possible reasons</h1>

The error is generated by a backround flush sheduled by the writable event.
The error code is stored in unreportedError.
Eventually, the error message is missing in unreportedMsg.

A debug trace on tclIO.c:

<h2>flush $sock</h2>
   *   tclIO.c:FlushChannel()
   *   calls tclWinSock.c:TclOutputProc("ok"), which returns error EWOULDBLOCK
   *   Sets flag BG_FLUSH_SCHEDULED
   *   Clears error

<h2>vwait x</h2>
   *   tclIO.c:FlushChannel(calledFromAsyncFlush=1)
   *   calls tclWinSock.c:TclOutputProc("ok"), which tries to write and returns error 126
   *   Sets statePtr->unreportedError = errorCode (126)
   *   statePtr->unreportedMsg = msg (NULL)
   *   Calls DiscardOutputQueued(statePtr)
   *   Flag BG_FLUSH_SCHEDULED is cleared (no data)

<h2>close $sock</h2>
   *   tclIO.c:Tcl_Close()
   *   > Tcl_Close calls FlushChannel(interp, chanPtr, 0)
   *   >> FlushChannel calls CloseChannel(interp, chanPtr, errorCode)
   *   >>> CloseChannel calls ChanClose(chanPtr, interp)
   *   >>>> ChanClose calls tclWinSock.c:TclCloseProc(), which succeeds
   *   >>> CloseChannel sets errorCode = statePtr->unreportedError; (126)
   *   >>> CloseChannel calls Tcl_SetChannelErrorInterp() with NULL statePtr->unreportedMsg
   *   >>> CloseChannel returns errorCode (126)
   *   >> FLushChannel returns errorCode (126)
   *   > Tcl_Close returns error: if ((flushcode != 0) || (result != 0)) { return TCL_ERROR; }
User Comments: oehhar added on 2014-04-04 11:30:12: (text/x-fossil-wiki)
Fix committed to
   *   [45d1b34648] core-8-5-branch
   *   [7aa743c0ee] trunk

Bug closed.

oehhar added on 2014-04-01 16:20:41: (text/x-fossil-wiki)
<h1> Correction</h1>

Last message was an issue in tcl-core-8-5-branch with an intentional segmentation fault waiting for the fix.

It is not related to this ticket and its branch.

oehhar added on 2014-04-01 15:46:55: (text/x-fossil-wiki)
New checkin [5277cd7d5d] solves the issue, but has a GPF in the subprocess spwned by test iogt-2.4.
The debugger tells the following:
   *   tclIO::WillRead(chanPtr) is called with a channel pointer where the type field is a null pointer. As chanPtr->typePtr->seekProc is accessed without test, the GPF fires. Other elements of the chanPtr structure:
<verbatim>
state->channelName: file1782e70
state->flags:2
state->refCount:1
state->closeCbPtr: NULL
state->outputState: NULL
state->curOutPtr: NULL
state->outQueueHead: NULL
state->outQueueTail: NULL
state->saveInBufPtr: NULL
state->inQueueHead: 0x01783070
state->inQueueTail: 0x01783070
state->chPtr: NULL
state->scriptRecordPtr: NULL
state->timer: NULL
state->csPtrR: NULL
state->csPtrW: NULL
state->topChanPtr: 0x01e784a0
state->bottomChanPtr: 0x01782cb0
state->nextCSPtr: 0x0175b0b8
state->managingThread: 0x00001320
typePtr: NULL
downChanPtr: 0x01782cb0
upChanPtr: NULL
inQueueHead: NULL
inQueueTail: NULL
</verbatim>
   *   This was called by ChanRead(chanPtr,dst,dstsize=2,errnoPtr(0x00269dd0), *=-858993460)
   *   This was called by GetInput(chanPtr)
   *   This was called by DoReadChars(chanPtr,objPtr,toRead=-1,appendFlag=0)
   *   This was called by Tcl_ReadChars(chanPtr,objPtr,toRead=-1,appendFlag=0)
   *   This was called by Tcl_ReadObjCmd(NULL,interp=0x01733788, 2, "read file1782e70")

Full stack trace:
<verbatim>
WillRead(Channel * 0x017830f0) line 3570 + 6 bytes
ChanRead(Channel * 0x017830f0, char * 0x01783090, int 2, int * 0x00269dd0) line 354 + 9 bytes
GetInput(Channel * 0x017830f0) line 6093 + 30 bytes
DoReadChars(Channel * 0x017830f0, Tcl_Obj * 0x01810bb8, int -1, int 0) line 5152 + 9 bytes
Tcl_ReadChars(Tcl_Channel_ * 0x01782cb0, Tcl_Obj * 0x01810bb8, int -1, int 0) line 5040 + 21 bytes
Tcl_ReadObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 2, Tcl_Obj * const * 0x0173b2c4) line 450 + 19 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 2, Tcl_Obj * const * 0x0173b2c4, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x017d79d8) line 2417 + 29 bytes
TclCompEvalObj(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180d510, const CmdFrame * 0x00000000, int 0) line 1537 + 13 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180d510, int 0, const CmdFrame * 0x00000000, int 0) line 5296 + 21 bytes
Tcl_UplevelObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 1, Tcl_Obj * const * 0x0173b28c) line 945 + 25 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x0173b284, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x01749da0) line 2417 + 29 bytes
TclObjInterpProcCore(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180fc88, int 1, void (Tcl_Interp *, Tcl_Obj *)* 0x100fbb8b MakeProcError(Tcl_Interp *, Tcl_Obj *)) line 1763 + 13 bytes
TclObjInterpProc(void * 0x017c4830, Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782fc0) line 1653 + 22 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782fc0, const char * 0x00000000, int 0, int 0) line 3727 + 27 bytes
Tcl_EvalObjv(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782fc0, int 0) line 3925 + 25 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01810870, int 0, const CmdFrame * 0x00000000, int 0) line 5153 + 21 bytes
Tcl_UplevelObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 1, Tcl_Obj * const * 0x0173b1c0) line 945 + 25 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x0173b1b8, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x017fea48) line 2417 + 29 bytes
TclObjInterpProcCore(Tcl_Interp * 0x01733788, Tcl_Obj * 0x018103f0, int 1, void (Tcl_Interp *, Tcl_Obj *)* 0x100fbb8b MakeProcError(Tcl_Interp *, Tcl_Obj *)) line 1763 + 13 bytes
TclObjInterpProc(void * 0x017e4600, Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782e40) line 1653 + 22 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782e40, const char * 0x00000000, int 0, int 0) line 3727 + 27 bytes
Tcl_EvalObjv(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782e40, int 0) line 3925 + 25 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180fce8, int 0, const CmdFrame * 0x00000000, int 0) line 5153 + 21 bytes
Tcl_UplevelObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 1, Tcl_Obj * const * 0x0173b0ec) line 945 + 25 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x0173b0e4, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x01e6e1f8) line 2417 + 29 bytes
TclObjInterpProcCore(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180d1f8, int 1, void (Tcl_Interp *, Tcl_Obj *)* 0x100fbb8b MakeProcError(Tcl_Interp *, Tcl_Obj *)) line 1763 + 13 bytes
TclObjInterpProc(void * 0x017c44b0, Tcl_Interp * 0x01733788, int 6, Tcl_Obj * const * 0x0173ae28) line 1653 + 22 bytes
InvokeImportedCmd(void * 0x017bcda0, Tcl_Interp * 0x01733788, int 6, Tcl_Obj * const * 0x0173ae28) line 1919 + 27 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 6, Tcl_Obj * const * 0x0173ae28, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x017e56b8) line 2417 + 29 bytes
TclCompEvalObj(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01739a68, const CmdFrame * 0x0173ac90, int 3) line 1537 + 13 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01739a68, int 0, const CmdFrame * 0x0173ac90, int 3) line 5296 + 21 bytes
NamespaceEvalCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 4, Tcl_Obj * const * 0x0173acc0) line 3316 + 26 bytes
Tcl_NamespaceObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 4, Tcl_Obj * const * 0x0173acc0) line 2851 + 21 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 4, Tcl_Obj * const * 0x0173acc0, const char * 0x01772ff4, int 21445, int 0) line 3727 + 27 bytes
TclEvalEx(Tcl_Interp * 0x01733788, const char * 0x01772e08, int 21979, int 0, int 17, int * 0x00000000, const char * 0x01772e08) line 4427 + 33 bytes
Tcl_EvalEx(Tcl_Interp * 0x01733788, const char * 0x01772e08, int 21979, int 0) line 4083 + 29 bytes
Tcl_FSEvalFileEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01738b38, const char * 0x00000000) line 1853 + 19 bytes
Tcl_Main(int -1, char * * 0x01731d88, int (Tcl_Interp *)* 0x00401014 _Tcl_AppInit) line 440 + 20 bytes
main(int 12, char * * 0x01731d58) line 101 + 21 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 767ad2e9()
NTDLL! 778a1603()
NTDLL! 778a15d6()
</verbatim>

In the current case, the subprocess returns "General Protection Violation".
If I delete the test for "Tcl_GetCharLength(res)", so the result is always replaced, I get also a GPF but a result of:
"not owner".

I am stuck here:
   *   what is the relation to the fix and the test failure ?
   *   Any ideas welcome ;-)

oehhar added on 2014-04-01 07:45:19: (text/x-fossil-wiki)
I tried today to populate the unreportedMsg field by the following patch.
<verbatim>
--- C:/test/tcl-bug-97069ea11a_no_error_message/generic/tclIO_ori.c	Tue Apr 01 08:29:28 2014
+++ C:/test/tcl-bug-97069ea11a_no_error_message/generic/tclIO.c	Tue Apr 01 09:03:48 2014
@@ -2531,10 +2531,15 @@
 
 		if (statePtr->unreportedError == 0) {
 		    statePtr->unreportedError = errorCode;
-		    statePtr->unreportedMsg = msg;
-		    if (msg != NULL) {
-			Tcl_IncrRefCount(msg);
+		    /*
+		     * Bug 97069ea11a: If unreportedError is set, always set
+		     * an error message to.
+		     */
+		    if (msg == NULL) {
+			msg = Tcl_NewStringObj(Tcl_ErrnoMsg(errorCode),-1);
 		    }
+		    Tcl_IncrRefCount(msg);
+		    statePtr->unreportedMsg = msg;
 		} else {
 		    /*
 		     * An old unreported error is kept, and this error thrown
</verbatim>
Unfortunately, this field attends a 4 elements list structure to report in:
   *   TclChanCaughtErrorBypass()
   *   > UnmarshallErrorResult(interp,msg="host is not connected")
and only the errorNo is set, not the message.
The following patch