Tcl Source Code

View Ticket
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Ticket UUID: 97069ea11aca3783949c0379725eed90f5573c23
Title: close returns empty error message if failed non-blocking write flush pending
Type: Bug Version: 8.5.15
Submitter: oehhar Created on: 2014-03-25 08:50:53
Subsystem: 24. Channel Commands Assigned To: aku
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2014-04-04 11:30:12
Resolution: Fixed Closed By: oehhar
    Closed on: 2014-04-04 11:30:12
Description:

Issue

The trailing close in the following sequence throws an error but no error message:
% set sock [socket -async localhost 30001]
sock420
% fconfigure $sock -blocking 0
% puts $sock ok
% flush $sock
% fileevent $sock writable {set x 1}
% vwait x
% catch {close $sock} e d
1
% set e
% set d
-code 1 -level 0 -errorstack {INNER {invokeStk1 close sock420}} -errorcode NONE
-errorinfo {
    while executing
"close $sock"} -errorline 1

There are two conditions:

  • The socket connect must fail
  • flush $sock is executed before the socket connect internally failed. For machines where this is to quick, one must use another host instead localhost which requires a long DNS query.
  • There are also relations with multiple connect rounds within an IPV6 enabled system (TCL8.6.1 Linux).

This arises on the following platforms:

  • TCL 8.5.15 on Windows Vista 32 bit
  • TCL 8.6.1 on Windows Vista 32 bit
  • TCL 8.6.1 on Linux too (Reported bei Reinhard Max)

Possible reasons

The error is generated by a backround flush sheduled by the writable event. The error code is stored in unreportedError. Eventually, the error message is missing in unreportedMsg.

A debug trace on tclIO.c:

flush $sock

  • tclIO.c:FlushChannel()
  • calls tclWinSock.c:TclOutputProc("ok"), which returns error EWOULDBLOCK
  • Sets flag BG_FLUSH_SCHEDULED
  • Clears error

vwait x

  • tclIO.c:FlushChannel(calledFromAsyncFlush=1)
  • calls tclWinSock.c:TclOutputProc("ok"), which tries to write and returns error 126
  • Sets statePtr->unreportedError = errorCode (126)
  • statePtr->unreportedMsg = msg (NULL)
  • Calls DiscardOutputQueued(statePtr)
  • Flag BG_FLUSH_SCHEDULED is cleared (no data)

close $sock

  • tclIO.c:Tcl_Close()
  • > Tcl_Close calls FlushChannel(interp, chanPtr, 0)
  • >> FlushChannel calls CloseChannel(interp, chanPtr, errorCode)
  • >>> CloseChannel calls ChanClose(chanPtr, interp)
  • >>>> ChanClose calls tclWinSock.c:TclCloseProc(), which succeeds
  • >>> CloseChannel sets errorCode = statePtr->unreportedError; (126)
  • >>> CloseChannel calls Tcl_SetChannelErrorInterp() with NULL statePtr->unreportedMsg
  • >>> CloseChannel returns errorCode (126)
  • >> FLushChannel returns errorCode (126)
  • > Tcl_Close returns error: if ((flushcode != 0) || (result != 0)) { return TCL_ERROR; }

User Comments: oehhar added on 2014-04-04 11:30:12:

Fix committed to

  • [45d1b34648] core-8-5-branch
  • [7aa743c0ee] trunk

Bug closed.


oehhar added on 2014-04-01 16:20:41:

Correction

Last message was an issue in tcl-core-8-5-branch with an intentional segmentation fault waiting for the fix.

It is not related to this ticket and its branch.


oehhar added on 2014-04-01 15:46:55:

New checkin [5277cd7d5d] solves the issue, but has a GPF in the subprocess spwned by test iogt-2.4. The debugger tells the following:

  • tclIO::WillRead(chanPtr) is called with a channel pointer where the type field is a null pointer. As chanPtr->typePtr->seekProc is accessed without test, the GPF fires. Other elements of the chanPtr structure:
    state->channelName: file1782e70
    state->flags:2
    state->refCount:1
    state->closeCbPtr: NULL
    state->outputState: NULL
    state->curOutPtr: NULL
    state->outQueueHead: NULL
    state->outQueueTail: NULL
    state->saveInBufPtr: NULL
    state->inQueueHead: 0x01783070
    state->inQueueTail: 0x01783070
    state->chPtr: NULL
    state->scriptRecordPtr: NULL
    state->timer: NULL
    state->csPtrR: NULL
    state->csPtrW: NULL
    state->topChanPtr: 0x01e784a0
    state->bottomChanPtr: 0x01782cb0
    state->nextCSPtr: 0x0175b0b8
    state->managingThread: 0x00001320
    typePtr: NULL
    downChanPtr: 0x01782cb0
    upChanPtr: NULL
    inQueueHead: NULL
    inQueueTail: NULL
    
  • This was called by ChanRead(chanPtr,dst,dstsize=2,errnoPtr(0x00269dd0), *=-858993460)
  • This was called by GetInput(chanPtr)
  • This was called by DoReadChars(chanPtr,objPtr,toRead=-1,appendFlag=0)
  • This was called by Tcl_ReadChars(chanPtr,objPtr,toRead=-1,appendFlag=0)
  • This was called by Tcl_ReadObjCmd(NULL,interp=0x01733788, 2, "read file1782e70")

Full stack trace:

WillRead(Channel * 0x017830f0) line 3570 + 6 bytes
ChanRead(Channel * 0x017830f0, char * 0x01783090, int 2, int * 0x00269dd0) line 354 + 9 bytes
GetInput(Channel * 0x017830f0) line 6093 + 30 bytes
DoReadChars(Channel * 0x017830f0, Tcl_Obj * 0x01810bb8, int -1, int 0) line 5152 + 9 bytes
Tcl_ReadChars(Tcl_Channel_ * 0x01782cb0, Tcl_Obj * 0x01810bb8, int -1, int 0) line 5040 + 21 bytes
Tcl_ReadObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 2, Tcl_Obj * const * 0x0173b2c4) line 450 + 19 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 2, Tcl_Obj * const * 0x0173b2c4, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x017d79d8) line 2417 + 29 bytes
TclCompEvalObj(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180d510, const CmdFrame * 0x00000000, int 0) line 1537 + 13 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180d510, int 0, const CmdFrame * 0x00000000, int 0) line 5296 + 21 bytes
Tcl_UplevelObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 1, Tcl_Obj * const * 0x0173b28c) line 945 + 25 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x0173b284, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x01749da0) line 2417 + 29 bytes
TclObjInterpProcCore(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180fc88, int 1, void (Tcl_Interp *, Tcl_Obj *)* 0x100fbb8b MakeProcError(Tcl_Interp *, Tcl_Obj *)) line 1763 + 13 bytes
TclObjInterpProc(void * 0x017c4830, Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782fc0) line 1653 + 22 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782fc0, const char * 0x00000000, int 0, int 0) line 3727 + 27 bytes
Tcl_EvalObjv(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782fc0, int 0) line 3925 + 25 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01810870, int 0, const CmdFrame * 0x00000000, int 0) line 5153 + 21 bytes
Tcl_UplevelObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 1, Tcl_Obj * const * 0x0173b1c0) line 945 + 25 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x0173b1b8, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x017fea48) line 2417 + 29 bytes
TclObjInterpProcCore(Tcl_Interp * 0x01733788, Tcl_Obj * 0x018103f0, int 1, void (Tcl_Interp *, Tcl_Obj *)* 0x100fbb8b MakeProcError(Tcl_Interp *, Tcl_Obj *)) line 1763 + 13 bytes
TclObjInterpProc(void * 0x017e4600, Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782e40) line 1653 + 22 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782e40, const char * 0x00000000, int 0, int 0) line 3727 + 27 bytes
Tcl_EvalObjv(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x01782e40, int 0) line 3925 + 25 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180fce8, int 0, const CmdFrame * 0x00000000, int 0) line 5153 + 21 bytes
Tcl_UplevelObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 1, Tcl_Obj * const * 0x0173b0ec) line 945 + 25 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 3, Tcl_Obj * const * 0x0173b0e4, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x01e6e1f8) line 2417 + 29 bytes
TclObjInterpProcCore(Tcl_Interp * 0x01733788, Tcl_Obj * 0x0180d1f8, int 1, void (Tcl_Interp *, Tcl_Obj *)* 0x100fbb8b MakeProcError(Tcl_Interp *, Tcl_Obj *)) line 1763 + 13 bytes
TclObjInterpProc(void * 0x017c44b0, Tcl_Interp * 0x01733788, int 6, Tcl_Obj * const * 0x0173ae28) line 1653 + 22 bytes
InvokeImportedCmd(void * 0x017bcda0, Tcl_Interp * 0x01733788, int 6, Tcl_Obj * const * 0x0173ae28) line 1919 + 27 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 6, Tcl_Obj * const * 0x0173ae28, const char * 0xffffffff, int -1, int 0) line 3727 + 27 bytes
TclExecuteByteCode(Tcl_Interp * 0x01733788, ByteCode * 0x017e56b8) line 2417 + 29 bytes
TclCompEvalObj(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01739a68, const CmdFrame * 0x0173ac90, int 3) line 1537 + 13 bytes
TclEvalObjEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01739a68, int 0, const CmdFrame * 0x0173ac90, int 3) line 5296 + 21 bytes
NamespaceEvalCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 4, Tcl_Obj * const * 0x0173acc0) line 3316 + 26 bytes
Tcl_NamespaceObjCmd(void * 0x00000000, Tcl_Interp * 0x01733788, int 4, Tcl_Obj * const * 0x0173acc0) line 2851 + 21 bytes
TclEvalObjvInternal(Tcl_Interp * 0x01733788, int 4, Tcl_Obj * const * 0x0173acc0, const char * 0x01772ff4, int 21445, int 0) line 3727 + 27 bytes
TclEvalEx(Tcl_Interp * 0x01733788, const char * 0x01772e08, int 21979, int 0, int 17, int * 0x00000000, const char * 0x01772e08) line 4427 + 33 bytes
Tcl_EvalEx(Tcl_Interp * 0x01733788, const char * 0x01772e08, int 21979, int 0) line 4083 + 29 bytes
Tcl_FSEvalFileEx(Tcl_Interp * 0x01733788, Tcl_Obj * 0x01738b38, const char * 0x00000000) line 1853 + 19 bytes
Tcl_Main(int -1, char * * 0x01731d88, int (Tcl_Interp *)* 0x00401014 _Tcl_AppInit) line 440 + 20 bytes
main(int 12, char * * 0x01731d58) line 101 + 21 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 767ad2e9()
NTDLL! 778a1603()
NTDLL! 778a15d6()

In the current case, the subprocess returns "General Protection Violation". If I delete the test for "Tcl_GetCharLength(res)", so the result is always replaced, I get also a GPF but a result of: "not owner".

I am stuck here:

  • what is the relation to the fix and the test failure ?
  • Any ideas welcome ;-)


oehhar added on 2014-04-01 07:45:19:

I tried today to populate the unreportedMsg field by the following patch.

--- C:/test/tcl-bug-97069ea11a_no_error_message/generic/tclIO_ori.c	Tue Apr 01 08:29:28 2014
+++ C:/test/tcl-bug-97069ea11a_no_error_message/generic/tclIO.c	Tue Apr 01 09:03:48 2014
@@ -2531,10 +2531,15 @@
 
 		if (statePtr->unreportedError == 0) {
 		    statePtr->unreportedError = errorCode;
-		    statePtr->unreportedMsg = msg;
-		    if (msg != NULL) {
-			Tcl_IncrRefCount(msg);
+		    /*
+		     * Bug 97069ea11a: If unreportedError is set, always set
+		     * an error message to.
+		     */
+		    if (msg == NULL) {
+			msg = Tcl_NewStringObj(Tcl_ErrnoMsg(errorCode),-1);
 		    }
+		    Tcl_IncrRefCount(msg);
+		    statePtr->unreportedMsg = msg;
 		} else {
 		    /*
 		     * An old unreported error is kept, and this error thrown
Unfortunately, this field attends a 4 elements list structure to report in:
  • TclChanCaughtErrorBypass()
  • > UnmarshallErrorResult(interp,msg="host is not connected") and only the errorNo is set, not the message. The following patch