Tcl Source Code

View Ticket
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.
Ticket UUID: 3386417
Title: errorstack leaked on syntax error
Type: Bug Version: obsolete: 8.6b2
Submitter: msofer Created on: 2011-08-04 23:03:38
Subsystem: 47. Bytecode Compiler Assigned To: msofer
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2011-08-11 04:40:31
Resolution: Fixed Closed By: ferrieux
    Closed on: 2011-08-09 17:03:01
When the compiler bails out on a syntax error, the errorstack is leaked. For example, for the script

  namespace eval x {set syntax {}{}}

valgrind reports

  0:  malloc (vg_replace_malloc.c:236)
  1:  TclpAlloc (tclAlloc.c:705)
  2:  Tcl_Alloc (tclCkalloc.c:1046)
  3:  Tcl_NewListObj (tclListObj.c:225)
  4:  Tcl_CreateInterp (tclBasic.c:542)
  5:  Tcl_Main (tclMain.c:639)
  6:  main (tclAppInit.c:84)
-- foo: ==9070== 3,248 (48 direct, 3,200 indirect) bytes in 1 blocks are definitely lost in loss record 136 of 136
User Comments: ferrieux added on 2011-08-11 04:40:31:
As a post-bug-mortem comment: I wrote a page on the wiki summarizing all I've learnt while solving this one:

     Leak Hunt (C level)

it also gives the basic 'valgrind recipe' and mentions the --enable-symbols=all pitfall dgp observed

ferrieux added on 2011-08-10 00:03:00:
Committed to HEAD after a quick chat.

ferrieux added on 2011-08-09 06:13:21:
File Added - 420603: errorstackloop.patch

ferrieux added on 2011-08-09 05:44:58:
OK, got it: what we are witnessing is a cycle in the reference graph. Indeed errorstack is special in that it records, as true references, all the command lines in the stack, including the [namespace eval x OFFENDINGSCRIPT]. As a consequence:
    - options(-errorstack) holds a reference to OFFENDINGSCRIPT
    - OFFENDINGSCRIPT's BC holds, as a literal, a reference to options()

So, the reason why TclCleanupByteCode is not called on the syntax error, is the above cycle.
So the whole cycle gets unreachable, and valgrinds just selects one as "definitely lost" and the rest as "indirectly lost".

The cure is simply to remove the (compile-time) errorstack from the options dict when processing a syntax error.
Patch to follow.

dkf added on 2011-08-08 19:57:01:
I'd guess that the problem is elsewhere; Valgrind's really operating on machine words (it's a machine-code-level annotation tool applied to memory analysis). I've seen times when I thought it was a problem in valgrind and where it turned out to be somewhere else _in my code_ than the place I expected. I'm resigned to such things these days, but it's one of these hard-to-believe got-to-see-it-for-yourself matters...

ferrieux added on 2011-08-08 01:59:58:
To clarify: I'm aware that a type cast is a source-level abstraction, and that all pointers are machine words.. but not sure of the amount of type info that -g brings into the executable. Pulling at straws...

ferrieux added on 2011-08-08 00:36:05:
Hum, it turns out that hashtable values are stored in the "clientData" field of Tcl_HashEntry, with a cast (to void* or int*)... I dunno how smart Valgrind is about casts, but could a pointer type mismatch block its path-walking ?

msofer added on 2011-08-07 19:54:22:
Redefining exit before the error (good catch, thx!) now shows the leak here in all script-input modes. Ie, not the same that you are seeing ...

Another funny observation: replacing 'namespace eval x' with 'if 1' (both insuring bcc'ing) removes the leak too.

ferrieux added on 2011-08-07 18:33:30:
Anyhow, I'm still wondering about literals' end of life. A watchpoint on the options dict object's refcount shows it never goes below 1, nor is TclReleaseLiteral called on it. Could it be subject to a kind of "fast erasure" that would deplete its hashtable behind the scenes ?

ferrieux added on 2011-08-07 18:09:12:
Ah, ok !
Also note that your "proc exit args {}" is ineffective when placed after the barfing [namespace eval] ;-)
So basically, two things are equivalent regarding this leak, once -DPURIFY is set:
  - apply the patch
  - redefine exit (and do it before erroring out ;-)
However, with either method I cannot reproduce the difference between stdin and arg script. In both cases I do get the "definitely lost".

msofer added on 2011-08-07 04:17:37:
not using the patch here

ferrieux added on 2011-08-07 00:21:05:
Sorry to push ahead with even more tangents.. but in my case:
 - with 2919042 's val4.patch and CLFLAGS=-DPURIFY, I do get the errorStack leak regardless of the kind of script passing method (stdin via <, stdin via pipe, or tclsh single arg).

- with val4.patch and no PURIFY, but just TCL_FINALIZE_ON_EXIT=1, I get many "possibly lost" but no "definitely lost", regardless of the script disposition.

ferrieux added on 2011-08-07 00:06:31:
By the way, though you're not telling, I guess you are using the patch from 2919042 (so that -DPURIFY forces finalization), right ? Since it seems useful for leak hunting, how about a commit ?

ferrieux added on 2011-08-06 23:49:14:
Hmm, I'd guess "interactive" is key. Though [history clear] was not enough to purge the offending literal in my tests. Will dig further.

msofer added on 2011-08-06 21:45:00:
Positively strange ...

To repro
  CFLAGS=-DPURIFY ./configure --disable-shared --enable-symbols


[email protected]:~/DEVEL/tcl-core/trunk/unix$ cat /tmp/foo
namespace eval x {set syntax {}{}}
proc exit args {}
[email protected]:~/DEVEL/tcl-core/trunk/unix$ cat /tmp/foo | /usr/bin/valgrind --leak-check=yes --num-callers=10 --show-reachable=yes ./tcltest &> /tmp/foo.res

Look for 'definitely' in /tmp/foo.res, you'll find it.

BUT: the leak does NOT appear if you do
   /usr/bin/valgrind --leak-check=yes --num-callers=10 --show-reachable=yes ./tcltest /tmp/foo &> /tmp/foo.res
So it may have something to do with io???

ferrieux added on 2011-08-06 20:07:32:
Cannot seem to repro with HEAD, even though patch 2919042 applied and TCL_FINALIZE_ON_EXIT set to 1, and single-line script with the above code passed as arg to non-interactive tclsh
Note that valgrind --leak-check=full gives many other (larger) leaks, but not this one.

Please give details about experimental conditions to help repro ;-)

ferrieux added on 2011-08-06 18:45:07:
Good point :) So maybe that literal thing is a red herring. However, what is exactly your way of exiting Tcl so that all finalizations execute ? Are you using the 2919042 trick ?

msofer added on 2011-08-06 17:50:38:
Unlikely: my valgrind script would show the literal or the options dict being leaked, but not the errorstack (as a ref to it would still be around, namely in the options dict). This is a 'definitely lost' struct ...

ferrieux added on 2011-08-06 17:46:32:
Quick analysis shows that an extra ref (to errorstack, but also to other return options) is kept in an options-dict computed by CompileReturnInternal. That dict is soldered into a literal (TclAddLiteralObj), so it looks like the lifecycle of that literal is the culprit.