Tcl Source Code

View Ticket
Login
Ticket UUID: ae09f6b190ceec31747572edb9f9be8bb03a592
Title: Mem leak in trunk
Type: Bug Version: 8.7+
Submitter: pointsman Created on: 2024-07-14 01:04:34
Subsystem: - New Builtin Commands Assigned To: nobody
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2024-07-15 19:49:21
Resolution: Fixed Closed By: oehhar
    Closed on: 2024-07-15 19:49:21
Description:
The commit https://core.tcl-lang.org/tcl/info/bacfccf1e2cbab78
introduced a memory leak. While I understand the reasoning for
[bacfccf1e2cbab78] I so far not understand why it had this unfortunate
"side-effect".

The builds I use for this are all made with -DPURIFY.

I'll append the output of a "make valgrind" run with
[95f9e8176ed0b552] - the parent commit of bacfccf1e2cbab78 - and the
output of the same with current trunk as of writing ([08603ee649]).
The output of "make valgrind" with [bacfccf1e2cbab78] looks similar to
the one of [08603ee649] (current trunk).

For a short verification you don't need the whole time consuming run
of the test suite under valgrind. With [bacfccf1e2cbab78] and under
linux 

make valgrind TESTFLAGS='-file load.test'

it returns (amongst more details)

    definitely lost: 80 bytes in 5 blocks
    indirectly lost: 176 bytes in 11 blocks
      possibly lost: 0 bytes in 0 blocks
    still reachable: 0 bytes in 0 blocks
         suppressed: 0 bytes in 0 blocks

With [95f9e8176ed0b552] - the parent commit of bacfccf1e2cbab78 - and
the same

make valgrind TESTFLAGS='-file load.test'

it returns

    HEAP SUMMARY:
        in use at exit: 0 bytes in 0 blocks
      total heap usage: 383,994 allocs, 383,994 frees, 43,997,079 bytes allocated
    
    All heap blocks were freed -- no leaks are possible

This can easily be "exploited" by a script. Any Tcl 9 binary extension
will do it (I'd suppose). In the example below I used tdom
de5237af97f2c1c (trunk at the moment of writing this). Run (your moral
equivalent) of

load ./libtcl9tdom0.9.4.so
while 1 {
    interp create childinterp
    load {} Tdom childinterp
    interp delete childinterp
}

with a [bacfccf1e2cbab78] tclsh and watch the process somewhat slowly
but steady growing. If I run that script with a tclsh with the parent
commit to that the process doesn't shows this. (At least with mem
builds.) Neither do I see the problem with 8.6
User Comments: oehhar added on 2024-07-15 19:49:21:

Sergey, great work ! Could you look into ticket [a7b7dd7927] which reports a segfault supposed to be caused by the fix?

Thanks for all, Harald


sebres added on 2024-07-15 01:22:57:

Fixed in [24ace090aad63399] for 8.7 and merged to 9.0.


sebres added on 2024-07-14 23:57:19:

The reason for the side effect is pretty simple - LoadCleanupProc uses Tcl_GetAssocData internally here and because the hash entry is already removed, it gets NULL and so the clean-up doesn't invoke unload below and therefore leaking.

Strange is that this was never noticed in 8.6, which was not affected by [34870ab5756911d1] and the hash entry removal were there before the call of the clean-up proc initially.

I'll try to fix it if I identify all affected branches.


Attachments: