Tcl Source Code

View Ticket
Login
Ticket UUID: a8e4f76ce498ad648134e0b6867c359949055388
Title: load library (dll) from zipfs-library causes a leak in temporary folder
Type: Bug Version: >=8.7
Submitter: sebres Created on: 2019-04-11 11:14:29
Subsystem: 38. Init - Library - Autoload Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Severe
Status: Open Last Modified: 2024-12-02 13:50:31
Resolution: None Closed By: nobody
    Closed on:
Description:

8.7 seems to "leak" loaded from zipfs libraries in temp-directory (at least on windows), e. g. each invocation of shell like below creates a new temp-directory with "tclreg13.dll" inside and which is still remained after exit.

echo package require registry; puts [glob -directory $env(TEMP) TCL*] | tclsh

Is it missing an exit-handler (to delete this)? Could the library not be loaded directly from memory? (I know several ways to do this on win/linux).

BTW. It causes several warnings in tcltest-suite like Warning: files left behind: ... cmdMZ.test: TCL00002f50
So test like this produces a warning:

tcltest -singleproc 1 -file cmdMZ.test
And setting of a TCL_LIBRARY to local library-directory (like `TCL_LIBRARY="$tcl/library"`) would of course prevent this.

User Comments: dkf added on 2024-12-02 13:50:31:

Theoretically we could keep a list of known temporary DLLs and attempt to delete them on exit. Practically, I don't think that will work because the files will probably still be memory mapped into the process at that point, and I'm not at all convinced that it would be safe to undo that.

Also, AV systems may "modify" (break) the semantics of system calls in this area (mostly by adding quite a long delay from an operation being requested to it actually completing). Boo.

An application could fix this (or at least work around it by post-exit and on-reboot scripts) especially if we give an option to put the temporaries in a nominated subdirectory, but the Tcl library really can't do the whole of it itself.

Loading from memory would dodge much of this mess.


apnadkarni added on 2024-12-02 09:53:57:
To add to the last post, there are also flags to delete the file after process exits. However, they conflict with the flags LoadLibrary uses open the file.

apnadkarni added on 2024-12-02 09:52:41:
>Are the temporary files cleaned up on reboot? I know there's a Windows API for scheduling that to happen.

Needs admin privileges.

dkf added on 2024-12-02 09:19:18:

Are the temporary files cleaned up on reboot? I know there's a Windows API for scheduling that to happen.


chw added on 2024-11-27 15:23:07:
Harald, no, I really don't have a problem. Even better, my little
corsair undroidwish will sail with MemoryModule in the next LUCK
edition, which is only few days away.

oehhar added on 2024-11-27 14:37:18:

Hi Chrstian, please stay in this boat. It is a great ship and we will see, if it sails or not.

This will be after 9.0.1, which is expected end of this year.

We are still in bug fixing mode and great new features have to wait for the next year.

Take care, Harald


jan.nijtmans added on 2024-11-27 13:59:40:

> No one is disputing that loading from memory would be attractive. The question is whether it can be implemented robustly and correctly on platforms that do not natively support it and do not provide api's that allow for its safe implementation.

Amen. B.T.W: I think the answer to this question is simply: YES this can be done. I went through the MemoryModule pull requests. There are 9 of them. 4 give solutions to known bugs, I now merged those. This means that SEH exception handling should now work on Win64. It doesn't mean all dll's work: Dll's protected with WinLicense are still out-of the question, I'm sure there are more. Two other pull-requests are nice-to-haves, we don't need them. The remaining 3 look invalid to me.

> 2) impossibility to use nested DLL's (if some tcl-module needs another DLL's as dependencies to work, and want to provide them together within the executable). > ... > And the latter is to neglect, because even current implementation doesn't allow it (I don't think the dependencies will be automatically unpacked to TMP-folder now).

Indeed, this doesn't work now. But using MemoryModule, it could be made to work: there is a callback when a dll needs to load another dll and when it needs a symbol or resources from it.

TIP time


chw added on 2024-11-27 12:45:36:
OK, Ashok, I take this that you will vote against it, should
it ever need be TIP'ed. So be it. And I'll live with it.

sebres added on 2024-11-27 12:32:34:

From one side, I'd also prefer a robust implementation.

From other side, the unpack to temp-folder is also ugly (no matter once or per process) for several reasons, like troubles with rude AV or policies, or necessity to clean it after exit.

Particularly because the sole robust implementation is already here and it is called static linking.
And it has basically only two cons:

  1. some toolchain is necessary to create a ready executable;
  2. impossibility to use nested DLL's (if some tcl-module needs another DLL's as dependencies to work, and want to provide them together within the executable).

However the former is solvable using different ways, e. g. offline or online tool to do that (e. g. similar to kit-creator).
And the latter is to neglect, because even current implementation doesn't allow it (I don't think the dependencies will be automatically unpacked to TMP-folder now).

I don't see any other reason why {load $dir/some-dll.dll some-module} (with internal load from mem by some hackish way) may be more preferable than simplest {load {} some-module} (with load direct from executable).

Excepting probably the factor "compression", but firstly the binaries are usually very bad compressible, and if not - there are special tools like upx, which can be used to compress the whole executable.


apnadkarni added on 2024-11-27 10:44:30:

Christian,

I respectfully suggest that the points you list are orthogonal to what is being debated.

No one is disputing that loading from memory would be attractive. The question is whether it can be implemented robustly and correctly on platforms that do not natively support it and do not provide api's that allow for its safe implementation.

Symmetry with other platforms is not sufficient justification in my opinion.

/Ashok


chw added on 2024-11-27 07:40:58:
OK, Ashok. Let me clarify (and let us leave Wine and build-time out for now).

* When does memory loading occur?

A shared object shall be Tcl_LoadFile()'d and is not in the native filesystem.
Otherwise the normal platform specific loader is used.

* Why is memory loading attractive?

Because it requires nearly only RAM, not much else. RAM is cheap nowadays.
And there is plenty, usually.

* Which platforms provide memory loading OOTB?

MacOS.

* Which platforms allow for safe implementation of the MacOS functionality?

POSIX when supporting dlopen(), shm_open(), and mmap(), or Linux >= 3.17
with dlopen() and memfd_create(). In both cases the shared object is
copied into a named shared memory which then is dlopen()'ed. On FreeBSD
there's fdlopen() which allows to use the open shared memory instead of
the name of the shared memory, even.

* Which platforms allow for immediate deletion of a file in native
  filesystem after it has been dlopen()'d?

POSIX. Windows most likely only in very specific contexts, i.e. normally not.
That seems to be the main reason to have a temporary directory for DLL's
loaded on Windows.

* Why do I prefer MemoryModule on Windows?

See above, for symmetry. And RAM is cheap on Windows, too.

* What are alternatives?

If it would be possible to have a comparable trickery using named shared
memories plus LoadLibrary() on them on Windows, that would be even better.


Further comments:

I see your point: people which create a huge multi gigabyte C++ AI DLL
blob which they want to deliver as part of a ZIPFS bundled tclsh or
wish app. For now they have already to wait for ZIP64 support. OTOH,
as long as we thoroughly document the limitations of memory loading for
the Windows platform, they will have to understand that wrapping in this
case is not supported. And that a persistent install mechanism is the
better solution in this case anyway, since load time of that blob at
startup has in any implementation of memory loading a negative impact
on user experience.

juliannoble2 added on 2024-11-27 03:29:38:
Regarding build time options - If it's to go ahead, I'd rather see a modal runtime switch along the lines of the way 'package prefer' works so that if a particular library was problematic it could be worked around with that.

apnadkarni added on 2024-11-27 02:58:02:

Christian,

I am not sure as to the reason for your preference for MemoryModule over simply deleting the directory through a subprocess. Can you clarify?

I do not also see the environment as being controlled. Are we going to limit what kind of extensions people write? For instance, there are more and more C++ libraries, particularly in the numerical processing space, that (some day!) people might want to wrap. Are we going to define what will work and what won't?

If in fact Wine does not work (if I interpret your statement correctly), this is only an indication of how dependent the code might be to a specific undocumented implementation.

Not in favour of yet another build time option, this or any other. The number of possible build configurations has already become ridiculous in terms of testing. (That's an orthogonal opinion to this core issue but since you brought it up...)

But whatever the broader community decides...


chw added on 2024-11-26 18:12:20:
Ashok, I prefer the MemoryModule solution for the simple reason
that it is used in a controlled environment (at least in theory).
Or do we really mount arbitrary ZIPs containing evil DLLs?
You might argue, that using ffidl or cffi that is not always
ensured but then there's the possibility to finer control the
capabilities of Tcl_LoadFile() by adding flags to exclude the
MemoryModule if so desired.

Since my early tests shortly after Jan added his branch were a
full success on a real Win32 system (Wine not so much) with all
the DLL stuff going into the undroidwish of the day, I like the
MemoryModule approach (and yes, I build all my crap in a controlled
way with MingW on Linux always trying to minimize dependencies).

And finally, we could make this feature build-time optional.

apnadkarni added on 2024-11-26 17:30:50:

It is difficult for me to accept the premise that because MemoryModule works for one particular use case, it will work in all cases across all versions of Windows. Sure I could build a DLL using SEH and demonstrate it loads from memory. That does not give me confidence it can handle all exception records. Likewise, a declspec(thread) may work for a native type in a single DLL loaded in a single thread but will it work for multiple declspec(thread) for non-native types that have constructors across multiple DLL's under a heavy thread load?

I'm afraid I am very skeptical of emulating the Windows loader. A few experiments of specific cases will not convince me MemoryModule gets everything right and I'm not in favour of code that works "most of the time". Perhaps it's my ignorance of the area that makes me hesitant.

In any case, I've added a branch apn-bug-a8e4f76ce4 that simply spawns cmd to delete the directory on process exit. Less elegant but far less risky imo.

As an aside, my TEMP directory has 5000+ directories at the top level after deleting all TCL directories. Clearly Tcl does not need to be singled out in this respect.


jan.nijtmans added on 2024-11-26 10:04:44:

> What happens if the DLL contains resources accessed by the code? Will LoadResource accept the handle passed by MemoryModule to the DLL entry point?

I can now answer that one. I made an experiment, zipping tcl9tk90.dll into tclsh90.exe, together with a hand-crafted pkgIndex.tcl file. See attached image.

The feather is a resource inside tcl9tk90.dll, it is accessed and displayed correctly in the top-left corner of the window. The coffee_mug is another resource. When moving over the button, the coffee_mug cursor is visible (unfortunately, screen-shots don't capture cursors). So, the answer to this question is Yes. This demonstrates that LoadCursorA() is accepting the handle returned by Tk_GetHINSTANCE().

I also went through the Tk demos's: all of them are working fine.

C:\Users\jan.nijtmans\workspace\tcl9.0\win>.\tclsh90.exe
% package require tk
9.0.1
% button .b -text coffee_mug -cursor coffee_mug
.b
% pack .b
% info loaded
{//zipfs:/lib/tcl/tcl_library/registry/tcl9registry13.dll Registry} {//zipfs:/lib/tcl/tcl9tk90.dll Tk}
%


apnadkarni added on 2024-11-25 06:17:18:

To respond to Jan's post,

The issue is not with Windows PE format changing, it has to do with how the Windows loader interprets the content of the records and to what extent MemoryModule faithfully implements that behavior.

  • Does it process SEH records in the PE to set up exception handler chains?

  • Does it call C++ constructors for globals, and in the correct order?

  • Does it process declared TLS variables (_declspec(thread) decorations) and arrange for memory to be allocated for them at every thread creation? Will it pin the DLL in memory as required?

  • Does it handle manifest records in the DLL? For example, a DLL may require a specific version of a dependency specified via a side-by-side assembly manifest record.

  • Does it deal with differences within Windows versions? TLS handling for example has changed and the implementation in the MemoryModule would have to pick the correct one depending on OS version.

  • What happens if the DLL contains resources accessed by the code? Will LoadResource accept the handle passed by MemoryModule to the DLL entry point?

This is just what I can think of based on my limited knowledge of Windows loaders. There are likely other concerns.

I have not looked at the MemoryModule code and even if I did I probably could not tell if it does all the above. Based on the tickets though, I think it does none of it.

The comments regarding Tcl C API not using SEH is irrelevant. Extensions can internally use SEH.

For TLS, using Tcl's TLS API is not an alternative. Even ignoring the performance issues, it would require extensions to (a) change implementation and (b) not even be feasible in some cases as the API's and semantics are not the same. And as a practical matter, tclcurl uses libcurl which in turn uses another dozen or more libraries. How is it even possible for these to be changed to use the Tcl TLS API?

The stance that if the mechanism does not work for some extension, we can just mark the bug as not supported does not sit well with me. For starters, with TLS stomping, bugs would show up as memory violations (best case) or data corruption (worst case). Just tracking it down and pinpointing MemoryModule would be considerable work. Think about the tclcurl example above. Remember that the inclusion of an extension within the VFS is not within the extension writer's control but they are the ones who will expend effort on debugging.

Finally, as a matter of principle I am opposed to introducing features into the core that will work some of the time for some of the extensions. That is not a path to robustness.

I understand the risk/reward is all a matter of opinion so if majority of folks want it, so be it. I do think something of this nature should have a TIP (targeting 9.1 and not 9.0.1) and not be just treated as an implementation change or bug fix. That will allow a full range of tests.


marc_culler added on 2024-11-23 21:21:06:
>> depends on undocumented ...

>Yes, some macOS code does as well.

Well, as far as I know the undocumented macOS code was provided by Apple.  That
doesn't seem like the same thing to me.

jan.nijtmans added on 2024-11-23 12:00:26:

Ashok, I fully understand your reservations. However:

> depends on undocumented Windows

Yes, some macOS code does as well. In this case, it depends on the Windows PE format, which didn't change in a long time and is not likely to change in the future. The fancycode implementation is already more than 10 years old, and it still works with Windows 11. That tells something.

>has some serious unresolved tickets and shortcomings in common situations where the DLL being loaded uses SEH, thread local storage, has manifests etc.

SEH won't work with the Tcl C-API anyway. We can require Tcl's own thread local storage to be used. The presence of manifests in the dll can only fail if the dll is actually trying to read the manifest content.

> I don't think any of these are likely to be addressed as the software seems to be abandonware.

If a bug is discovered, we don't need to address it. The simple answer is: not supported. There are bug-fixes available in git clones. If people submit a bug-report about it, that's fine. If a crash happens inside MemoryModule, it means some situation is not handled. The bug can be worked around then by letting OpenMemoryModule() return NULL, then the fallback of copying the dll to the temp directory will take place.

Considering all of this, MemoryModule appears to work well (at least with dde and registry). If there are problems, there is always the fallback of statically linking too. I don't see much risk for us.


apnadkarni added on 2024-11-23 06:24:23:

I have some reservations about the inclusion of MemoryModule assuming it is the same one as fancycode at github.

I had looked at it some years back for this exact purpose for 8.6 tclkits, but rejected it due to the fact that it depends on undocumented Windows behavior, has some serious unresolved tickets and shortcomings in common situations where the DLL being loaded uses SEH, thread local storage, has manifests etc. I don't think any of these are likely to be addressed as the software seems to be abandonware.

I would be hesitant to introduce it into the Tcl core.

/Ashok


chw added on 2024-11-18 18:55:31:
Wow! Jan, your proposal in [4e88af7df7] is perfect IMHO.
But please fix the missing "return TCL_ERROR" in the error case
of TclpLoadMemory() anyways.

Now, I wonder if the MPL of MemoryModule.[ch] could be another
roadblock in the license discussion. Is it compatible with BSD?

jan.nijtmans added on 2024-11-18 15:15:28:

Good idea, to load the library directly in memory:

Proposed solution [b74bd939ad1277f6|here]


jan.nijtmans added on 2024-10-08 13:36:50:

Dup of [6ce3c0de9d]


Attachments: