Ticket UUID: aaa84fbbc59fc039788c080a8cd32d2f73316db9
Title: zipfs info on mountpoint of executable returns zero offset in field 4
Type: Bug | Version:
Submitter: juliannoble2 | Created on: 2024-10-14 15:46:42
Subsystem: - New Builtin Commands | Assigned To: nobody
Priority: 5 Medium | Severity: Severe
Status: Closed | Last Modified: 2024-12-07 16:57:22
Resolution: Fixed | Closed By: apnadkarni
Closed on: 2024-12-07 16:57:22
Description:
Documentation for tcl9 zipfs info says:

As a special case, querying the mount point gives the start of the zip data as the offset in (4), which can be used to truncate the zip information from an executable.

Tested on some of my own windows builds and also on some beta tcl9s from magicsplat.

zipfs info //zipfs:/app
tclsh90.exe 0 0 0

Confirmed with Schelte in chat.

Sorry to harp on about offsets - but I think it's important. If you mount a zip that has a tcl script prefix - it shows zero for field 4 if the offsets are relative to the file (which is what zipfs mkimg seems to have changed to do in 2021).

If you just create a zip which is catted (i.e. no offset adjustments, therefore offsets are relative to the archive) - you will get the offset info in field 4 to aid in determining where the exe/zip split occurs.

I believe determining the split is a rather difficult problem if you use file based offsets - as you probably have to scan for PK\3\4 (local file header) entries - which could conceivably get false positives from within the exe/script data. Also - there can legitimately be file entries which aren't pointed to by the CDR, so you can't just enumerate all files and look for the lowest offset - though at least it can get you in the ballpark.

If it was kept as simple unadjusted offsets - it's just a matter of doing a simple calculation - which tclZipfs.c already does.

Trying not to add too much noise around this - but the calculation of the offset as zero as returned by 'zipfs info' might explain why in ticket https://core.tcl-lang.org/tcl/tktview/fc65ff1b663fd16e7582 .. the exe was missing from zipfs mkimg output. It's possibly an interaction with using 'zip -A' to change offsets to file based.

It's my guess that the fix in that ticket to use file-based offsets broke zipfs info - and leaves us with the harder problem of scanning possibly large exes and zips for the split.
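As a rough illustration of the "simple calculation" for unadjusted (catted) archives, here is a minimal Python sketch. It is not the code in tclZipfs.c: the function name `archive_start_offset`, the helper `make_zip`, and the sample preamble are hypothetical, and the sketch assumes a single-disk, non-Zip64 archive whose data contains no stray end-of-central-directory signature. It finds the end-of-central-directory record and subtracts the central directory size and its recorded offset from that position; whatever remains is the number of bytes sitting in front of the archive.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature


def archive_start_offset(data: bytes) -> int:
    """Return how many bytes precede the zip data (hypothetical helper)."""
    # The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
    # so only the tail of the file needs to be searched.
    pos = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 65535))
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # Bytes 12..19 of the EOCD: central directory size, then its recorded offset.
    cd_size, cd_offset = struct.unpack_from("<II", data, pos + 12)
    # The central directory ends where the EOCD begins, so it really starts at
    # pos - cd_size. Any difference from the recorded offset is prepended data.
    return pos - cd_size - cd_offset


def make_zip(files: dict) -> bytes:
    """Build a small in-memory zip for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, body in files.items():
            zf.writestr(name, body)
    return buf.getvalue()


# Assumed sample data: a script preamble simply concatenated in front of a zip.
preamble = b"#!/usr/bin/env tclsh\nputs hello\n\x1a"
plain = make_zip({"main.tcl": "puts hi\n"})
catted = preamble + plain

print(archive_start_offset(plain), archive_start_offset(catted), len(preamble))
```

For an archive whose offsets have been adjusted to be file based, the same calculation comes out as 0, which matches the field 4 result reported in this ticket.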
User Comments:
apnadkarni added on 2024-12-07 16:57:22:
Fixed in [3d62ee540d]

apnadkarni added on 2024-12-04 06:50:45:
For the record, below is a summary comparison of the current implementation of zipfs and Julian's patched version.

Currently, the Unix autoconf build appends the zip file to the executable/shared library and then does a
The Windows nmake on the other hand uses Tcl's own
The difference shows up in two ways. Some external utilities, like
Note this does not affect actual operation of the unzip command. Also, unlike The problem with the autoconf build however, is that
Thus The patch from Julian, in conjunction with the removal of the
where the last element is the offset for splitting the wrapped binary. Even though this generates the warning from

apnadkarni added on 2024-12-02 06:28:50:
I tend to agree with Julian on this. My preference would be for Tcl's zipfs facilities to be internally consistent and complete, particularly because (a) external utilities do not seem to agree with each other either, and (b) as it is, Tcl zipfs does not handle several common zip format extensions, making any thought of making Tcl a "general purpose" zipping utility a non-starter anyway.

Will wait for comments from others.

juliannoble2 added on 2024-12-02 05:07:33:
realizing your question was more specifically about direct unzipping.

Yes, infozip's unzip shows external preamble data as a warning on stderr and gives a nonzero exit code. There are various behaviours for other zip tools, including zero exit code with no warnings for either format.

Notwithstanding the long unix and tcl history of driving the moving target of external utilities programmatically - is the case of calling out to a specific external zip tool against the combined archive something we want/need to design for in this case, rather than relying on the tcl zipfs code?

juliannoble2 added on 2024-12-01 20:32:40:
open source tools zip, 7zip, nanazip and peazip can view/list and unzip a catted file directly. (7zip and nanazip are cross platform and can directly edit/delete contents too)

7z when listing just puts an extra line in the header: "offset = nnn..." rather than a warning.

Yes, there is a tradeoff in that not all tools can deal with a combined file in catted format. (most can read/extract - fewer can edit directly) The tradeoff goes in the reverse direction too sometimes, in that for example 7zip and nanazip can't directly edit when in the current adjusted format but can when unadjusted.
It makes sense that the motivation for the adjusted 'zip -A' format may have been so that (most) zip tools could work directly with a zipfs extended executable or lib - though I don't see it mentioned as a goal in tip 430.

I don't think that attempting to make the combined file work seamlessly with any and every zip tool (not fully attainable in either format) should take precedence over the fix - but I understand the attraction of supporting a wider set. This fix makes working with the combined files a bit more like working with other vfs attachment systems such as kit or cookfs; or alternatively the convenience of direct editing would require specific tools such as 7z or nanazip.

All I can say is that in my opinion working with tcl tools and being able to split the zip via the offset, or copying out to folder, or working with any zip tool on just the zip part, then re-catting or rerunning zipfs mkimg, should be enough - and is better for zipkit builders/programmers.

apnadkarni added on 2024-12-01 15:49:59:
Julian,

Your patch is now merged into the apn-julian-zipfs-patch branch. No test failures and some basic experimentation (without passwords or other preambles) shows zipfs info returning correct values.

One question I had was with regard to the use of zip -A. As Jan also mentioned in the other related ticket, the image built with zip -A shows no warning with
On the other hand, the image (x) built with
The Windows build, which uses minizip, not zip -A, also shows the same warning.

So my question is really, changing to using zipfs mkimg as you suggest (as equivalent to cat file >> stub.exe) in place of zip -A would result in unzip warnings, wouldn't it? Is aiming for compatibility with a variety of tools feasible?

juliannoble2 added on 2024-11-30 16:09:54:
> Julian, what's your opinion on the importance of this fix. Is it something that needs strong consideration for 9.0.1 ?

I think it's important for longer term compatibility regarding tools that are developed for building/rebuilding kits and being able to rely on consistent behaviour. The more tcl executables released with zipfs attached (kits etc), the more workarounds need to be done in future to detect how the combined executable preamble/zip archive was made and how to handle it. (may be as simple as just reading a zero offset result from 'zipfs info' and a tool deciding to error out)

I can't claim it's critical that it makes it into 9.0.1 - but ideally I would hope there is agreement that it would come very soon.

Little used features such as the password that can optionally be used with zipfs mkimg should naturally be considered part of the 'internal preamble'. Also application specific zip headers such as certain non-standard zip signing mechanisms are part of the 'internal preamble'. (Also valid zips with deletions done via CDR only can result in a form of internalized preamble consisting of dead local file header entries)

The current state of using 'zip -A' and existing mkimg behaviour to mix our tclsh/script in with a (possibly already existing) internal preamble means there is information loss regarding where the split lies, which can only be determined in case-by-case application specific ways or in some cases by complicated guesses.

Even without password or existing zip preambles - the fix will mean tools will have a much easier time determining the bytes corresponding to the tclsh preamble.
I don't believe it's quite as important for now regarding shared libs that may have a zipfs attached - but even then, being able to quickly determine the dll/so part via 'zipfs info' is a nice to have, and it would be best long term that this too can be relied upon.

apnadkarni added on 2024-11-30 13:52:51:
Julian, what's your opinion on the importance of this fix. Is it something that needs strong consideration for 9.0.1 ?

apnadkarni added on 2024-11-30 13:52:20:
Julian, what's your opinion on the importance of this fix. Is it something that needs strong consideration for 9.0.1 ?

juliannoble2 added on 2024-11-24 11:08:44:
patch info:

Makes archives created with mkzipfs compatible with those made simply by 'cat file.zip >> tclstub.exe', i.e. the exe preamble is external to the zip.

I believe androwish got this right. It is also compatible with the way TOBE works (a mentioned influence in the TIP) and the description of how to construct a combined exe/zip in TIP 430.

zipfs info <mountpoint> will then report the offset correctly and allow easy splitting of the exe portion of a supplied kit.

This does not stop existing exezips from working. It is in line with TIP 430 - whereas the existing situation is a change from that, and I have not yet seen any justification for why the build system (using zip -A) and mkzipfs internalize the preamble and break 'zipfs info'.

The alternative would be to adjust the code to make zipfs info work out the offset for internalized preambles. As mentioned in my other posts I think that is more complicated and error prone and would require more documentation.
I believe in concert with this the build system should stop using 'zip -A' - which would also possibly give greater compatibility for systems where the system zip doesn't have an offset adjustment flag, or just doesn't work (e.g. on windows it works when run from mingw - but not directly from cmd.exe)

juliannoble2 added on 2024-10-30 03:50:13:
I'll attempt to provide some more context regarding this issue.

The zip archive format (or zip 'container format') is tolerant of 2 ways of prepending data that isn't part of the archive contents as far as what is extracted/listed. I'll refer to such data - which may be for example a script or an executable (or in less common cases a signature - or other application specific data) - as the 'preamble'.

This could be an 'external preamble' - e.g. as would be created by the method noted in tip 430 by just catting an exe and a zip together.

It could be an 'internal preamble' - which might be created by the same method - but then 'adjusting' the zip offsets such that the preamble is part of the range of data covered by the offsets, e.g. using something like 'zip -A' on it.

'zipfs info' can currently only return the offset for the mountpoint as documented in the Tcl zipfs man page if it is an external preamble. If adjustments have been made - zipfs info will return zero for the offset. In some sense this is correct. By making the preamble internal - the zip archive portion does in effect begin at 0 - the start of the whole file.

That zipfs mkimg - and the build system (using zip -A) - create an internal preamble by default is the cause of this.
This default to 'internal preamble' is then at odds with the documentation, and the 'zipfs info' result cannot as claimed be used to 'truncate the zip information from an executable'.

Note that while in nearly every case external zip tools can extract zip contents from a preambled file whether internal or external - there are differences in direct writability etc - and some tools exercise different codepaths based on the file extension. A file that is named with a .zip extension is generally expected by tools to be an archive with either no preamble, or one that is internal.

pkware are the supposed stewards of the format - having released various appnotes giving a somewhat underspecified documentation of it. Although the original author Phil Katz released the format to the public domain - it is unclear to what extent versions documented by pkware after his death contain copyrighted extensions, especially related to encryption. This is probably why various other projects that have made use of the zip format have tended to restrict the features used.

pkware put their self extractor exe as an 'internal preamble' - and their tools tend not to support files with arbitrary extensions on the commandline (they assume or require .zip or .exe). Other tools will generally accept e.g. file.kit, file.jar, file.apk etc and list/extract contents.

-----------

Catting a Tcl exe or script preamble to a zip file which already has an internal preamble will result in a file with both an external and internal preamble. This has at least one possibly rare legitimate usecase: there is at least one zip signing tool which uses an internal preamble instead of other mechanisms to store the signature. (Other mechanisms include putting it in the comment (seems a dubious way to do it) or in the archive itself as for example .jar files do)

Catting for example a tcl executable to an existing sfx would also result in a mixed external/internal preamble.
This is another unlikely usecase - in that while the internal sfx header seems to do no harm - its only utility is if the tcl preamble is later split off and the resulting zip is wanted to remain an sfx. Generally it's probably just a waste of space.

If a zip kit/exe with both an external and internal preamble is adjusted to make all the preambles internal - there is information loss regarding where the split occurs between the preambles. (Possibly not a case particularly cared about/important in the Tcl zipfs ecosystem but an undocumented edge case nonetheless)

If a tcl exe or script preamble is adjusted to be internal - the split between preamble and zip is also harder to ascertain. It's not an issue for extracting/reading the zip contents - but for determining where the preamble ends. (easy enough if you know there is just one script delimited with \x1a - not so easy for cross-platform formats of executables or application specific data etc)

Internal preamble splitting generally relies on using the first file record pointed to by the zip archive's Central Directory.

1) as a reduced case for just a zip file with no contents but just a comment - there isn't even a CDR (just an end of central directory record) (zipfs can't mount an empty zip, vfs::zip can - but in any case read-only - so probably not relevant to Tcl as it stands)

2) existing zip archives can have either internal preambles, or file records that are unreferenced by the CDR

This makes splitting hard. Not an issue just for extracting/reading, or for just running or self extracting - but without a functional 'zipfs info' users have to parse the zip themselves to split the Tcl preamble. Such parsing is tricky/error prone even without unusual cases - and avoiding the need for it is presumably what zipfs info's offset result was for.

Tcl's zipfs can currently split an external preamble no problem and reports it in 'zipfs info'.
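To make the external/internal distinction concrete, here is a minimal Python sketch, offered as an illustration rather than a description of what 'zip -A' or zipfs mkimg actually do internally. The names `preamble_length` and `internalize_preamble` and the sample data are hypothetical, and the sketch assumes a single-disk, non-Zip64 archive with no stray end-of-central-directory signature in its data. It rewrites the stored offsets so that they are relative to the start of the file, then shows that the offset recoverable from the zip structures drops to zero while the contents stay readable.

```python
import io
import struct
import zipfile


def find_eocd(data: bytes) -> int:
    """Locate the end-of-central-directory record by its signature."""
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    return pos


def preamble_length(data: bytes) -> int:
    """Bytes in front of the archive, as implied by the recorded offsets."""
    eocd = find_eocd(data)
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd + 12)
    return eocd - cd_size - cd_offset


def internalize_preamble(data: bytes) -> bytes:
    """Shift all stored offsets so they become relative to the whole file."""
    out = bytearray(data)
    eocd = find_eocd(data)
    total, cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    base = eocd - cd_size - cd_offset  # length of the external preamble
    p = base + cd_offset               # file position of the central directory
    for _ in range(total):
        if data[p:p + 4] != b"PK\x01\x02":
            raise ValueError("unexpected central directory entry")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, p + 28)
        (local_off,) = struct.unpack_from("<I", data, p + 42)
        struct.pack_into("<I", out, p + 42, local_off + base)
        p += 46 + name_len + extra_len + comment_len
    struct.pack_into("<I", out, eocd + 16, cd_offset + base)
    return bytes(out)


# Assumed sample data: a script preamble catted in front of a one-file zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("main.tcl", "puts hi\n")
preamble = b"#!/usr/bin/env tclsh\nputs hello\n\x1a"
catted = preamble + buf.getvalue()     # external preamble
adjusted = internalize_preamble(catted)  # internal preamble

print(preamble_length(catted), preamble_length(adjusted))
```

Both byte strings have the same length and the same preamble bytes; only the stored offsets differ, which is why the split point can no longer be read back from the adjusted form.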
To fix 'zipfs info' for the current default of 'internal preamble' - some not completely reliable CDR mechanism will need to be used. It could be made 'completely reliable' by documenting the restrictions Tcl places on zip archives (specifically no comment only or empty zips, and all files must be referenced by (and in order) the CDR)

Opinion:

The current default (which I don't believe was the original?) to 'adjusting offsets' and using internal preamble is unnecessary, more complicated, more restrictive. (and currently not fully supported as evidenced by 'zipfs info' output)

If the intention is to stick with the default (by zipfs mkimg and build system) internalizing of the Tcl preamble into the zip offset range - a decision needs to be made on what zipfs info reports. While technically correct (there is no offset - it was adjusted away) - it's not useful, and not coherent with documentation. If sticking with this default - zipfs info should still report the correct offset as it currently does for the simpler and IMO more sensible external Tcl preamble situation.

Regarding use of external zip tools on preambled zips:

The situation on windows I believe favours external plain catted preambles for files not named with the .zip extension - but there are many variations in behaviour across platforms and tools. 7zip is available on all platforms and works well with external preambles.
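The "not completely reliable CDR mechanism" can be sketched in Python as follows. This is only an illustration under stated assumptions, not a proposal for tclZipfs.c: the function names, the signature blob, and the script preamble are all hypothetical, and Zip64 and multi-disk archives are ignored. The sketch builds a zip that already carries an internal preamble (Python's `zipfile` in append mode writes a new archive after existing non-zip data and stores file-relative offsets), cats a script in front of it, and compares the offset implied by the end-of-central-directory record with the position of the lowest local file header referenced by the CDR.

```python
import io
import struct
import zipfile


def read_eocd(data: bytes):
    """Return (position, entry count, CD size, CD offset) from the EOCD record."""
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    total, cd_size, cd_offset = struct.unpack_from("<HII", data, pos + 10)
    return pos, total, cd_size, cd_offset


def external_preamble_length(data: bytes) -> int:
    """Split point implied by the recorded offsets (external preamble only)."""
    pos, _, cd_size, cd_offset = read_eocd(data)
    return pos - cd_size - cd_offset


def first_referenced_header(data: bytes) -> int:
    """File position of the lowest local header the central directory points to."""
    pos, total, cd_size, cd_offset = read_eocd(data)
    base = pos - cd_size - cd_offset
    p = base + cd_offset
    lowest = None
    for _ in range(total):
        if data[p:p + 4] != b"PK\x01\x02":
            raise ValueError("unexpected central directory entry")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, p + 28)
        (local_off,) = struct.unpack_from("<I", data, p + 42)
        lowest = local_off if lowest is None else min(lowest, local_off)
        p += 46 + name_len + extra_len + comment_len
    if lowest is None:
        raise ValueError("central directory is empty")
    return base + lowest


# A zip with an internal preamble: zipfile appends after the existing blob.
signature_blob = b"HYPOTHETICAL-SIGNATURE-BLOCK" * 2
buf = io.BytesIO(signature_blob)
with zipfile.ZipFile(buf, "a") as zf:
    zf.writestr("main.tcl", "puts hi\n")
signed_zip = buf.getvalue()

# Cat a script preamble in front: external plus internal preamble.
tcl_preamble = b"#!/bin/sh\n# hypothetical script preamble\n\x1a"
combined = tcl_preamble + signed_zip

print(external_preamble_length(combined), first_referenced_header(combined))
```

In this constructed case the end-of-central-directory calculation lands on the end of the script preamble, while the lowest referenced header lies past the signature blob as well, so a splitter relying on the CDR alone would cut away data that belongs to the zip.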
Attachments:
- tclZipfs.patch [download] added by juliannoble2 on 2024-11-24 10:52:38. [details]