Tcl Source Code

Ticket UUID: 1d26e580cfcb7ac096cfe9cd76c87f39415120c2
Title: safe interp can't source files with BOM
Type: Bug
Version: 8.6
Submitter: juliannoble2
Created on: 2024-06-06 19:13:28
Subsystem: 33. Safe Base
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Minor
Status: Closed
Last Modified: 2024-06-13 12:18:40
Resolution: Fixed
Closed By: oehhar
Closed on: 2024-06-13 12:18:40
Description:
Leaving aside whether a BOM on a UTF-8 file is desirable at all (especially in a package), standard Tcl interps have no problem sourcing files with a BOM.

Safe interps created with safe::interpCreate don't seem able to handle it.

I think the two should behave consistently in this regard, if possible.

I encountered this with a non-public package that deals with BOMs in some of its data files; as a bit of dogfooding, its .tm file has one too. That can be worked around for the package itself, but it may also be a problem for making the package work in a safe interp together with its data files, depending on whether source is used on them (using something like the "island" concept on the Tcl wiki to allow limited access to a section of the filesystem).

E.g., with a package file named bomtest-1.0.tm in the current directory, containing nothing but "package provide bomtest 1.0" and saved with a BOM:

    % tcl::tm::add [pwd]
    % package require bomtest
    1.0
    % safe::interpCreate x
    x
    % x eval {package require bomtest}
    invalid command name " package"
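A minimal, self-contained repro of the mechanism, run in a plain tclsh rather than through safe::interpCreate. It assumes that a source alias essentially reads the file with open/read and hands the text to interp eval; this is an illustration, not the actual ::safe::AliasSource code. The UTF-8 BOM decodes to the character U+FEFF, which the parser keeps as part of the first word, so the child reports an unknown command. The strip step at the end is one possible workaround, shown for illustration only.

```tcl
# Illustrative repro only; not the Safe Base implementation.
# Write a one-line package script preceded by a UTF-8 BOM (bytes EF BB BF).
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out "\xEF\xBB\xBFpackage provide bomtest 1.0\n"
close $out

# Read the file the way a source-style alias might (assumption): decode
# as UTF-8 and slurp everything. The BOM survives as the character U+FEFF.
set in [open $path r]
fconfigure $in -encoding utf-8
set contents [read $in]
close $in
file delete $path

set child [interp create -safe]

# Evaluating the raw text fails: U+FEFF is glued onto "package".
set rc [catch {interp eval $child $contents} msg]
puts "raw text: rc=$rc, $msg"

# Assumed workaround: drop one leading U+FEFF before evaluating.
if {[string index $contents 0] eq "\uFEFF"} {
    set contents [string range $contents 1 end]
}
interp eval $child $contents
set version [interp eval $child {package provide bomtest}]
puts "after stripping: bomtest $version"
interp delete $child
```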
User Comments:

oehhar added on 2024-06-13 12:18:40:

Ok, thank you. For -eofchar, that sounds reasonable.

Thanks for all the work, Harald


jan.nijtmans added on 2024-06-13 12:05:41:

Fixed in a7efbe19650e6b1f.

Changing the use of -eofchar is not a good idea IMHO. What if a file has a 100-byte script followed by \x1A followed by a 1 GB zip file? With your proposal, the whole 1 GB would be loaded into memory. The advantage of -eofchar is that reading stops as soon as that character is encountered.

The use of -eofchar on output channels was a bad idea; that's fixed in 9.0. But I think -eofchar should be kept for input channels.
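A small sketch of the input-side -eofchar behaviour described above, assuming a Tcl 8.6 or 9.0 tclsh. The file holds a short script, then \x1A, then a 100000-byte filler that stands in for the large archive; the file layout and sizes are made up for illustration. With -eofchar \x1A set on the input channel, read returns only the text before the marker and eof reports end-of-file.

```tcl
# Illustrative file: a short script, then ^Z (\x1A), then filler bytes
# standing in for a large appended archive.
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out "set answer 42\n\x1A[string repeat X 100000]"
close $out

# With -eofchar set on the input channel, [read] reports end-of-file at
# the first \x1A, so none of the filler ends up in $contents.
set in [open $path r]
fconfigure $in -encoding utf-8 -eofchar \x1A
set contents [read $in]
set sawEof [eof $in]
close $in
file delete $path

puts [string length $contents]   ;# 14: just "set answer 42\n"
```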


oehhar added on 2024-06-12 12:58:31:

Great work, thank you !

Some questions:

  • May the new lines be moved to line 994, just before the actual use of "contents"?
  • I would also love to replace the -eof "\x1A {}" by a check on the read data for \x1A, like
    set pos [string first \x1A $contents]
    if {$pos != -1} {set contents [string range $contents 0 $pos-1]}

so we could remove -eofchar totally one day...
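A runnable version of the proposed truncation, wrapped in a hypothetical TruncateAtEof proc (the name is illustrative and does not come from safe.tcl). Note that string first takes the needle before the haystack. Unlike -eofchar on the channel, this only cuts the text after the whole file has already been read into memory.

```tcl
# Hypothetical helper: cut script text at the first \x1A, if any.
# [string first needle haystack] returns -1 when there is no marker.
proc TruncateAtEof {contents} {
    set pos [string first \x1A $contents]
    if {$pos != -1} {
        # Index arithmetic: keep everything before the marker.
        set contents [string range $contents 0 $pos-1]
    }
    return $contents
}

puts [TruncateAtEof "set answer 42\n\x1Atrailing payload"]
puts [TruncateAtEof "no marker here"]
```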

Thank you for all, Harald


jan.nijtmans added on 2024-06-12 12:38:21:

Proposed fix: f612d4acfa1f604a

Remark: in the bugfix branch I added a BOM to "init.tcl" for testing purposes. This should NOT be merged to 8.6!


oehhar added on 2024-06-07 09:08:32:

I can confirm the bug with Tcl 8.6.14 (32-bit) and 9.0b2 on Windows.

I can confirm that the "space" character before "package" in the error message is the BOM.

I also tested that it works without the BOM: yes, it does.

The 8.6.14 error stack below shows an explicit -encoding utf-8, which is correctly absent in Tcl 9.0.

Log on 8.6.14

    % cd {C:\test\bom_package}
    % tcl::tm::add [pwd]
    % package require bomtest
    1.0
    % safe::interpCreate x
    x
    % catch {x eval {package require bomtest}} e d
    1
    % set e
    invalid command name " package"
    % set d
    -errorstack {
      INNER {invokeStk1 package provide bomtest 1.0}
      CALL {::safe::AliasSource x -encoding utf-8 {$p(:26:)/bomtest-1.0.tm}}}
    -errorcode NONE
    -errorinfo {
    invalid command name " package"
        invoked from within
    "source -encoding utf-8 {$p(:26:)/bomtest-1.0.tm}"
        ("package ifneeded bomtest 1.0" script)
        invoked from within
    "package require bomtest"
        invoked from within
    "x eval {package require bomtest}"} -errorline 1 -code 1 -level 0
    % regexp {"(.*)"} $e m f
    1
    % set f
    package
    % scan $f %c
    65279
    % x eval {package require nobomtest}
    1.0
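The diagnosis in the log can be replayed without any files. This sketch rebuilds the error message with a literal U+FEFF (an assumed stand-in for the real message) and shows the code point, its conventional U+ notation, and the three UTF-8 bytes that encode it on disk.

```tcl
# Rebuild the error text with the invisible character made explicit.
set e "invalid command name \"\uFEFFpackage\""
regexp {"(.*)"} $e -> f
scan $f %c code
puts $code                    ;# 65279
puts [format U+%04X $code]    ;# U+FEFF
# In a UTF-8 file, the same character is the three bytes EF BB BF.
puts [binary encode hex [encoding convertto utf-8 \uFEFF]]   ;# efbbbf
```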

Log difference 9.0b2

    % set d
    -errorstack {
      INNER {invokeStk1 package provide bomtest 1.0}
      CALL {::safe::AliasSource x {$p(:27:)/bomtest-1.0.tm}}}
    -errorcode NONE
    -errorinfo {
    invalid command name "package"
        invoked from within
    "source {$p(:27:)/bomtest-1.0.tm}"
        ("package ifneeded bomtest 1.0" script)
        invoked from within
    "package require bomtest"
        invoked from within
    "x eval {package require bomtest}"} -errorline 1 -code 1 -level 0

The two test files are attached.

Thank you for the bug report, Harald

