Author: Harald Oehlmann
Author: Ashok P. Nadkarni
State: Final
Type: Project
Vote: Done
Tcl-Version: 9.0
Tcl-Branch: tip-602
Vote-Summary: Accepted 6/0/0
Votes-For: AK, JN, KBK, KW, MC, SL
Votes-Against: none
Votes-Present: none
Abstract
Tcl 8 supports Unix shell-style tilde substitution. This TIP removes this functionality in Tcl 9.
Rationale
The Tcl 8 treatment of leading ~ and ~user components in file paths passed as arguments to file-related commands is convenient for interactive use. However, the resulting behavior is insecure and error-prone.
Consider the naive attempt to clean out the /tmp directory:
cd /tmp
foreach f [glob *] {file delete -force $f}
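A file named ~ or ~user maliciously placed in /tmp will have rather unfortunate consequences, since Tcl 8 may treat such a name as a reference to a home directory rather than to the file itself.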
In addition to being a source of security issues as above, tilde substitution is also inconvenient when writing robust file-handling applications and packages. Attempting to process Mercurial repositories in Tcl, for example, will generate unexpected errors.
To avoid the above pitfall, every file name passed to a command that operates on files, such as open or file, has to be checked for a leading ~ and prefixed with ./ to disable tilde processing.
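Under Tcl 8, the check described above might be factored into a small helper like the one sketched below. The name tildeSafe is hypothetical and is not part of Tcl. The sketch assumes it is applied to relative names such as directory entries.

```tcl
# Hypothetical helper (not part of Tcl): protect a relative file name from
# Tcl 8 tilde substitution by prefixing "./" when it starts with "~".
proc tildeSafe {name} {
    if {[string index $name 0] eq "~"} {
        return ./$name
    }
    return $name
}

puts [tildeSafe ~user]     ;# ./~user
puts [tildeSafe notes.txt] ;# notes.txt
```

In the /tmp loop above, the deletion would then read file delete -force [tildeSafe $f]. Under Tcl 9 the helper is unnecessary but harmless, because ./~user and ~user name the same file.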
On the other hand, displaying a name to the user or matching it against a user-supplied pattern requires that the ./ prefix not be present. Thus glob-like operations have to account for both cases.
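The reverse step can be sketched with a second hypothetical helper, tildeDisplay, which is also not part of Tcl. It strips the protective ./ before a name is shown to the user or matched against a user-supplied pattern.

```tcl
# Hypothetical helper (not part of Tcl): undo the "./" protection before
# showing a name to the user or matching it against a user pattern.
proc tildeDisplay {name} {
    if {[string match ./~* $name]} {
        return [string range $name 2 end]
    }
    return $name
}

# A user pattern such as "~*" only matches once the prefix is removed
set protected ./~report.xlsx
puts [string match ~* $protected]                 ;# 0
puts [string match ~* [tildeDisplay $protected]]  ;# 1
```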
Outside of the Unix shells, this tilde expansion is not performed by other commonly used languages, even on Unix. Thus programmers coming from other languages are unlikely to be aware of the above pitfalls and the need for cumbersome workarounds.
Note that this ambiguity in processing affects utility packages as well, such as the fileutil module in tcllib, making them unusable for such files.
Although possibly rare in the Unix world, tilde-prefixed files are not uncommon on Windows systems. Examples include:
- Files within Mercurial SCM repository storage (which may be present on Unix as well)
- Files created by Excel with the prefix ~$
- Directories under the Visual Studio installation of the form ~FC, ~IC, etc.
- Files in the %TEMP% directory (it is not clear which application creates these)
- Font caches under AppData
Specification
Change in file path translation
File paths will no longer be subject to tilde expansion in any command; ~ will be treated like any other character. This includes commands that operate on files, like open, exec and glob, as well as those operating on file paths, like file normalize, file tail, file dirname, etc.
The file pathtype command will return relative for tilde-prefixed paths.
The file split command will not prefix a tilde-prefixed path component with ./. Conversely, file join will not strip a ./ prefix from an argument starting with ./~.
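The following lines give a small illustration of these rules. They run under both Tcl 8 and Tcl 9. The comments show the results this TIP specifies for Tcl 9, with the Tcl 8 results for contrast.

```tcl
# Tilde-prefixed paths are ordinary relative paths in Tcl 9
puts [file pathtype ~]           ;# Tcl 9: relative   (Tcl 8: absolute)

# No "./" protection is added to tilde-prefixed components
puts [file split /foo/~bar/baz]  ;# Tcl 9: / foo ~bar baz   (Tcl 8: / foo ./~bar baz)

# Tcl 9 keeps the literal ./~bar component; Tcl 8 stripped the "./"
puts [file join /foo ./~bar baz]
```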
There are a few exceptions where ~ expansion will take place:
- Initialization of auto_path from the TCLLIBPATH environment variable will do tilde expansion on each path. Any expansion that fails because the user is unknown will not be included in auto_path.
- Likewise, initialization of the Tcl module search paths from the TCL9_0_TM_PATH (and similar) environment variables will undergo tilde expansion. Again, any expansions that fail because of unknown user names will be excluded. Note that the commands tcl::tm::add and tcl::tm::roots will not themselves do any tilde expansion.
- The initialization of the tcl_pkgPath variable will undergo tilde expansion at startup time. This is necessitated by the macOS configure script's use of ~ to set TCL_PACKAGE_PATH at build time.
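The expansion rule for TCLLIBPATH and the TM path variables can be sketched as follows. This is an illustration under stated assumptions, not the code Tcl actually runs at startup. The proc name expandPathList is hypothetical, the variable is assumed to hold a Tcl list, and nosuchuser is assumed to be an unknown user name. The sketch needs Tcl 9 for file tildeexpand.

```tcl
# Hypothetical sketch (not Tcl's actual initialization code): expand a
# leading ~ or ~USER in each entry of a path list, silently dropping
# entries that cannot be expanded (unknown user, HOME unset).
# Requires Tcl 9 for [file tildeexpand].
proc expandPathList {paths} {
    set result {}
    foreach p $paths {
        if {[catch {file tildeexpand $p} expanded]} {
            continue
        }
        lappend result $expanded
    }
    return $result
}

# Example with a TCLLIBPATH-style list; "nosuchuser" is assumed unknown
set dirs [expandPathList [list /usr/local/lib/tcl ~/tcl ~nosuchuser/tcl]]
```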
New command file tildeexpand
A new file tildeexpand command is added to alleviate compatibility issues and to help resolve tilde-based paths present in configuration files and the like.
The command takes the form
file tildeexpand PATH
If PATH begins with ~ or ~USER, it is resolved relative to the home directory of the current user or of the named USER, respectively. If USER is not a known user, an error is raised. If PATH does not begin with a tilde, it is returned unmodified.
In the case of ~, the home directory is taken from the HOME environment variable. An error is raised if this variable does not exist.
In the case of ~USER, the command retrieves the home directory of USER by platform-dependent means (TclpGetUserHome, to be precise).
Both of the above behaviors replicate the Tcl 8.x resolution of tildes.
The command makes no guarantees about the form of the returned path, such as the separators used. Other Tcl commands, such as file normalize, should be invoked on the result if that matters.
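As a sketch of typical use, a path read from a configuration file could be resolved with a helper like resolveConfigPath. This helper is hypothetical. It expands the tilde and then applies file normalize, as suggested above. It requires Tcl 9.

```tcl
# Hypothetical helper: resolve a path read from a configuration file or
# from user input. Tcl 9 no longer expands "~" implicitly, so expand it
# explicitly, then let [file normalize] produce an absolute, normalized path.
proc resolveConfigPath {path} {
    return [file normalize [file tildeexpand $path]]
}

# Usage (raises an error if the named user is unknown):
#   set dataDir [resolveConfigPath ~/projects/data]
```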
New command file home
The functionality of retrieving the home directory is exposed through the new file home command. This takes the form
file home ?USER?
If the USER argument is not specified, the command returns the value of the HOME environment variable. An error is raised if this variable does not exist. On Windows, any backslashes in the path are converted to forward slashes.
If the USER argument is specified, the command retrieves the home directory of that user by platform-dependent means (TclpGetUserHome, to be precise).
Both of the above behaviors replicate the Tcl 8.x resolution of tildes.
Retrieval of home directories can also be achieved with the file tildeexpand command, so this command is not strictly necessary. However, it is more intuitive to use on platforms where a tilde is not commonly used to represent the home directory.
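The following is a minimal sketch of file home in use. It assumes a hypothetical application that keeps its settings in a .config subdirectory of the home directory; the proc name appConfigDir and the directory layout are illustrative only. It requires Tcl 9.

```tcl
# Hypothetical helper: locate a per-application settings directory under
# the current user's home. [file home] is used because a literal "~" is
# an ordinary relative path component in Tcl 9.
proc appConfigDir {appName} {
    return [file join [file home] .config $appName]
}

# The home directory of another user can be requested by name; an error
# is raised if that user is unknown ("someuser" is a placeholder):
#   file home someuser
```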
Discussion
The comp.lang.tcl thread titled "User does not exists when file name start with ~" on 2021-05-06 had some relevant discussion.
From Don Porter:
This is a much deeper issue than either that draft TIP or the posts here have
uncovered. The VFS layer has a problem not only with paths beginning with `~`,
but with all paths that have a prefix that can be claimed by a mounted
Tcl_Filesystem. The same ./ prefixing has to be applied to workaround
implications of this unfortunate design. A related matter is that prefixes and
patterns that determine [file system] assignments are not accomplished by a
registration, but by a round-robin game of hot potato. The design flaws are
large and deep. A good solution is a pretty major rewrite. This isn't a quick
fix.
Sometimes I think a good partial solution would be a rewrite that replaced all
the conditional branches that implement the `~` translation pervasively in the VFS
implementation, with a different strategy that made `~` translation available only
through a separately mounted Tcl_Filesystem that claimed the path names
matching `~*`. In that revised strategy, more scripts and apps would have the
option of unmounting that Tcl_Filesystem to disable the feature.
Some history and additional information in ticket
<https://core.tcl-lang.org/tcl/tktview/2511011>
and probably other tickets I cannot find quickly now.
Although Don points to a broader problem, I think the specific issue with ~ can be targeted selectively, and relatively simply, without a major rewrite. This TIP addresses that issue.
There is also a wiki page dedicated to this issue:
https://wiki.tcl-lang.org/page/Tilde+Substitution
Steve Landers on the Tcler's chat suggested:
On that basis I've been thinking about ways to warn people if their code relies
on ~ expansion. Something similar to what Apple do when they are in the process
of deprecating a feature. The idea isn't well developed but something like 9.0
warns if ~ found in a path with a way to turn off the warning, 9.1 doesn't warn
with a way to turn on the warning. And perhaps only warn if necessary - i.e.
expand the path and if it is different from the unexpanded then warn. But not
sure if that's practical.
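The idea of warning only when the result would differ could look roughly like the following Tcl 9 sketch. It is purely illustrative: neither this TIP nor Tcl implements such a warning, and the proc name warnIfTildeDependent is hypothetical.

```tcl
# Hypothetical sketch of the suggested warning; it is not part of this
# TIP or of Tcl. Returns 1 (and warns) only when expansion would change
# the path; paths whose user cannot be resolved are left silent.
# Requires Tcl 9 for [file tildeexpand].
proc warnIfTildeDependent {path} {
    if {[catch {file tildeexpand $path} expanded] || $expanded eq $path} {
        return 0
    }
    puts stderr "warning: \"$path\" is no longer tilde-expanded (Tcl 8 would use \"$expanded\")"
    return 1
}
```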
Sergey Brester on the core mailing list had suggested a per-command switch. Nothing in the TIP precludes such a feature from being proposed in a separate TIP.
Raised objections
It has been voiced on the chat and mailing lists that this change will break many scripts. There is no disputing that. However, the dangers and inconvenience of the workarounds described earlier for the current behavior outweigh this. In principle, differences in the handling of characters in path names between the system (and C ABI) and the language should be minimized. The convenience of translating ~ to the home directory should be left to the specific application.
(As an aside, the special treatment of a leading | in open is another difference, but its impact is smaller because | is rarely used in file names and is not permitted at all on Windows.)
Another common objection is that this behavior is too ingrained in Unix programmers. Note, however, that this behavior is exhibited only by the Unix shell, not even by the individual Unix utilities. Nor is it seen in the system ABI, the C runtime, or other commonly used scripting languages such as Python and Ruby. Unix programmers do not seem to have a problem working with these, so it is unclear why it would be a problem only for Tcl.
As pointed out on c.l.t., breakage is generally easy to spot and fix. To quote:
And, the ~ breakage appears in the first run of an old script in a
future v9 interpreter (file not found error) while the hidden latent
data dependent bug is just waiting to bite someday.
It is also the case that a grep through the sources will find most of the locations that need to be fixed.
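A crude Tcl version of such a search might look like the sketch below. The helper findTildeLines is hypothetical. Its pattern matches a ~ at the start of a line, after whitespace, or after a double quote. This is only a heuristic, so each hit still needs manual review.

```tcl
# Hypothetical helper: report line numbers where a "~" begins a word or a
# double-quoted string. A crude heuristic; review every hit by hand.
proc findTildeLines {text} {
    set hits {}
    set lineNo 0
    foreach line [split $text \n] {
        incr lineNo
        if {[regexp {(?:^|[\s"])~} $line]} {
            lappend hits $lineNo
        }
    }
    return $hits
}

# Scan every .tcl file in the current directory
foreach f [glob -nocomplain *.tcl] {
    # "./" keeps Tcl 8 from tilde-expanding a name such as ~old.tcl
    set chan [open ./$f]
    set hits [findTildeLines [read $chan]]
    close $chan
    if {[llength $hits]} {
        puts "$f: lines $hits"
    }
}
```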
An opposing view has been expressed on the core mailing list that most occurrences are not in sources but in configuration files, environment variables, user input and the like. The expansion of the TCLLIBPATH and TM environment variables has been added to partially mitigate this. Configuration and user input will have to be dealt with using the file tildeexpand command.
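For scripts that must run on both Tcl 8 and Tcl 9, one possible shim is sketched below. The name expandUserPath is hypothetical. On Tcl 8 it falls back to file normalize, relying on the implicit tilde substitution still present there, so on Tcl 8 the expanded result is also normalized.

```tcl
# Hypothetical compatibility helper for scripts that must run on both
# Tcl 8 and Tcl 9: expand a leading ~ or ~USER taken from configuration
# data or user input. Other paths are returned unchanged.
proc expandUserPath {path} {
    if {[string index $path 0] ne "~"} {
        return $path
    }
    if {[package vcompare [info tclversion] 9.0] >= 0} {
        return [file tildeexpand $path]
    }
    # Tcl 8 still performs tilde substitution implicitly, so
    # [file normalize] expands the name (and also makes it absolute).
    return [file normalize $path]
}
```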
Implementation
The tip-602 branch contains an implementation for 9.0.
The tip-602-87 branch will contain an implementation of the new command (without removing implicit tilde expansion) for 8.7.
Change log
(In reverse chronological order)
- The initialization of auto_path and of the tm paths from environment variables such as TCLLIBPATH at startup will do tilde expansion.
- The file home command has been replaced with the more general file tildeexpand command.
Copyright
This document has been placed in the public domain.