TIP 602: Remove tilde expansion in file paths.

Author:         Harald Oehlmann
Author:         Ashok P. Nadkarni
State:          Final
Type:           Project
Vote:           Done
Tcl-Version:    9.0
Tcl-Branch:     tip-602
Vote-Summary:   Accepted 6/0/0
Votes-For:      AK, JN, KBK, KW, MC, SL
Votes-Against:  none
Votes-Present:  none

Abstract

Tcl 8 supports Unix shell-style tilde substitution. This TIP removes this functionality in Tcl 9.

Rationale

The Tcl 8 treatment of ~ and ~user leading components in file paths passed as arguments to file-related commands is convenient for interactive use. However, the resulting behavior is insecure and error-prone.

Consider the naive attempt to clean out the /tmp directory.

cd /tmp
foreach f [glob *] {file delete -force $f}

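A file named ~ or ~user maliciously placed in /tmp will have rather unfortunate consequences: if such a name reaches file delete -force unguarded, it is tilde-expanded and the command operates on the corresponding home directory instead of the file in /tmp.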

In addition to being a source of security issues as above, tilde substitution is also inconvenient when writing robust file handling applications and packages. Attempting to process Mercurial repositories in Tcl, for example, will generate unexpected errors.

To avoid the above pitfall, all commands that operate on files, such as open and file, have to check for a leading ~ and prefix the path with ./ to disable tilde processing. On the other hand, displaying the path to the user or matching it against a user-supplied pattern requires that the ./ not be present. Thus glob-like operations have to account for both cases.
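The ./ workaround described above can be sketched roughly as follows for Tcl 8. The procedure names protectTilde, unprotectTilde and cleanDirectory are hypothetical and not part of Tcl or of this TIP; this is only an illustration of the bookkeeping a script has to carry around, not a recommended library.

```tcl
# Hypothetical helper (not part of Tcl): hide a leading tilde behind "./"
# so that Tcl 8 file commands treat the name as an ordinary relative path.
proc protectTilde {name} {
    if {[string index $name 0] eq "~"} {
        return ./$name
    }
    return $name
}

# Hypothetical inverse, for display or for matching against user patterns.
proc unprotectTilde {name} {
    if {[string match {./~*} $name]} {
        return [string range $name 2 end]
    }
    return $name
}

# The clean-up loop from the Rationale with the guard applied. The names
# are protected before file join, because Tcl 8 treats a tilde-prefixed
# argument to file join as an absolute path.
proc cleanDirectory {dir} {
    foreach f [glob -nocomplain -directory $dir -tails -- *] {
        file delete -force -- [file join $dir [protectTilde $f]]
    }
}
```

Every call site that touches a file name needs the first helper, and every call site that shows a name to the user needs the second, which is the burden this TIP aims to remove.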

Outside of the shells, this tilde expansion is not seen in any other commonly used languages, even on Unix. Thus programmers coming from other languages are not likely to be aware of the above pitfalls and the need for cumbersome workarounds.

Note this ambiguity in processing impacts use of utility packages as well, such as the fileutil module in tcllib, making them unusable.

Although possibly rare in the Unix world, tilde-prefixed files are not uncommon on Windows systems. Examples include

Specification

Change in file path translation

File paths will no longer be subject to tilde expansion in any commands. They will treat ~ as any other character. This includes commands that operate on files, like open, exec, glob as well as those operating on file paths, like file normalize, file tail, file dirname etc.

The file pathtype command will return relative for tilde-prefixed paths.

The file split command will not prefix a tilde-prefixed path component with ./. Conversely, file join will not strip a ./ prefix from an argument starting with ./~.
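As an informal illustration of the path commands listed above, the following sketch gathers what they report for a tilde-prefixed name. The procedure inspectPath and the directory name work are made up for the example. The remarks in the comments restate the specification and assume an interpreter that implements this TIP.

```tcl
# Hypothetical helper: collect how the path commands see a given name.
proc inspectPath {path} {
    return [dict create \
        pathtype [file pathtype $path] \
        split    [file split $path] \
        joined   [file join work $path]]
}

# Under this TIP, "~backup" is just a directory name: pathtype is expected
# to be "relative" and file join keeps the leading "work" component.
# Under Tcl 8 the tilde form counts as absolute, so file join drops "work".
foreach path {~backup/notes.txt ./~backup/notes.txt} {
    puts "$path -> [inspectPath $path]"
}
```

Running the same script under Tcl 8 and Tcl 9 is a quick way to see which of a script's path manipulations are affected by the change.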

There are a few exceptions where ~ expansion will still take place, such as the TCLLIBPATH and TM environment variables mentioned under Raised objections.

New command file tildeexpand

A new file tildeexpand command is added to alleviate compatibility issues and to help resolve tilde-based paths present in configuration files and the like. The command takes the form

file tildeexpand PATH

If PATH begins with the sequence ~ or ~USER it is resolved relative to the home directory of the current user or of the named USER respectively. If USER is not a known user, an error is raised. If PATH does not begin with a tilde, it is returned unmodified.

In the case of ~, the command returns the value of the HOME environment variable. An error is raised if this does not exist.

In the case of ~USER, the command retrieves the home directory of the user by platform-dependent means (TclpGetUserHome to be precise).

Both the above behaviors clone the 8.x resolution of tildes.

The command makes no guarantees about the form of the returned path, such as the separators used. Other Tcl commands like file normalize etc. should be invoked on the result if that is important.
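A minimal sketch of how a script might apply file tildeexpand to a path read from a configuration file or typed by a user is shown below. It assumes an interpreter that provides the command; the procedure name resolveConfigPath and the user name in the demonstration are hypothetical.

```tcl
# Hypothetical helper: resolve a path that came from configuration or
# user input, where a leading ~ is meant as a home directory.
proc resolveConfigPath {path} {
    # Expands a leading ~ or ~USER; other paths are returned unmodified.
    set expanded [file tildeexpand $path]
    # tildeexpand makes no promise about separators, so normalize.
    return [file normalize $expanded]
}

# An unknown user raises an error, so callers should be ready to catch it.
# The user name here is invented and is assumed not to exist.
if {[catch {resolveConfigPath ~no-such-user-tip602/data} msg]} {
    puts "cannot resolve path: $msg"
}
```

Keeping the expansion in one explicit helper means that only values intended to carry a tilde are expanded, while file names obtained from glob or from the file system are passed through untouched.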

New command file home

The functionality of retrieving the home directory is exposed through the new file home command. This takes the form

file home ?USER?

If the USER argument is not specified, it returns the value of the HOME environment variable. An error is raised if this does not exist. On Windows, any backslashes in the path are converted to forward slashes.

If the USER argument is specified, it retrieves the home directory of the user by platform-dependent means (TclpGetUserHome to be precise).

Both the above behaviors clone the 8.x resolution of tildes.

Retrieval of home directories can also be achieved with the file tildeexpand command so this command is not strictly necessary. However, it is more intuitive to use on platforms where the use of tilde for representing the home directory is not common.
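For example, a script that keeps a per-user settings file might locate it along the following lines. This is a sketch assuming an interpreter that provides file home; the procedure name userConfigFile and the file name .myapprc are invented for illustration.

```tcl
# Hypothetical helper: path of a settings file in a home directory.
# With no argument the current user's home (from HOME) is used.
proc userConfigFile {{user ""}} {
    if {$user eq ""} {
        set home [file home]
    } else {
        set home [file home $user]
    }
    return [file join $home .myapprc]
}
```

Calling userConfigFile without an argument raises an error if HOME is not set, as described above, so an application may want to wrap the call in catch and fall back to a default location.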

Discussion

The comp.lang.tcl thread titled "User does not exists when file name start with ~" on 2021-05-06 had some relevant discussion.

From Don Porter

This is a much deeper issue than either that draft TIP or the posts here have
uncovered. The VFS layer has a problem not only with paths beginning with `~`,
but with all paths that have a prefix that can be claimed by a mounted
Tcl_Filesystem. The same ./ prefixing has to be applied to workaround
implications of this unfortunate design. A related matter is that prefixes and
patterns that determine [file system] assignments are not accomplished by a
registration, but by a round-robin game of hot potato. The design flaws are
large and deep. A good solution is a pretty major rewrite. This isn't a quick
fix.

Sometimes I think a good partial solution would be a rewrite that replaced all
the conditional branches that implement the `~` translation pervasively in the VFS
implementation, with a different strategy that made `~` translation available only
through a separately mounted Tcl_Filesystem that claimed the path names
matching `~*`. In that revised strategy, more scripts and apps would have the
option of unmounting that Tcl_Filesystem to disable the feature.

Some history and additional information in ticket

<https://core.tcl-lang.org/tcl/tktview/2511011>

and probably other tickets I cannot find quickly now. 

Although Don points to a broader problem, I think the specific issue with ~ can be selectively targeted relatively simply without a major rewrite. The TIP addresses this.

There is also a wiki page dedicated to this issue:

https://wiki.tcl-lang.org/page/Tilde+Substitution

Steve Landers on the Tcler's chat suggested

On that basis I've been thinking about ways to warn people if their code relies
on ~ expansion. Something similar to what Apple do when they are in the process
of deprecating a feature. The idea isn't well developed but something like 9.0
warns if ~ found in a path with a way to turn off the warning, 9.1 doesn't warn
with a way to turn on the warning. And perhaps only warn if necessary - i.e.
expand the path and if it is different from the unexpanded then warn. But not
sure if that's practical.

Sergey Brester on the core mailing list had suggested a per-command switch. Nothing in the TIP precludes such a feature from being proposed in a separate TIP.

Raised objections

It has been voiced on the chat and mailing lists that this change will break many scripts. There is no disputing that. However, the dangers of the current behavior and the inconvenience of the workarounds described earlier outweigh the cost of this breakage. In principle, differences between handling of characters in pathnames between the system (and C ABI) and the language should be minimized. The convenience of translating ~ to the home directory should be left to the specific application. (As an aside, the use of | in open is another difference but the impact is minimized because most modern file systems do not permit | in paths.)

Another common objection is that this behavior is too ingrained in Unix programmers. Note, however, that this behavior is only exhibited by the Unix shell, and not even by the individual utilities in Unix. Nor is it seen in the system ABI, the C runtime or other commonly used scripting languages like Python and Ruby. Unix programmers do not seem to have a problem working with these so it is unclear why it would only be a problem for Tcl.

As pointed out on c.l.t., breakage is generally easy to spot and fix. To quote,

And, the ~ breakage appears in the first run of an old script in a 
future v9 interpreter (file not found error) while the hidden latent 
data dependent bug is just waiting to bite someday.

It is also the case that a grep through the sources will find most of the locations that need to be fixed.

An opposing view has been expressed on the core mailing list that most occurrences are not in sources but in configuration files, environment variables, user input and the like. The expansion of TCLLIBPATH and TM environment variables has been added to partially mitigate this. Configuration and user input will have to be dealt with using the file tildeexpand command.

Implementation

The tip-602 branch contains an implementation for 9.0.

The tip-602-87 branch will contain an implementation of the new command (without removing implicit tilde expansion) for 8.7.

Change log

(In reverse chronological order)

Copyright

This document has been placed in the public domain.