# TIP 602: Remove tilde expansion in file paths.
	Author:         Harald Oehlmann <[email protected]>
	Author:         Ashok P. Nadkarni <[email protected]>
	State:          Final
	Type:           Project
	Vote:           Done
	Tcl-Version:    9.0
	Tcl-Branch:     tip-602
	Vote-Summary:   Accepted 6/0/0
	Votes-For:      AK, JN, KBK, KW, MC, SL
	Votes-Against:  none
	Votes-Present:  none
-----
<!-- TOC BEGIN (auto generated with tiptoc) -->
* <a href='#Abstract'>Abstract</a>
* <a href='#Rationale'>Rationale</a>
* <a href='#Specification'>Specification</a>
    * <a href='#Changeinfilepathtranslation'>Change in file path translation</a>
    * <a href='#Newcommandfiletildeexpand'>New command `file tildeexpand`</a>
    * <a href='#Newcommandfilehome'>New command `file home`</a>
* <a href='#Discussion'>Discussion</a>
* <a href='#Raisedobjections'>Raised objections</a>
* <a href='#Implementation'>Implementation</a>
* <a href='#Changelog'>Change log</a>
* <a href='#Copyright'>Copyright</a>

<!-- TOC END -->


# <a id='Abstract'></a>Abstract

Tcl 8 supports Unix shell-style tilde substitution. This TIP
removes this functionality in Tcl 9.

# <a id='Rationale'></a>Rationale

The Tcl 8 treatment of `~` and `~user` leading components in file paths
passed as arguments to file related commands is convenient for interactive
use. However, the resulting behavior is insecure and error-prone.

Consider the naive attempt to clean out the `/tmp` directory.

```
cd /tmp
foreach f [glob *] {file delete -force $f}
```

A file `~` or `~user` maliciously placed in `/tmp` will have rather
unfortunate consequences.

In addition to being a source of security issues as above, tilde substitution is
also inconvenient when writing robust file handling applications and packages.
Attempting to process Mercurial repositories in Tcl, for example, will generate
unexpected errors because their storage contains tilde-prefixed file names.

To avoid the above pitfall, every script that passes a file name to commands such
as `open` or `file` has to check for a leading `~` and prefix the name with `./` to
disable tilde processing. On the other hand, displaying a name to the user or
matching it against a user-supplied pattern requires that the `./` not be present.
Thus glob-like operations have to account for both cases.
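
The workaround described above might look like the following in a Tcl 8 script.
This is only a minimal sketch: the helper names `protectTilde` and
`cleanCurrentDir` are hypothetical and are not part of Tcl or `tcllib`.

```tcl
# Hypothetical helper (not part of Tcl): make a name that starts with a
# tilde safe to pass to file commands under Tcl 8 by prefixing it with ./
proc protectTilde {name} {
    if {[string index $name 0] eq "~"} {
        return ./$name
    }
    return $name
}

# Hypothetical helper: delete every entry in the current directory
# without letting a file named ~ or ~user be taken for a home directory.
proc cleanCurrentDir {} {
    foreach f [glob -nocomplain *] {
        file delete -force -- [protectTilde $f]
    }
}

puts [protectTilde ~user]       ;# ./~user
puts [protectTilde notes.txt]   ;# notes.txt
```

The protected form then has to be stripped again before a name is shown to the
user or matched against a pattern, which is the bookkeeping burden described above.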

Outside of the shells, this tilde expansion is not seen in any other
commonly used languages, even on Unix. Thus programmers coming from other
languages are not likely to be aware of the above pitfalls and the need
for cumbersome workarounds.

Note that this ambiguity in processing also affects utility packages, such
as the `fileutil` module in `tcllib`, making them unusable for such paths.

Although possibly rare in the Unix world, tilde-prefixed files are not uncommon
on Windows systems. Examples include

- Files within Mercurial SCM repository storage (these may be present on Unix as well)
- Files created by Excel with the prefix `~$`
- Directories under the Visual Studio installation with names of the form `~FC`, `~IC` etc.
- Files in the `%TEMP%` directory (it is not clear which application creates these)
- Font caches under `AppData`

# <a id='Specification'></a>Specification

## <a id='Changeinfilepathtranslation'></a>Change in file path translation

File paths will no longer be subject to tilde expansion in any commands.
They will treat `~` as any other character. This includes commands that
operate on files, like `open`, `exec` and `glob`, as well as those operating
on file paths, like `file normalize`, `file tail`, `file dirname` etc.

The `file pathtype` command will return `relative` for tilde-prefixed paths.

The `file split` command will not prefix a tilde-prefixed path component with
`./`. Conversely, `file join` will not strip a `./` prefix from an argument
starting with `./~`.
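
The path-manipulation rules above can be tried out as follows. This sketch
assumes a Tcl 9.0 or later interpreter; `~alice` is just an illustrative file
name, and the comments restate the rules from the text rather than captured
output.

```tcl
# Assumes Tcl 9.0 or later. ~alice is only a file name here; no such
# user needs to exist.
puts [file pathtype ~alice/notes.txt]   ;# relative, as specified above
puts [file split dir/~alice]            ;# components carry no ./ prefix
puts [file join dir ./~alice]           ;# the ./ is kept, not stripped
```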

There are a few exceptions where `~` expansion will still take place:

- Initialization of `auto_path` from the `TCLLIBPATH` environment variable
will do tilde expansion on each path. Any expansion that fails because
the user is unknown will not be included in `auto_path`.

- Likewise, initialization of the Tcl module search paths from the
`TCL9_0_TM_PATH` (and similar) environment variables will undergo tilde
expansion. Again, any expansions that fail because of unknown user names
will be excluded. Note that the commands `tcl::tm::add` and `tcl::tm::roots`
will not themselves do any tilde expansion.

- The initialization of the `tcl_pkgPath` variable will undergo tilde expansion
at start up time. This is necessitated by the macOS configure script, which uses
tilde-prefixed paths when setting `TCL_PACKAGE_PATH` at build time.
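
The "expand each path, drop the ones that fail" rule used for these start-up
variables can be illustrated at script level. The following is a sketch only:
the real expansion happens inside Tcl's initialization, `expandPathList` is a
hypothetical name, and the code relies on the `file tildeexpand` command
specified later in this document (Tcl 9.0 or later).

```tcl
# Assumes Tcl 9.0 or later. expandPathList is a hypothetical name used
# only for illustration; Tcl performs the real expansion at start up.
proc expandPathList {paths} {
    set result {}
    foreach p $paths {
        # file tildeexpand raises an error for an unknown user;
        # such entries are dropped rather than kept unexpanded.
        if {![catch {file tildeexpand $p} expanded]} {
            lappend result $expanded
        }
    }
    return $result
}

# With no user named no_such_user_xyz, only the second entry survives.
puts [expandPathList [list ~no_such_user_xyz/lib /opt/tcl/lib]]
```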

## <a id='Newcommandfiletildeexpand'></a>New command `file tildeexpand`

A new `file tildeexpand` command is added to alleviate compatibility issues and
to help resolve tilde-based paths present in configuration files and the like.
The command takes the form

```
file tildeexpand PATH
```

If `PATH` begins with the sequence `~` or `~USER` it is resolved relative to
the home directory of the current user or named `USER` respectively. If
`USER` is not a known user, an error is raised. If `PATH` does not begin with
a tilde, it is returned unmodified.

In the case of `~`, the command returns the value of the `HOME` environment
variable. An error is raised if this does not exist.

In the case of `~USER`, the command retrieves the home directory of
the user by platform-dependent means (`TclpGetUserHome` to be precise).

Both of the above behaviors replicate the 8.x resolution of tildes.

The command makes no guarantees about the form of the returned path, such
as the separators used. Other Tcl commands like `file normalize` should be
invoked on the result if that is important.
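
A typical use might look like the sketch below, which assumes Tcl 9.0 or later
and a `HOME` environment variable that is set. The configuration value and the
user name `no_such_user_xyz` are made up for illustration.

```tcl
# Assumes Tcl 9.0 or later and that HOME is set.
# "~/.myapp/settings.conf" is a hypothetical value read from a config file.
set configured "~/.myapp/settings.conf"

# Expand the tilde explicitly, then normalize because file tildeexpand
# makes no promise about separators or the form of the result.
set canonical [file normalize [file tildeexpand $configured]]
puts $canonical

# An unknown user is reported as an error rather than passed through.
if {[catch {file tildeexpand ~no_such_user_xyz/data} msg]} {
    puts "cannot expand: $msg"
}

# A path that does not begin with a tilde is returned unmodified.
puts [file tildeexpand data/~backup]
```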

## <a id='Newcommandfilehome'></a>New command `file home`

The functionality of retrieving the home directory is exposed through
the new `file home` command. This takes the form

```
file home ?USER?
```

If the `USER` argument is not specified, it returns the value of the
`HOME` environment variable. An error is raised if this does not exist.
On Windows, any backslashes in the path are converted to forward slashes.

If the `USER` argument is specified, it retrieves the home directory of
the user by platform-dependent means (`TclpGetUserHome` to be precise).

Both of the above behaviors replicate the 8.x resolution of tildes.

Retrieval of home directories can also be achieved with the 
`file tildeexpand` command so this command is not strictly necessary.
However, it is more intuitive to use on platforms where the use of tilde
for representing the home directory is not common.
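
As a small illustration, assuming Tcl 9.0 or later and a `HOME` environment
variable that is set, a per-user file can be located without writing a tilde at
all. The file name `.myapprc` is hypothetical.

```tcl
# Assumes Tcl 9.0 or later and that HOME is set.
# .myapprc is a hypothetical per-user configuration file name.
set rcFile [file join [file home] .myapprc]
puts $rcFile

# The same directory can also be reached through file tildeexpand.
puts [file tildeexpand ~]
```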

# <a id='Discussion'></a>Discussion

The comp.lang.tcl thread titled "User does not exists when file name start
with ~" on 2021-05-06 had some relevant discussion. 

*From Don Porter*

```
This is a much deeper issue than either that draft TIP or the posts here have
uncovered. The VFS layer has a problem not only with paths beginning with `~`,
but with all paths that have a prefix that can be claimed by a mounted
Tcl_Filesystem. The same ./ prefixing has to be applied to workaround
implications of this unfortunate design. A related matter is that prefixes and
patterns that determine [file system] assignments are not accomplished by a
registration, but by a round-robin game of hot potato. The design flaws are
large and deep. A good solution is a pretty major rewrite. This isn't a quick
fix.

Sometimes I think a good partial solution would be a rewrite that replaced all
the conditional branches that implement the `~` translation pervasively in the VFS
implementation, with a different strategy that made `~` translation available only
through a separately mounted Tcl_Filesystem that claimed the path names
matching `~*`. In that revised strategy, more scripts and apps would have the
option of unmounting that Tcl_Filesystem to disable the feature.

Some history and additional information in ticket

<https://core.tcl-lang.org/tcl/tktview/2511011>

and probably other tickets I cannot find quickly now. 
```

Although Don points to a broader problem, I think the specific issue with `~`
can be selectively targeted relatively simply without a major rewrite. The
TIP addresses this.

There is also a wiki page dedicated to this issue:

<https://wiki.tcl-lang.org/page/Tilde+Substitution>


Steve Landers on the Tcler's chat suggested

```
On that basis I've been thinking about ways to warn people if their code relies
on ~ expansion. Something similar to what Apple do when they are in the process
of deprecating a feature. The idea isn't well developed but something like 9.0
warns if ~ found in a path with a way to turn off the warning, 9.1 doesn't warn
with a way to turn on the warning. And perhaps only warn if necessary - i.e.
expand the path and if it is different from the unexpanded then warn. But not
sure if that's practical.
```

Sergey Brester on the core mailing list had suggested a per-command switch.
Nothing in the TIP precludes such a feature from being proposed in a 
separate TIP.

# <a id='Raisedobjections'></a>Raised objections

It has been voiced on the chat and mailing lists that this change will break
many scripts. There is no disputing that. However, the dangers of the current
behavior and the inconvenience of the workarounds described earlier outweigh
the cost of this breakage. In principle, differences in the handling of characters
in pathnames between the system (and C ABI) and the language should be minimized.
The convenience of translating `~` to the home directory should be left to the
specific application. (As an aside, the use of `|` in `open` is another such
difference, but the impact is minimized because most modern file systems do not
permit `|` in paths.)

Another common objection is that this behavior is too ingrained in Unix
programmers. Note, however, that this behavior is only exhibited by the
Unix shell, and not even by the individual utilities in Unix. Nor is it seen in the
system ABI, the C runtime or other commonly used scripting languages like Python
and Ruby. Unix programmers do not seem to have a problem working with these, so it is
unclear why it would only be a problem for Tcl.

As pointed out on c.l.t., breakage is generally easy to spot and fix. To quote,

```
And, the ~ breakage appears in the first run of an old script in a 
future v9 interpreter (file not found error) while the hidden latent 
data dependent bug is just waiting to bite someday.
```

It is also the case that a grep through the sources will find most of the
locations that need to be fixed.

An opposing view has been expressed on the core mailing list that most
occurrences are not in sources but in configuration files, environment variables,
user input and the like. The expansion of the `TCLLIBPATH` and `TM` environment
variables has been added to partially mitigate this. Configuration and user
input will have to be dealt with using the `file tildeexpand` command.
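
For scripts that must run on both Tcl 8 and Tcl 9, paths coming from
configuration or user input could be funnelled through one helper. The sketch
below is an assumption about how such a shim might be written, not part of this
TIP; `resolveUserPath` is a hypothetical name.

```tcl
# Hypothetical helper (not part of Tcl): resolve a path taken from a
# configuration file or user input on both Tcl 8 and Tcl 9.
proc resolveUserPath {path} {
    if {[package vsatisfies [info patchlevel] 9-]} {
        # Tcl 9: tilde expansion must be requested explicitly.
        set path [file tildeexpand $path]
    }
    # Tcl 8: file normalize expands a leading tilde implicitly.
    return [file normalize $path]
}

puts [resolveUserPath plain/file.txt]
```

Calling the helper only at the points where external input enters the program
keeps tilde handling out of the rest of the file-processing code.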

# <a id='Implementation'></a>Implementation

The tip-602 branch contains an implementation for 9.0.

The tip-602-87 branch will contain an implementation of the new command
(without removing implicit tilde expansion) for 8.7.

# <a id='Changelog'></a>Change log

(In reverse chronological order)

- The initialization of `auto_path` and tm paths from environment variables
`TCLLIBPATH` etc. at start up will do tilde expansion.

- The `file home` command has been replaced with the more general
`file tildeexpand` command.

# <a id='Copyright'></a>Copyright

This document has been placed in the public domain.