TIP 670: Simple Extra Procedures for File Access

Author:		Donal Fellows <[email protected]>
State:		Final
Type:		Project
Vote:		Done
Created:	29-05-2023
Tcl-Version:	8.7
Tcl-Branch:	tip-670

Abstract

This TIP adds some simple procedures for common types of basic file access.

Outline

I've been doing user support for a while and there are a few basic questions that come up relatively frequently, such as "How do I read a file?" and "How do I write a file?" Naturally, there are common patterns for this. Let's make a few of these into procedures in Tcl itself (much like parray for printing arrays). These will be explicitly procedures, and they will be made available for autoloading so that they are less likely to conflict with existing user code.

The procedures will be readFile, writeFile and foreachLine.

Part of the purpose of these procedures is to act as samples of how to do these sorts of tasks. They are not intended to handle all edge cases. They are intended to be read by users of Tcl.

Description of Procedures

readFile

The readFile procedure will take one or two arguments. The first argument shall be the name of an existing file, and the second argument, if provided, shall be either text or binary (or any unambiguous prefix thereof) to indicate which mode to read the file in. The default mode shall be text.

In text mode, the successful result of readFile shall be the whole contents of the file (up to whatever default EOF character Tcl uses), including any trailing newlines, decoded on the assumption that the file uses the system encoding. Line endings (CRLF sequences and so on) will be normalized to newlines.

In binary mode, the successful result of readFile shall be the whole exact byte content of the file without further interpretation.
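As a rough illustration of the difference between the two modes, the following sketch uses only long-standing core commands (so it does not depend on readFile being available). It assumes a scratch file created with file tempfile and ASCII-only content, so that the system encoding does not affect the character count.

```tcl
# Create a scratch file holding two CRLF-terminated lines (6 bytes).
set f [file tempfile path]
fconfigure $f -translation binary
puts -nonewline $f "a\r\nb\r\n"
close $f

# Text mode: system encoding, line endings normalized to \n.
set f [open $path "r"]
set text [read $f]
close $f

# Binary mode: the exact bytes of the file, without interpretation.
set f [open $path "rb"]
set bytes [read $f]
close $f

puts [string length $text]    ;# 4
puts [string length $bytes]   ;# 6
file delete $path
```

The text-mode result is shorter because each CRLF pair has been collapsed into a single newline; the binary-mode result keeps every byte.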

Text mode will be much like this:

proc readFile {filename} {
    set f [open $filename "r"]
    set data [read $f]
    close $f
    return $data
}

Binary mode will be much like this:

proc readFile {filename binary} {
    set f [open $filename "rb"]
    set data [read $f]
    close $f
    return $data
}

The actual implementation will be more complex because of argument parsing and ensuring that resources are managed correctly.

proc readFile {filename {mode text}} {
    # Parse the arguments
    set MODES {binary text}
    set ERR [list -level 1 -errorcode [list TCL LOOKUP MODE $mode]]
    set mode [tcl::prefix match -message "mode" -error $ERR $MODES $mode]

    # Read the file
    set f [open $filename [dict get {text r binary rb} $mode]]
    try {
        return [read $f]
    } finally {
        close $f
    }
}
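The argument parsing step can be tried on its own. The sketch below isolates it in a helper named parseMode (a hypothetical name used only for this illustration) to show how tcl::prefix match accepts unambiguous prefixes and how the -error options determine the error code that the caller sees.

```tcl
# Hypothetical helper: the mode-parsing step of readFile in isolation.
proc parseMode {mode} {
    set ERR [list -level 1 -errorcode [list TCL LOOKUP MODE $mode]]
    return [tcl::prefix match -message "mode" -error $ERR {binary text} $mode]
}

puts [parseMode b]     ;# binary
puts [parseMode tex]   ;# text

# An unknown mode is reported as an error of parseMode itself,
# carrying the error code supplied through -error.
if {[catch {parseMode x} msg opts]} {
    puts [dict get $opts -errorcode]   ;# TCL LOOKUP MODE x
}
```

Because of -level 1, the error appears to come from the procedure that did the parsing rather than from tcl::prefix, which is what a caller of readFile would expect to see.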

writeFile

The writeFile procedure will take two or three arguments. The first argument shall be the name of a file to write. The final argument shall be the contents to write to the file (overwriting any existing file with that name; the code will not be careful in this regard). If there are three arguments, the second argument will be either text or binary (or any unambiguous prefix thereof) to indicate the mode of operation. The default mode shall be text. The successful result of writeFile shall be the empty string.

In text mode, the file will have the text written to it as provided, without extra newlines (that is, we will uphold the expectation that using writeFile with the result of readFile shall produce an equivalent file). The platform default encoding and line mode shall be used. It is up to the caller to provide any trailing newline if one is desired.

In binary mode, the file will have the binary data written to it exactly as provided.
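The round-trip expectation for text mode can be checked with a small sketch built from core commands only. It assumes a scratch file from file tempfile and contrasts puts -nonewline (which the writeFile sketches use) with a plain puts.

```tcl
set f [file tempfile path]
close $f

set original "first line\nsecond line"   ;# note: no trailing newline

# Write the text exactly as given, then read it back.
set f [open $path "w"]
puts -nonewline $f $original
close $f
set f [open $path "r"]
set roundTripped [read $f]
close $f
puts [expr {$roundTripped eq $original}]      ;# 1

# For contrast, a plain [puts] appends a newline the caller never supplied.
set f [open $path "w"]
puts $f $original
close $f
set f [open $path "r"]
set withNewline [read $f]
close $f
puts [expr {$withNewline eq "$original\n"}]   ;# 1
file delete $path
```

This is why the caller, not writeFile, is responsible for any trailing newline: adding one implicitly would break the read-then-write equivalence.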

Text mode will be much like this:

proc writeFile {filename data} {
    set f [open $filename "w"]
    puts -nonewline $f $data
    close $f
}

Binary mode will be much like this:

proc writeFile {filename binary data} {
    set f [open $filename "wb"]
    puts -nonewline $f $data
    close $f
}

The actual implementation will be more complex because of argument parsing and ensuring that resources are managed correctly.

proc writeFile {args} {
    # Parse the arguments
    switch [llength $args] {
        2 {
            lassign $args filename data
            set mode text
        }
        3 {
            lassign $args filename mode data
            set MODES {binary text}
            set ERR [list -level 1 -errorcode [list TCL LOOKUP MODE $mode]]
            set mode [tcl::prefix match -message "mode" -error $ERR $MODES $mode]
        }
        default {
            set COMMAND [lindex [info level 0] 0]
            return -code error -errorcode {TCL WRONGARGS} \
                "wrong # args: should be \"$COMMAND filename ?mode? data\""
        }
    }

    # Write the file
    set f [open $filename [dict get {text w binary wb} $mode]]
    try {
        puts -nonewline $f $data
    } finally {
        close $f
    }
}

foreachLine

The foreachLine procedure will take three arguments. The first shall be the name of a variable in the calling scope, the second shall be the name of a text file to read, and the third shall be a string containing Tcl code (that will be evaluated in the caller's context). The platform default encoding will be used.

A simplified version of the procedure will be as below:

proc foreachLine {varName filename body} {
    upvar 1 $varName line
    set f [open $filename "r"]
    while {[gets $f line] >= 0} {
        uplevel 1 $body
    }
    close $f
}

The full version will be more complex in order to do correct handling. In particular, errors will leave the file closed, and return, break and continue will be supported (with normal semantics) within the body.

The more complete version is this:

proc foreachLine {varName filename body} {
    upvar 1 $varName line
    set f [open $filename "r"]
    try {
        while {[gets $f line] >= 0} {
            uplevel 1 $body
        }
    } on return {msg opt} {
        dict incr opt -level
        return -options $opt $msg
    } finally {
        close $f
    }
}

The extra complexity with try handles getting return correct and ensures that we always close the channel.
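The effect of the finally clause can be observed directly. This is a minimal sketch, assuming a scratch channel from file tempfile and a simulated failure in place of a real loop body.

```tcl
set f [file tempfile path]

# The body fails, but the finally clause still runs before the
# error propagates out of [try].
set caught [catch {
    try {
        error "simulated failure while processing"
    } finally {
        close $f
    }
} msg]

puts $caught                        ;# 1: the error still reaches the caller
puts [expr {$f in [chan names]}]    ;# 0: the channel has been closed
file delete $path
```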

Another part of the purpose of foreachLine is to provide an example of how to do the "I'm making a custom looping construct in Tcl code" pattern. Having such things is good, very good, and part of what we encourage users to write (because they understand what the things are to iterate over, and we'll never manage to make all the base iterators in the core), but the pattern needs slight tweaking to work perfectly. The foreachLine source shows how to do that, including handling the tricky edge case that is return with a level specified.
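The same pattern carries over to other iterators. As a sketch, here is a hypothetical foreachChar loop (not part of this TIP) that walks the characters of a string, together with a hypothetical caller, firstDigit, that uses return inside the loop body.

```tcl
# Hypothetical custom loop: iterate over the characters of a string.
proc foreachChar {varName string body} {
    upvar 1 $varName ch
    try {
        foreach ch [split $string ""] {
            uplevel 1 $body
        }
    } on return {msg opt} {
        # Bump -level so that the return takes effect in our caller.
        dict incr opt -level
        return -options $opt $msg
    }
}

# Returns the index of the first digit in $s, or -1 if there is none.
proc firstDigit {s} {
    set i 0
    foreachChar c $s {
        if {[string is digit -strict $c]} {
            return $i
        }
        incr i
    }
    return -1
}

puts [firstDigit "abc7de"]   ;# 3
puts [firstDigit "abcdef"]   ;# -1
```

Without the on return clause, the return in the body would only terminate foreachChar, and firstDigit would carry on to its final return -1. No special handling is needed for break and continue here, because the inner foreach already interprets them.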

TIPs #636 and #644 may change how the inner loop of foreachLine is implemented. That's fine with me, but doesn't seem to be particularly necessary here; the code will not get shorter.

Explicitly omitted are things like filtering of the lines to remove, say, empty lines and comment lines; that's the sort of thing I put in my own file iterators, but they're not general enough. And they can be done easily in user code:

foreachLine line filename.txt {
    set line [string trim $line]
    if {$line eq "" || [string match #* $line]} {
        continue
    }
    # Example of normalization
    processWordsOfLine [split [regsub -all {\s+} $line " "]]
}
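To see what that filtering and normalization does without needing a file, the same loop body can be run over a list of sample lines. This is only a sketch: the sample data is invented, and lappend stands in for the processWordsOfLine call.

```tcl
set samples [list "  alpha   beta\tgamma  " "" "# a comment" "delta"]
set kept {}
foreach line $samples {
    set line [string trim $line]
    if {$line eq "" || [string match #* $line]} {
        continue
    }
    # Collapse each run of whitespace to one space, then split into words.
    lappend kept [split [regsub -all {\s+} $line " "]]
}
puts $kept   ;# {alpha beta gamma} delta
```

The empty line and the comment line are skipped, and the remaining lines become clean word lists regardless of how they were spaced.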

Copyright

This document has been placed in the public domain.