Tcl Source Code

Encoding Loading

Source Files

Public Interface

Private Interface

Directly Depends On Public Interface

Directly Depends On Private Interface of

Discussion

This module contains the code that searches the encoding directories on the search path for *.enc files, which hold the data that defines additional encodings.
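The shape of that search can be pictured with a short C sketch. This is not the code in the Tcl sources: the helper names build_enc_path and find_enc_file are hypothetical, plain fopen stands in for Tcl's filesystem layer, and the directory list is assumed to be handed in by the caller. It only shows the idea of joining each directory with <name>.enc and taking the first file that opens.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<dir>/<name>.enc" into buf.
 * Returns 0 on success, -1 if the name is empty or the result does not fit. */
static int build_enc_path(const char *dir, const char *name,
                          char *buf, size_t size)
{
    assert(dir != NULL && name != NULL && buf != NULL);
    if (strlen(name) == 0) {
        return -1;
    }
    int n = snprintf(buf, size, "%s/%s.enc", dir, name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: probe each directory in order and stop at the
 * first one that holds a readable <name>.enc file.
 * Returns the index of that directory, or -1 if none matched. */
static int find_enc_file(const char *const *dirs, int ndirs,
                         const char *name, char *buf, size_t size)
{
    for (int i = 0; i < ndirs; i++) {
        if (build_enc_path(dirs[i], name, buf, size) != 0) {
            continue;               /* unusable path, try the next dir */
        }
        FILE *f = fopen(buf, "rb");
        if (f != NULL) {
            fclose(f);
            return i;               /* buf now holds the matching path */
        }
    }
    return -1;
}
```

Note that even this toy version has to hand a path to the operating system before it can read anything, which is where the trouble discussed next comes from.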

The circularity problem is that both querying a filesystem and reading data in from a file potentially make use of encodings, and might run aground if they need an encoding that's not already available. Loading an encoding can therefore itself require an encoding.

The way out of this dilemma comes in several parts, some already in place. First, we make sure that handling of the utf-8 encoding is built into Tcl, with no need to bring data in from a file first. That's the foundation of the bootstrap. Then we require that all *.enc files be stored in the utf-8 encoding. This removes any concern about the channel read operation.

We still have to query the filesystem. Here is where Tcl's virtual filesystem architecture holds the key. Each Tcl_Filesystem can define a Tcl_FSCreateInternalRepProc to generate the data format it needs (and may cache) to interact with whatever backing system it is using to implement the filesystem. In the current implementation, the unix native filesystem uses the routine TclNativeCreateNativeRep to perform this function, and it in turn calls Tcl_UtfToExternalDString to apply the system encoding to get something suitable for passing to TclOSOpen. That closes the cycle. (Note that Windows doesn't put in place any similar cycle. It makes no call back into the Encoding System. Maybe once it did, but no more.)
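The cycle can be made concrete with a deliberately simplified toy model in C. Nothing here is Tcl's actual data structure or control flow: the names loaded, system_encoding, open_enc_file and get_encoding are hypothetical, and the model assumes that opening any *.enc file first needs the system encoding to convert the path. Under those assumptions, a request succeeds when the system encoding is the built-in utf-8 and fails when it is an encoding that would itself have to be loaded from a file.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOADED 8

/* Toy state, not Tcl's data structures. */
static const char *loaded[MAX_LOADED] = { "utf-8" };  /* utf-8 is built in */
static int nloaded = 1;
static const char *system_encoding = "utf-8";
static int depth = 0;               /* nesting depth of load attempts */

static int is_loaded(const char *name)
{
    for (int i = 0; i < nloaded; i++) {
        if (strcmp(loaded[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

static int get_encoding(const char *name);

/* Stand-in for opening <name>.enc: in this model the path must first be
 * converted to native form, and that needs the system encoding. */
static int open_enc_file(const char *name)
{
    (void)name;
    return get_encoding(system_encoding);
}

/* Returns 0 if the encoding is (or becomes) available, -1 otherwise. */
static int get_encoding(const char *name)
{
    assert(name != NULL);
    if (is_loaded(name)) {
        return 0;
    }
    if (depth > 0) {
        return -1;                  /* a load needs another load: the cycle */
    }
    if (nloaded == MAX_LOADED) {
        return -1;                  /* toy table is full */
    }
    depth++;
    int rc = open_enc_file(name);
    depth--;
    if (rc == 0) {
        loaded[nloaded++] = name;
    }
    return rc;
}
```

With system_encoding left at "utf-8", get_encoding("cp1252") returns 0. After setting system_encoding to a name that is not yet loaded, such as "iso8859-2", every request for a not-yet-loaded encoding returns -1, including the request for "iso8859-2" itself.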

So long as Tcl believes the system encoding is utf-8 there's no problem, and that's a common environment on unix systems. But we are vulnerable to trouble with other values for the system encoding. The solution requires breaking out of this mistaken concept of a system encoding. There's no such thing. At this point, we do not care what encoding the system uses, whatever that means. We specifically want to know how to convert Tcl path values into values this native filesystem can understand. That's a property of each Tcl_Filesystem, not the system in any more general sense. So the proper placement of the burden is in each Tcl_Filesystem, and the Tcl_FSCreateInternalRepProc. It must supply for itself any encoding knowledge. If it wishes to use the Encoding System facilities, that's fine, but for any non-foundational encoding, it will have to be able to supply the encoding(s) it needs with Tcl_CreateEncoding calls.
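As a rough illustration of placing the burden on each filesystem, the following C sketch gives every filesystem record its own path converter. All names are hypothetical: ToyFilesystem is not Tcl_Filesystem, PathToNativeProc is not Tcl_FSCreateInternalRepProc, and the ISO 8859-1 converter is hand-written here only to show an encoding being compiled in rather than read from a *.enc file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filesystem converter: UTF-8 path in, native bytes out.
 * Returns the number of bytes written (excluding the NUL), or -1 on error. */
typedef int (*PathToNativeProc)(const char *utf8, char *out, size_t size);

/* Hypothetical stand-in for a filesystem record; not Tcl_Filesystem. */
typedef struct {
    const char *name;
    PathToNativeProc toNative;
} ToyFilesystem;

/* A filesystem whose native form is UTF-8: copy the bytes unchanged. */
static int utf8_to_native(const char *utf8, char *out, size_t size)
{
    size_t len = strlen(utf8);
    if (len + 1 > size) {
        return -1;
    }
    memcpy(out, utf8, len + 1);
    return (int)len;
}

/* A filesystem whose native form is ISO 8859-1, with the conversion
 * compiled in: accept U+0000..U+00FF, reject everything else. */
static int latin1_to_native(const char *utf8, char *out, size_t size)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;

    if (size == 0) {
        return -1;
    }
    while (*p != '\0') {
        unsigned cp;
        if (*p < 0x80) {
            cp = *p++;              /* ASCII maps to itself */
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* two-byte sequences with lead C2/C3 cover U+0080..U+00FF */
            cp = ((unsigned)(*p & 0x03) << 6) | (unsigned)(p[1] & 0x3F);
            p += 2;
        } else {
            return -1;              /* not representable in ISO 8859-1 */
        }
        if (n + 1 >= size) {
            return -1;              /* no room for this byte plus the NUL */
        }
        out[n++] = (char)cp;
    }
    out[n] = '\0';
    return (int)n;
}
```

A caller would build a record such as ToyFilesystem fs = { "latin1fs", latin1_to_native } and call fs.toNative(path, buf, sizeof buf) without consulting any process-wide notion of a system encoding. A filesystem that needs something richer could instead register what it needs through Tcl_CreateEncoding, as described above.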

TclGetLibraryPath appears to exist only for internal stubs compatibility. Otherwise it could be a file-static routine.

Questions

Is the format of *.enc files documented? It currently seems to be implied by the sources of "lib/encoding/txt2enc.c" and "generic/tclEncoding.c" (Load{Table|Escape}Encoding).

As far as I know, there is no documentation of the encoding data file formats in any of the places where 'official' documentation is found. Furthermore, I'm not aware of any project using Tcl that has extended the set of encodings supported by Tcl by creating its own custom foo.enc file. Many interesting things can and do happen outside of public view, though. New encodings have appeared in the set distributed with Tcl from time to time. The most recent such change seems to be more than 11 years ago, with gb2312-raw.