Source Files
- portions of tclEncoding.c
Public Interface
- struct Tcl_EncodingType
- Tcl_CreateEncoding
- Tcl_GetEncoding
- Tcl_FreeEncoding
- Tcl_GetEncodingName
- Tcl_GetEncodingNames
- Tcl_SetSystemEncoding
- Tcl_ExternalToUtfDString
- Tcl_ExternalToUtf
- Tcl_UtfToExternalDString
- Tcl_UtfToExternal
Private Interface
- TclInitEncodingSubsystem
- TclFinalizeEncodingSubsystem
Directly Depends On Public Interface
Directly Depends On Private Interface of
- None
Discussion
This is the foundational layer of encoding operations in Tcl.
The routines Tcl_GetEncoding and Tcl_GetEncodingNames are the source of dependency tangles, because they automatically hand off part of their work to the Encoding Loading layer, which has far more dependencies, some of them circular. A more conditional approach could disentangle this, notably in the initialization and teardown sequences.
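One possible reading of "a more conditional approach" is sketched below: the foundational layer consults only its own registry, and delegates to a loader only when a higher layer has installed one. Everything here (registry, loaderHook, LookupEncoding, DemoLoader) is a hypothetical stand-in for illustration, not Tcl's actual code or data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENCODINGS 8

/* Hypothetical registry of known encoding names (not Tcl's real table). */
static const char *registry[MAX_ENCODINGS];
static size_t registryCount = 0;

/*
 * Optional hook installed by a higher layer. While it is NULL (for
 * example during init and teardown), lookups never leave this layer.
 */
static int (*loaderHook)(const char *name) = NULL;

static int
RegisterName(const char *name)
{
    assert(name != NULL);
    if (registryCount >= MAX_ENCODINGS) {
        return -1;
    }
    registry[registryCount++] = name;
    return 0;
}

static int
FindRegistered(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < registryCount; i++) {
        if (strcmp(registry[i], name) == 0) {
            return (int) i;
        }
    }
    return -1;
}

/* Returns the registry index of name, or -1 if it is not available. */
static int
LookupEncoding(const char *name)
{
    int idx = FindRegistered(name);

    /* Delegate only if a loader exists and reports success. */
    if (idx < 0 && loaderHook != NULL && loaderHook(name) == 0) {
        idx = FindRegistered(name);
    }
    return idx;
}

/* Demo loader standing in for the Encoding Loading layer. */
static int
DemoLoader(const char *name)
{
    if (strcmp(name, "cp1252") != 0) {
        return -1;
    }
    return RegisterName("cp1252");
}
```

With the hook unset, a lookup of an unregistered name simply fails, so the foundational layer carries no compile-time or run-time dependency on the loader.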
The fundamental function here is to associate an ASCII string with each encoding as its name. The names used by Tcl are unfortunately a rather ad hoc set and do not appear to follow any standard naming system. One difficulty to resolve is the role of case in encoding names. At issue are the mapping from encoding names to encoding data file names established in Encoding Loading, the concern about filesystems that handle case in filenames in unexpected ways, and the general potential for confusion. Better names and clearer case rules would help.
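As an illustration of one possible case rule (an assumption, not current Tcl behavior), names could be folded to lower-case ASCII before any lookup or file name construction. The helpers NormalizeEncodingName and EncodingNamesEqual below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Tcl: fold an encoding name to
 * lower-case ASCII. Returns 0 on success, -1 if the name contains a
 * non-ASCII byte or does not fit in dst (including the terminator).
 * The fold is done by hand so the result does not depend on the locale.
 */
static int
NormalizeEncodingName(const char *name, char *dst, size_t dstLen)
{
    size_t i;

    assert(name != NULL && dst != NULL);
    if (dstLen == 0) {
        return -1;
    }
    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char) name[i];

        if (c > 0x7F || i + 1 >= dstLen) {
            return -1;
        }
        if (c >= 'A' && c <= 'Z') {
            c = (unsigned char) (c - 'A' + 'a');
        }
        dst[i] = (char) c;
    }
    dst[i] = '\0';
    return 0;
}

/* Returns 1 if the two names match after folding, 0 otherwise. */
static int
EncodingNamesEqual(const char *a, const char *b)
{
    char bufA[64], bufB[64];

    if (NormalizeEncodingName(a, bufA, sizeof bufA) != 0
            || NormalizeEncodingName(b, bufB, sizeof bufB) != 0) {
        return 0;
    }
    return strcmp(bufA, bufB) == 0;
}
```

Folding once at the boundary would mean the registry and any derived file names only ever see a single spelling of each name.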
One set of more widely-used names may be these:
Another source to examine for reconciliation is:
The encoding/decoding functions in the extension struct Tcl_EncodingType are the guts of the operation, and the form of those functions determines the public interfaces. Changes here would introduce difficult incompatibilities, but may still be worth pursuing because the existing Tcl_*To* set of functions really doesn't mesh well with the needs of actual callers. See the use of these routines by the Channel System for the trial and error involved. Revised interfaces that deliver what callers actually need would be an improvement. The most glaring defect is that there is no way to direct the Tcl_ExternalToUtf* routines to respect a limit on the number of characters produced.
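To make the missing character limit concrete, here is a minimal sketch of a converter that stops on any of three conditions: source exhausted, destination full, or a caller-supplied character count reached. It handles only ISO 8859-1 input and plain UTF-8 output; the function Latin1ToUtfLimited, the ConvertResult struct, and the maxChars parameter are hypothetical, though the result fields echo the srcReadPtr, dstWrotePtr, and dstCharsPtr outputs of Tcl_ExternalToUtf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result record; not a Tcl type. */
typedef struct {
    size_t srcRead;   /* source bytes consumed */
    size_t dstWrote;  /* destination bytes written */
    size_t dstChars;  /* characters produced */
} ConvertResult;

/*
 * Hypothetical interface, not part of Tcl: convert ISO 8859-1 bytes to
 * UTF-8, producing at most maxChars characters. No terminating NUL is
 * written, and NUL is not given Tcl's special internal representation.
 */
static ConvertResult
Latin1ToUtfLimited(const unsigned char *src, size_t srcLen,
        unsigned char *dst, size_t dstLen, size_t maxChars)
{
    ConvertResult r = {0, 0, 0};

    assert(src != NULL && dst != NULL);
    while (r.srcRead < srcLen && r.dstChars < maxChars) {
        unsigned char c = src[r.srcRead];
        size_t need = (c < 0x80) ? 1 : 2;

        /* Never split a character across the end of the buffer. */
        if (dstLen - r.dstWrote < need) {
            break;
        }
        if (need == 1) {
            dst[r.dstWrote++] = c;
        } else {
            dst[r.dstWrote++] = (unsigned char) (0xC0 | (c >> 6));
            dst[r.dstWrote++] = (unsigned char) (0x80 | (c & 0x3F));
        }
        r.srcRead++;
        r.dstChars++;
    }
    return r;
}
```

A caller that needs exactly N characters can pass N as maxChars and resume later from src + srcRead, rather than guessing a destination size and retrying.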
Any reforms in Tcl string values generally will likely have an impact here as well.
Another audit of the refcounting scheme employed here wouldn't hurt, with particular attention to thread safety issues.
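For reference during such an audit, the sketch below shows one common thread-safe pattern using C11 atomics. It is an assumption for comparison purposes only, not a description of how tclEncoding.c manages its counts; SketchEncoding, SketchRetain, and SketchRelease are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical record; Tcl's real encoding record has different fields. */
typedef struct SketchEncoding {
    atomic_int refCount;
    void (*freeProc)(struct SketchEncoding *);
} SketchEncoding;

static void
SketchRetain(SketchEncoding *e)
{
    assert(e != NULL);
    atomic_fetch_add(&e->refCount, 1);
}

/* Returns 1 if this call released the last reference, 0 otherwise. */
static int
SketchRelease(SketchEncoding *e)
{
    assert(e != NULL);
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&e->refCount, 1) == 1) {
        if (e->freeProc != NULL) {
            e->freeProc(e);
        }
        return 1;
    }
    return 0;
}
```

An atomic counter alone does not cover the case where one thread finds a record by name while another releases its last reference; that lookup-versus-release race still needs a lock around the name table, and is the kind of interaction an audit would want to check.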
Some routines take a (Tcl_Interp *) argument, for error reporting, though they accept a value of NULL. This reduces the degree to which these routines might be used separately from the rest of Tcl.