TIP 667: Make "strict" the default encoding profile.

Author:         Nathan Coulter <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Tcl-Version:    9.0
Tcl-Branch:     tip-667
Obsoleted-By:   657

Abstract

Although TIP #657 purported to be about making the strict profile the default, it also specified other things that were out of scope, specified unnecessary implementation details, and included a partial alternative to TIP #653 in its Compatibility section (those changes have since been incorporated into TIP #653). This TIP proposes that "strict" become the default encoding profile for all operations.

Rationale

The tcl8 profile was until recently the only option for handling encoding errors in channel content. Now there are two additional profiles available, strict and replace.

The most common expectation when working with encoded data is that, if the operation completed without error, the data were correctly encoded and no data were lost in the result. This corresponds to the strict encoding profile, so it makes sense to make this profile the default. Where it is not the default, data may be silently corrupted, and the corruption may be discovered only at some later date, after collateral damage, possibly including exploitation by bad actors, has already occurred.

It is expected that scripts that must be adapted due to this change in default behaviour will fail early, before real damage is done. This makes it easy to detect where change is necessary and leads to a more secure and correct scripting environment overall. Commands like fcopy, read and gets raise an error as soon as bad data is detected. Where this is not desired, it is easy to remedy through trivial mechanical changes to existing scripts.
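As an illustration of such a mechanical change, the following sketch reads the same file under two profiles. It assumes a Tcl build in which channels accept the -profile option; the helper name readWithProfile and the sample data are hypothetical and not part of this TIP.

```tcl
# Hypothetical helper (not part of this TIP): read a whole file as UTF-8
# under an explicitly chosen encoding profile.
proc readWithProfile {path profile} {
    set chan [open $path r]
    try {
        fconfigure $chan -encoding utf-8 -profile $profile
        return [read $chan]
    } finally {
        close $chan
    }
}

# Create a sample file containing the byte 0xFF, which is never valid UTF-8.
set chan [file tempfile path]
fconfigure $chan -translation binary
puts -nonewline $chan "abc\xFFxyz"
close $chan

# Under the strict profile the read fails instead of returning altered data.
set strictFailed [catch {readWithProfile $path strict} strictMsg]

# The mechanical change for a script that wants the old behaviour:
# configure the channel for the tcl8 profile explicitly.
set lenient [readWithProfile $path tcl8]

file delete $path
```

A script that genuinely needs lenient decoding adds one fconfigure call per channel; every other script gets an early, visible failure at the point where the bad data is read.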

Specification

New channels are by default assigned the strict profile, and both encoding convertfrom and encoding convertto use the strict profile by default.
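The sketch below shows what strict conversion looks like at the script level and how a caller can opt out per call. Profiles are named explicitly so that the example does not depend on which default is in effect; it assumes a Tcl build in which encoding convertfrom and encoding convertto accept the -profile and -failindex options. The sample data is made up for illustration.

```tcl
# 0xFF is never valid in UTF-8.
set bytes "abc\xFFxyz"

# Strict decoding raises an error on the invalid byte.
set decodeFailed [catch {encoding convertfrom -profile strict utf-8 $bytes} decodeMsg]

# With -failindex, the decoded prefix is returned and the position of the
# offending byte is stored in the named variable instead of raising an error.
set prefix [encoding convertfrom -profile strict -failindex pos utf-8 $bytes]

# A caller that prefers substitution asks for the replace profile explicitly.
set replaced [encoding convertfrom -profile replace utf-8 $bytes]

# Strict encoding raises an error when a character cannot be represented
# in the target encoding.
set encodeFailed [catch {encoding convertto -profile strict ascii "caf\u00E9"} encodeMsg]
```

Here prefix holds the three characters decoded before the invalid byte and pos holds 3, so a caller can report exactly where the input went wrong.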

Tcl_FSEvalFileEx() uses the strict profile, and therefore source uses the strict profile. The http package leaves any channels it opens in their default strict configuration, so it too uses the strict profile.

Tcl_ExternalToUtfDStringEx(), Tcl_UtfToExternalDStringEx(), Tcl_ExternalToUtf() and Tcl_UtfToExternal() support operation in a mode where any encoding error that occurs results in an EILSEQ POSIX error. That mode is now the default. Other modes can be explicitly configured by the caller to specify how these functions behave when invalid data are encountered.

Any test in the Tcl test suite that requires a channel that is not configured for strict encoding explicitly configures the channel according to its needs.
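A test of that kind might look like the following sketch. The test name, file name and data are hypothetical and are not taken from the Tcl test suite; the sketch assumes the tcltest package and a Tcl build in which channels accept the -profile option.

```tcl
package require tcltest

# Hypothetical test: it needs lenient decoding, so it says so explicitly
# rather than relying on the channel's default profile.
tcltest::test profile-explicit-1.0 {read invalid UTF-8 under the tcl8 profile} -setup {
    set path [tcltest::makeFile {} profile-explicit.txt]
    set chan [open $path wb]
    puts -nonewline $chan "A\xFFZ"
    close $chan
} -body {
    set chan [open $path r]
    fconfigure $chan -encoding utf-8 -profile tcl8
    read $chan
} -cleanup {
    close $chan
    tcltest::removeFile profile-explicit.txt
} -result "A\u00FFZ"
```

A real test file would normally end with a call to tcltest::cleanupTests; it is left out here to keep the sketch short.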

Further explanation

Compatibility

This is an incompatible change for Tcl_ExternalToUtf()/Tcl_UtfToExternal(), but since those functions are often called to operate in strict mode, it will have little effect.

This is an incompatible change for Tcl_Read(), Tcl_Write() and Tcl_Gets(). See TIP #653 for details.

Implementation

The branch trunk-encodingdefaultstrict implements this TIP.

Copyright

This document has been placed in the public domain.