# TIP 575: Switchable Tcl\_UtfCharComplete()/Tcl\_UtfNext()/Tcl\_UtfPrev()
Author: Jan Nijtmans <[email protected]>
State: Final
Type: Project
Vote: Done
Tcl-Version: 8.7
Tcl-Branch: tip-575
Vote-Summary: Accepted 3/0/0
Votes-For: JN, KW, SL
Votes-Against: none
Votes-Present: none
-----
# Abstract
This TIP is a successor to [TIP #542](542.md), resolving a corner-case not realized at that moment.
[TIP #542](542.md) allows stub-enabled extensions to be compiled with `-DTCL_UTF_MAX=4`, while
still working with a Tcl compiled with `-DTCL_UTF_MAX=3`. All functions (e.g. `Tcl_UtfToUniChar`)
which use a `Tcl_UniChar`, change behavior with the value of `TCL_UTF_MAX`: If `TCL_UTF_MAX=3`,
then, supplying a 4-byte UTF-8 character to this function will return 1, and `*chPtr` will
point to a high surrogate. If `TCL_UTF_MAX=4`, then, this function will return 4, and `*chPtr`
will point to the full Unicode character. This works by supplying two different stub entries
and making a switch controlled by the value of `TCL_UTF_MAX`.
The functions `Tcl_UtfNext()/Tcl_UtfPrev()` don't have a `Tcl_UniChar` parameter, still there's
an expected coupling with the function `Tcl_UtfToUniChar`: If `TCL_UTF_MAX=4` then we would
expect `Tcl_UtfNext()` to be able to jump forward 4 bytes, while with `TCL_UTF_MAX=3`,
`Tcl_UtfNext()` can only jump forward with maximum 3 bytes. The same for `Tcl_UtfPrev()`.
The function `Tcl_UtfCharComplete()`, which is coupled with the function `Tcl_UtfToUniChar`
(indicating if there are enough bytes available for `Tcl_UtfToUniChar()` to be called),
has the same problem. Making this function switchable has the advantage that this function
now can be used to protect calls to `Tcl_UtfNext()` too, for extensions compiled with
whatever value of `TCL_UTF_MAX`.
# Specification
Implement new functions `Tcl_UtfCharComplete()`/`Tcl_UtfNext()`/`Tcl_UtfPrev()`, which can
jump 4 bytes forward resp. back, so it is possible to jump over UTF-8 characters > U+FFFF
in one step in stead of two.
The new `Tcl_UtfNext()`/`Tcl_UtfPrev()` functions will get their own new entries in the
stub table. So, extensions (however rare) using `Tcl_UtfNext()`/`Tcl_UtfPrev()` but
compiled against Tcl 8.6 headers will keep their original behavior.
The new `Tcl_UtfCharComplete()` will behave almost identical to the old one. The only
difference is when it encounters a starting byte between 0xF0 and 0xF5: Then it will return
true only when at least 4 bytes are available.
If an extension is compiled with `-DTCL_UTF_MAX=4` or with `-DTCL_NO_DEPRECATED`, then
`Tcl_UtfCharComplete()` will start behaving like described in this TIP, if not then it
will behave exactly as in Tcl 8.6.
Documentation regarding `Tcl_UtfCharComplete()` is adapted, stating that this function
can now be used to protect `Tcl_UtfNext()` calls too.
# Implementation
Implementation is in branch tip-575
# Compatibility
As long as Tcl and/or extensions are both compiled with `-DTCL_UTF_MAX=3` (which is
the default in Tcl 8.x) or `-DTCL_UTF_MAX=4` (as in Tcl 9.x), nothing changes.
The difference can only be noted in extensions which are compiled using a different
`TCL_UTF_MAX` value than Tcl.
# Copyright
This document has been placed in the public domain.