Author: Jan Nijtmans <[email protected]>
Author: Don Porter <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 23-Jan-2018
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 9.0
Tcl-Branch: tip-497
This TIP proposes adding full support for all characters in Unicode 10.0+, including the characters >= U+010000, and adapting the regexp engine accordingly. The caveats remaining from TIP #389 will also be handled here.
This document proposes:
Add a new objType "UTF-32", which stores a string using 32 bits per character.
Adapt the regexp engine to start using the "UTF-32" objType: Any string handled by regexp will first be converted to "UTF-32". (DONE in tip-497 branch)
Modify all APIs using Tcl_UniChar: If the string contains surrogate pairs, the "UTF-32" objType will be used.
Modify all functions using or producing an index: "string length" should return 1 for any string consisting of a single Unicode character, including characters >= U+010000.
TODO: everything else that comes up
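The index problem above comes from surrogate pairs: in a 16-bit representation, a character >= U+010000 occupies two units, so a length counted in units is 2 rather than 1. The following C fragment is a minimal sketch of the widening step a 32-bit representation implies. It is not code from the tip-497 branch; the helper names `combine_surrogates` and `utf16_to_utf32` are hypothetical and are not part of the Tcl API, and the sketch assumes the input is held as 16-bit units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Tcl API): combine a UTF-16 surrogate pair
 * into a single code point in the range U+10000..U+10FFFF. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);   /* high (lead) surrogate */
    assert(low >= 0xDC00 && low <= 0xDFFF);     /* low (trail) surrogate */
    return 0x10000u
        + (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
}

/* Hypothetical helper (not a Tcl API): widen n 16-bit units into 32-bit
 * code points. dst must have room for at least n elements. Unpaired
 * surrogates are copied through unchanged. Returns the number of code
 * points written, which is the length in characters. */
size_t utf16_to_utf32(const uint16_t *src, size_t n, uint32_t *dst)
{
    size_t i = 0, out = 0;

    while (i < n) {
        uint16_t u = src[i];

        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n
                && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* Valid pair: two 16-bit units become one 32-bit character. */
            dst[out++] = combine_surrogates(u, src[i + 1]);
            i += 2;
        } else {
            /* BMP character or unpaired surrogate: one unit, one slot. */
            dst[out++] = u;
            i += 1;
        }
    }
    return out;
}
```

For example, the three units 0x0041, 0xD83D, 0xDE00 widen to the two code points U+0041 and U+1F600, so indices and lengths computed on the 32-bit form count characters rather than 16-bit units.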
A reference implementation is in progress in the tip-497 branch: https://core.tcl.tk/tcl/timeline?r=tip-497
This document has been placed in the public domain.