Overview
Comment: | Explain why some TIP #389 proposed changes are upwards compatible. Remove description of Tcl_WinUtfToTChar/Tcl_WinTCharToUtf (implementation-only) change. |
---|---|
SHA3-256: | 40c11941eac684efde2ce97ce4a229e8 |
User & Date: | jan.nijtmans 2018-04-04 09:29:07.328 |
Context
2018-04-04
10:51 | Add a "Rejected Alternatives" section check-in: b39ae9464b user: jan.nijtmans tags: trunk | |
09:29 | Explain why some TIP #389 proposed changes are upwards compatible. Remove description of Tcl_WinUtfToTChar/Tcl_WinTCharToUtf (implementation-only) change. check-in: 40c11941ea user: jan.nijtmans tags: trunk | |
2018-03-30
17:55 | New TIP 506 check-in: c80f0474ca user: dgp tags: trunk | |
Changes
Changes to tip/389.md.
︙ | ︙ | |||
Lines 63-76 (old) → 63-89 (new); 13 lines added:

  > \* _Tcl\_UniCharToLower_
  > \* _Tcl\_UniCharToTitle_
  > \* _Tcl\_UniCharToUpper_
  > \* _Tcl\_GetUniChar_

+ At first sight, this looks like a binary incompatibility, but in fact
+ this is upwards compatible. Since in C, function calls generally transfer
+ the result of a function call in a special register (the Accumulator).
+ When compiling an extension using Tcl 8.6 headers, the caller expects the
+ accumulator to contain a 16-bit result, while the remaining 48 bits (the
+ Accumulator generally is 64-bit) are undefined. When the extension is run
+ under Tcl 8.7, 16 more bits of the accumulator content are now defined
+ (generally all zero's). The effect is that all characters >= **U\+010000**
+ (which are not supported on Tcl 8.6) are now mapped to characters in the
+ first unicode plane, but that's all. Re-compiling the extension using
+ Tcl 8.7 headers might enable full Unicode support for the extension, if a
+ 32-bit register is used to store the result.
+

  * Extend tclUniData.c to include all Unicode 10.0 characters up to
    **U\+02FA20**. A special case will be made for the functions
    _Tcl\_UniCharIsGraph_ and _Tcl\_UniCharIsPrint_ for the characters in
    the range **U\+0E0100** - **U\+0E01EF**, otherwise it would almost
    double the Unicode table size.
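The compatibility argument in the added paragraph can be modelled in plain C. The following is a minimal sketch, not Tcl code: `new_style_result` and `seen_by_old_caller` are hypothetical names, and the explicit cast to `uint16_t` only imitates what a caller built against 16-bit headers would effectively observe. The real behaviour depends on the platform's calling convention.

```c
#include <stdint.h>

/* Hypothetical stand-in for a function that now returns a full 32-bit
 * code point (here simply the identity mapping). Not a Tcl API. */
static uint32_t new_style_result(uint32_t ch) {
    return ch;
}

/* Models a caller compiled against headers that declare a 16-bit result:
 * only the low 16 bits of the returned value are used. The cast is an
 * assumption standing in for the register-level behaviour. */
static uint16_t seen_by_old_caller(uint32_t ch) {
    return (uint16_t)new_style_result(ch);
}
```

Under this model a character in the first plane such as U+0041 is unchanged, while U+1F600 is seen as U+F600, which matches the "mapped to characters in the first unicode plane" effect described above.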
︙ | ︙ | |||
Lines 85-91 (old) → 98-111 (new); 7 lines removed (the removed text is not shown in this view):

  * If Tcl is compiled with -DTCL\_UTF\_MAX=6, use a different
    TCL\_STUB\_MAGIC value. Since extensions compiled with
    -DTCL\_UTF\_MAX=6 are binary incompatible with normally-compiled Tcl,
    this causes extensions compiled with this same options no longer being
    loadable in normal Tcl and reverse. Note that TCL\_UTF\_MAX=6 compiles
    are still not officially supported, a lot of additional fixes are
    needed to make it work right.

  # Compatibility

  As long as no Surrogates or characters >= **U\+010000** are used, all
  functions behave exactly the same as before. The only way that
  _Tcl\_UniCharToUtf_ can produce a 4-byte output is when Surrogates or
  characters >= **U\+010000** are used.
︙ | ︙ | |||
Lines 151-157 (old) → 157-178 (new); 2 lines changed, 1 line added (new text shown):

  0 -> (So we cannot access the lower surrogate separately)

  So, the "string length" of a Unicode character >U+FFFF is 2, and if you
  try to split it in two separate characters that won't work: It will then
  be split in a character with length 2 (the original one) and another
  character with length 0 (the empty string).

  Also note that the regexp engine still cannot really handle Unicode
  characters >U+FFFF, it will handle those as if they consist of 2 separate
  characters. Most usage of regular expressions won't notice the
  difference.

  Those caveats are planned to be handled in "part 2" (TIP #497)

  # Reference Implementation

  A reference implementation is available in the [tip-389 branch]
  (https://core.tcl.tk/tk/timeline?r=tip-389).

  # Copyright

  This document has been placed in the public domain.
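The "string length" of 2 mentioned in the last hunk follows from the way UTF-16 represents characters above U+FFFF as a surrogate pair. The sketch below shows that arithmetic in isolation. `utf16_units` and `to_surrogates` are hypothetical helper names and are not part of the Tcl API.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit units needed for one code point: 2 above U+FFFF,
 * otherwise 1. */
static size_t utf16_units(uint32_t cp) {
    return cp > 0xFFFF ? 2 : 1;
}

/* Split a code point in the range U+10000..U+10FFFF into its high and
 * low surrogate. The caller is assumed to pass a value in that range. */
static void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low) {
    uint32_t v = cp - 0x10000;               /* 20-bit offset */
    *high = (uint16_t)(0xD800 + (v >> 10));   /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF)); /* bottom 10 bits */
}
```

For U+1F600 this yields the pair D83D, DE00: two 16-bit units for one character, which is why the character counts as length 2 when strings are stored in 16-bit units.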