Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Merge 8.6 |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | core-8-branch |
Files: | files | file ages | folders |
SHA3-256: |
468df9021d6d07efe83de59ccce96215 |
User & Date: | jan.nijtmans 2019-12-02 20:48:35.780 |
Context
2019-12-02
| ||
22:07 | merge 8.6 check-in: cc5377327f user: pooryorick tags: core-8-branch | |
20:51 | Merge 8.7 check-in: c52a58fe02 user: jan.nijtmans tags: trunk | |
20:48 | Merge 8.6 check-in: 468df9021d user: jan.nijtmans tags: core-8-branch | |
20:26 | If TCL_UTF_MAX>=4, make Tcl_ParseBackslash combine two surrogates so they appear as one 4-byte UTF-8... check-in: 43032d7ba3 user: jan.nijtmans tags: core-8-6-branch | |
18:15 | Bump to 8.7a4 to distinguish development from 8.7a3 release. check-in: ff95f2c49c user: dgp tags: core-8-branch | |
Changes
Changes to generic/tclParse.c.
︙ | ︙ | |||
862 863 864 865 866 867 868 869 870 871 872 873 874 875 | case 'u': count += TclParseHex(p+1, (numBytes > 5) ? 4 : numBytes-2, &result); if (count == 2) { /* * No hexdigits -> This is just "u". */ result = 'u'; } break; case 'U': count += TclParseHex(p+1, (numBytes > 9) ? 8 : numBytes-2, &result); if (count == 2) { /* * No hexdigits -> This is just "U". | > > > > > > > > | 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 | case 'u': count += TclParseHex(p+1, (numBytes > 5) ? 4 : numBytes-2, &result); if (count == 2) { /* * No hexdigits -> This is just "u". */ result = 'u'; } else if (((result & 0xDC00) == 0xD800) && (count == 6) && (p[5] == '\\') && (p[6] == 'u') && (numBytes >= 10)) { /* If high surrogate is immediately followed by a low surrogate escape, combine them. */ int low; int count2 = TclParseHex(p+7, 4, &low); if ((low & 0xDC00) == 0xDC00) { result = ((result & 0x3FF)<<10 | (low & 0x3FF)) + 0x10000; count += count2 + 2; } } break; case 'U': count += TclParseHex(p+1, (numBytes > 9) ? 8 : numBytes-2, &result); if (count == 2) { /* * No hexdigits -> This is just "U". |
︙ | ︙ |
Changes to generic/tclUtf.c.
︙ | ︙ | |||
1063 1064 1065 1066 1067 1068 1069 | result = TclParseBackslash(src, LINE_LENGTH, &numRead, dst); if (numRead == LINE_LENGTH) { /* * We ate a whole line. Pay the price of a strlen() */ | | | 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 | result = TclParseBackslash(src, LINE_LENGTH, &numRead, dst); if (numRead == LINE_LENGTH) { /* * We ate a whole line. Pay the price of a strlen() */ result = TclParseBackslash(src, strlen(src), &numRead, dst); } if (readPtr != NULL) { *readPtr = numRead; } return result; } |
︙ | ︙ |
Changes to tests/utf.test.
︙ | ︙ | |||
23 24 25 26 27 28 29 | # Some tests require support for 4-byte UTF-8 sequences testConstraint tip389 [expr {[string length \U010000] == 2}] test utf-1.1 {Tcl_UniCharToUtf: 1 byte sequences} testbytestring { expr {"\x01" eq [testbytestring "\x01"]} } 1 test utf-1.2 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring { | | | | | | | | | | | > > > | | | 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 | # Some tests require support for 4-byte UTF-8 sequences testConstraint tip389 [expr {[string length \U010000] == 2}] test utf-1.1 {Tcl_UniCharToUtf: 1 byte sequences} testbytestring { expr {"\x01" eq [testbytestring "\x01"]} } 1 test utf-1.2 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring { expr {"\x00" eq [testbytestring "\xC0\x80"]} } 1 test utf-1.3 {Tcl_UniCharToUtf: 2 byte sequences} testbytestring { expr {"\xE0" eq [testbytestring "\xC3\xA0"]} } 1 test utf-1.4 {Tcl_UniCharToUtf: 3 byte sequences} testbytestring { expr {"\u4E4E" eq [testbytestring "\xE4\xB9\x8E"]} } 1 test utf-1.5 {Tcl_UniCharToUtf: overflowed Tcl_UniChar} testbytestring { expr {[format %c 0x110000] eq [testbytestring "\xEF\xBF\xBD"]} } 1 test utf-1.6 {Tcl_UniCharToUtf: negative Tcl_UniChar} testbytestring { expr {[format %c -1] eq [testbytestring "\xEF\xBF\xBD"]} } 1 test utf-1.7 {Tcl_UniCharToUtf: 4 byte sequences} -constraints testbytestring -body { expr {"\U014E4E" eq [testbytestring "\xF0\x94\xB9\x8E"]} } -result 1 test utf-1.8 {Tcl_UniCharToUtf: 3 byte sequence, high surrogate} testbytestring { expr {"\uD842" eq [testbytestring "\xED\xA1\x82"]} } 1 test utf-1.9 {Tcl_UniCharToUtf: 3 byte sequence, low surrogate} testbytestring { expr {"\uDC42" eq [testbytestring "\xED\xB1\x82"]} } 1 test utf-1.10 {Tcl_UniCharToUtf: 3 byte sequence, high surrogate} testbytestring { expr {[format %c 0xD842] eq [testbytestring "\xED\xA1\x82"]} } 1 test utf-1.11 {Tcl_UniCharToUtf: 3 byte sequence, low surrogate} testbytestring { expr {[format %c 0xDC42] eq [testbytestring "\xED\xB1\x82"]} } 1 test utf-1.12 {Tcl_UniCharToUtf: 4 byte sequence, high/low surrogate} testbytestring { expr {"\uD842\uDC42" eq [testbytestring "\xF0\xA0\xA1\x82"]} } 1 test utf-2.1 {Tcl_UtfToUniChar: low ascii} { string length "abc" } {3} test utf-2.2 {Tcl_UtfToUniChar: naked trail bytes} testbytestring { string length [testbytestring "\x82\x83\x84"] } {3} test utf-2.3 {Tcl_UtfToUniChar: lead (2-byte) followed by non-trail} testbytestring { string length [testbytestring "\xC2"] } {1} test utf-2.4 {Tcl_UtfToUniChar: lead (2-byte) followed by trail} testbytestring { string length [testbytestring "\xC2\xA2"] } {1} test utf-2.5 {Tcl_UtfToUniChar: lead (3-byte) followed by non-trail} testbytestring { string length [testbytestring "\xE2"] } {1} test utf-2.6 {Tcl_UtfToUniChar: lead (3-byte) followed by 1 trail} testbytestring { string length [testbytestring "\xE2\xA2"] } {2} test utf-2.7 {Tcl_UtfToUniChar: lead (3-byte) followed by 2 trail} testbytestring { string length [testbytestring "\xE4\xB9\x8E"] } {1} test utf-2.8 {Tcl_UtfToUniChar: lead (4-byte) followed by 3 trail} -constraints {tip389 testbytestring} -body { string length [testbytestring "\xF0\x90\x80\x80"] } -result {2} test utf-2.9 {Tcl_UtfToUniChar: lead (4-byte) followed by 3 trail} -constraints {tip389 testbytestring} -body { string length [testbytestring "\xF4\x8F\xBF\xBF"] } -result {2} |
︙ | ︙ | |||
146 147 148 149 150 151 152 | test utf-7.1 {Tcl_UtfPrev} { } {} test utf-8.1 {Tcl_UniCharAtIndex: index = 0} { string index abcd 0 } {a} test utf-8.2 {Tcl_UniCharAtIndex: index = 0} { | | | | | | | | | | | 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 | test utf-7.1 {Tcl_UtfPrev} { } {} test utf-8.1 {Tcl_UniCharAtIndex: index = 0} { string index abcd 0 } {a} test utf-8.2 {Tcl_UniCharAtIndex: index = 0} { string index \u4E4E\u25A 0 } "\u4E4E" test utf-8.3 {Tcl_UniCharAtIndex: index > 0} { string index abcd 2 } {c} test utf-8.4 {Tcl_UniCharAtIndex: index > 0} { string index \u4E4E\u25A\xFF\u543 2 } "\uff" test utf-8.5 {Tcl_UniCharAtIndex: high surrogate} { string index \ud842 0 } "\ud842" test utf-8.6 {Tcl_UniCharAtIndex: low surrogate} { string index \udc42 0 } "\udc42" test utf-9.1 {Tcl_UtfAtIndex: index = 0} { string range abcd 0 2 } {abc} test utf-9.2 {Tcl_UtfAtIndex: index > 0} { string range \u4E4E\u25A\xFF\u543klmnop 1 5 } "\u25A\xFF\u543kl" test utf-10.1 {Tcl_UtfBackslash: dst == NULL} { set x \n } { } test utf-10.2 {Tcl_UtfBackslash: \u subst} testbytestring { expr {"\uA2" eq [testbytestring "\xC2\xA2"]} } 1 test utf-10.3 {Tcl_UtfBackslash: longer \u subst} testbytestring { expr {"\u4E21" eq [testbytestring "\xE4\xB8\xA1"]} } 1 test utf-10.4 {Tcl_UtfBackslash: stops at first non-hex} testbytestring { expr {"\u4E2k" eq "[testbytestring \xD3\xA2]k"} } 1 test utf-10.5 {Tcl_UtfBackslash: stops after 4 hex chars} testbytestring { expr {"\u4E216" eq "[testbytestring \xE4\xB8\xA1]6"} } 1 test utf-10.6 {Tcl_UtfBackslash: stops after 5 hex chars} testbytestring { expr {"\U1e2165" eq "[testbytestring \xf0\x9e\x88\x96]5"} } 1 test utf-10.7 {Tcl_UtfBackslash: stops after 6 hex chars} testbytestring { expr {"\U10e2165" eq "[testbytestring \xf4\x8e\x88\x96]5"} } 1 |
︙ | ︙ | |||
235 236 237 238 239 240 241 | bsCheck \x541 84 bsCheck \u 117 bsCheck \uk 117 bsCheck \u41 65 bsCheck \ua 10 bsCheck \uA 10 bsCheck \340 224 | | | | | | | | | | | | | | | | | | | | | | | | | | | | 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 | bsCheck \x541 84 bsCheck \u 117 bsCheck \uk 117 bsCheck \u41 65 bsCheck \ua 10 bsCheck \uA 10 bsCheck \340 224 bsCheck \uA1 161 bsCheck \u4E21 20001 bsCheck \741 60 bsCheck \U 85 bsCheck \Uk 85 bsCheck \U41 65 bsCheck \Ua 10 bsCheck \UA 10 bsCheck \Ua1 161 bsCheck \U4E21 20001 bsCheck \U004E21 20001 bsCheck \U00004E21 20001 bsCheck \U0000004E21 78 bsCheck \U00110000 69632 bsCheck \U01100000 69632 bsCheck \U11000000 69632 bsCheck \U0010FFFF 1114111 bsCheck \U010FFFF0 1114111 bsCheck \U10FFFF00 1114111 bsCheck \UFFFFFFFF 1048575 test utf-11.1 {Tcl_UtfToUpper} { string toupper {} } {} test utf-11.2 {Tcl_UtfToUpper} { string toupper abc } ABC test utf-11.3 {Tcl_UtfToUpper} { string toupper \u00E3ab } \u00C3AB test utf-11.4 {Tcl_UtfToUpper} { string toupper \u01E3ab } \u01E2AB test utf-11.5 {Tcl_UtfToUpper Georgian (new in Unicode 11)} { string toupper \u10D0\u1C90 } \u1C90\u1C90 test utf-11.6 {Tcl_UtfToUpper low/high surrogate)} { string toupper \udc24\ud824 } \udc24\ud824 test utf-12.1 {Tcl_UtfToLower} { string tolower {} } {} test utf-12.2 {Tcl_UtfToLower} { string tolower ABC } abc test utf-12.3 {Tcl_UtfToLower} { string tolower \u00C3AB } \u00E3ab test utf-12.4 {Tcl_UtfToLower} { string tolower \u01E2AB } \u01E3ab test utf-12.5 {Tcl_UtfToLower Georgian (new in Unicode 11)} { string tolower \u10D0\u1C90 } \u10D0\u10D0 test utf-12.6 {Tcl_UtfToUpper low/high surrogate)} { string tolower \udc24\ud824 } \udc24\ud824 test utf-13.1 {Tcl_UtfToTitle} { string totitle {} } {} test utf-13.2 {Tcl_UtfToTitle} { string totitle abc } Abc test utf-13.3 {Tcl_UtfToTitle} { string totitle \u00E3ab } \u00C3ab test utf-13.4 {Tcl_UtfToTitle} { string totitle \u01F3ab } \u01F2ab test utf-13.5 {Tcl_UtfToTitle Georgian (new in Unicode 11)} { string totitle \u10D0\u1C90 } \u10D0\u1C90 test utf-13.6 {Tcl_UtfToTitle Georgian (new in Unicode 11)} { string totitle \u1C90\u10D0 } \u1C90\u10D0 test utf-13.7 {Tcl_UtfToTitle low/high surrogate)} { string totitle \udc24\ud824 } \udc24\ud824 test utf-14.1 {Tcl_UtfNcasecmp} { string compare -nocase a b } -1 |
︙ | ︙ |