Tcl Source Code

Check-in [1997a15ffd]
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Slightly better unmatched-surrogates handling. Unmatched High surrogates will still be silently removed, but Unmatched Low surrogates will pass through as-is now. Inspired by Kevin Kenny's remarks. Thanks!
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | tip-389
Files: files | file ages | folders
SHA3-256: 1997a15ffd0a6ac9d5957927baeafa725d497d0298a00dcea22ffa3590a49b63
User & Date: jan.nijtmans 2018-04-17 21:49:00
Context
2018-04-20
10:16
TIP #389 implementation. check-in: e109760b1c user: jan.nijtmans tags: core-8-branch
2018-04-17
21:49
Slightly better unmatched-surrogates handling. Unmatched High surrogates will still be silently remo... Closed-Leaf check-in: 1997a15ffd user: jan.nijtmans tags: tip-389
2018-04-11
16:17
merge 8.7 check-in: fdf9328219 user: dgp tags: tip-389
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to generic/tclUtf.c.

144
145
146
147
148
149
150



151
152
153


154
155
156


157
158
159
160
161
162
163
164
165
166
	    return 2;
	}
	if (ch <= 0xFFFF) {
#if TCL_UTF_MAX == 4
	    if ((ch & 0xF800) == 0xD800) {
		if (ch & 0x0400) {
		    /* Low surrogate */



		    buf[3] = (char) ((ch | 0x80) & 0xBF);
		    buf[2] |= (char) (((ch >> 6) | 0x80) & 0x8F);
		    return 4;


		} else {
		    /* High surrogate */
		    ch += 0x40;


		    buf[2] = (char) (((ch << 4) | 0x80) & 0xB0);
		    buf[1] = (char) (((ch >> 2) | 0x80) & 0xBF);
		    buf[0] = (char) (((ch >> 8) | 0xF0) & 0xF7);
		    return 0;
		}
	    }
#endif
	    goto three;
	}







>
>
>
|
|
|
>
>



>
>
|
|
|







144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
	    return 2;
	}
	if (ch <= 0xFFFF) {
#if TCL_UTF_MAX == 4
	    if ((ch & 0xF800) == 0xD800) {
		if (ch & 0x0400) {
		    /* Low surrogate */
		    if (((buf[0] & 0xF8) == 0xF0) && ((buf[1] & 0xC0) == 0x80)
			    && ((buf[2] & 0xCF) == 0)) {
			/* Previous Tcl_UniChar was a High surrogate, so combine */
			buf[3] = (char) ((ch & 0x3F) | 0x80);
			buf[2] |= (char) (((ch >> 6) & 0x0F) | 0x80);
			return 4;
		    }
		    /* Previous Tcl_UniChar was not a High surrogate, so just output */
		} else {
		    /* High surrogate */
		    ch += 0x40;
		    /* Fill buffer with specific 3-byte (invalid) byte combination,
		       so following Low surrogate can recognize it and combine */
		    buf[2] = (char) ((ch << 4) & 0x30);
		    buf[1] = (char) (((ch >> 2) & 0x3F) | 0x80);
		    buf[0] = (char) (((ch >> 8) & 0x07) | 0xF0);
		    return 0;
		}
	    }
#endif
	    goto three;
	}