Tcl Source Code

Check-in [e766d23655]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Merge 8.7
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: e766d236553aee341a4a276859ff2c2bad1afa1b2ac4ff44c86c5f733eafe770
User & Date: jan.nijtmans 2019-03-02 16:53:42.467
Context
2019-03-05
18:23
merge 8.7 (TIP#527, New measurement facilities in TCL: New command timerate, performance test suite) check-in: e41cbd042a user: sebres tags: trunk
2019-03-02
16:53
Merge 8.7 check-in: e766d23655 user: jan.nijtmans tags: trunk
16:52
Minor optimization in UTF-8 handling, and add some comments describing how Tcl_UniCharToUtf() handle... check-in: 6e3632ede5 user: jan.nijtmans tags: core-8-branch
16:09
Merge 8.7 check-in: ff562e2ab8 user: jan.nijtmans tags: trunk
Changes
Unified Diff Ignore Whitespace Patch
Changes to generic/tclScan.c.
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
	case 'c':
	    /*
	     * Scan a single Unicode character.
	     */

	    offset = TclUtfToUniChar(string, &sch);
	    i = (int)sch;
#if TCL_UTF_MAX == 4
	    if (((sch & 0xFC00) == 0xD800) && (offset < 3)) {
		offset += TclUtfToUniChar(string+offset, &sch);
		i = (((i<<10) & 0x0FFC00) + 0x10000) + (sch & 0x3FF);
	    }
#endif
	    string += offset;
	    if (!(flags & SCAN_SUPPRESS)) {
		objPtr = Tcl_NewWideIntObj(i);







|
|







877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
	case 'c':
	    /*
	     * Scan a single Unicode character.
	     */

	    offset = TclUtfToUniChar(string, &sch);
	    i = (int)sch;
#if TCL_UTF_MAX <= 4
	    if ((sch >= 0xD800) && (offset < 3)) {
		offset += TclUtfToUniChar(string+offset, &sch);
		i = (((i<<10) & 0x0FFC00) + 0x10000) + (sch & 0x3FF);
	    }
#endif
	    string += offset;
	    if (!(flags & SCAN_SUPPRESS)) {
		objPtr = Tcl_NewWideIntObj(i);
Changes to generic/tclUtf.c.
108
109
110
111
112
113
114













115
116
117
118
119
120
121
 *---------------------------------------------------------------------------
 *
 * Tcl_UniCharToUtf --
 *
 *	Store the given Tcl_UniChar as a sequence of UTF-8 bytes in the
 *	provided buffer. Equivalent to Plan 9 runetochar().
 *













 * Results:
 *	The return values is the number of bytes in the buffer that were
 *	consumed.
 *
 * Side effects:
 *	None.
 *







>
>
>
>
>
>
>
>
>
>
>
>
>







108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
 *---------------------------------------------------------------------------
 *
 * Tcl_UniCharToUtf --
 *
 *	Store the given Tcl_UniChar as a sequence of UTF-8 bytes in the
 *	provided buffer. Equivalent to Plan 9 runetochar().
 *
 *	Special handling of Surrogate pairs is handled as follows:
 *	When this function is called for ch being a high surrogate,
 *	the first byte of the 4-byte UTF-8 sequence is produced and
 *	the function returns 1. Calling the function again with a
 *	low surrogate, the remaining 3 bytes of the 4-byte UTF-8
 *	sequence is produced, and the function returns 3. The buffer
 *	is used to remember the high surrogate between the two calls.
 *
 *	If no low surrogate follows the high surrogate (which is actually
 *	illegal), this can be handled reasonably by calling Tcl_UniCharToUtf
 *	again with ch = -1. This will produce a 3-byte UTF-8 sequence
 *	representing the high surrogate.
 *
 * Results:
 *	The return values is the number of bytes in the buffer that were
 *	consumed.
 *
 * Side effects:
 *	None.
 *
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
 *	The caller must ensure that the source buffer is long enough that this
 *	routine does not run off the end and dereference non-existent memory
 *	looking for trail bytes. If the source buffer is known to be '\0'
 *	terminated, this cannot happen. Otherwise, the caller should call
 *	Tcl_UtfCharComplete() before calling this routine to ensure that
 *	enough bytes remain in the string.
 *
 *	If TCL_UTF_MAX == 4, special handling of Surrogate pairs is done:
 *	For any UTF-8 string containing a character outside of the BMP, the
 *	first call to this function will fill *chPtr with the high surrogate
 *	and generate a return value of 0. Calling Tcl_UtfToUniChar again
 *	will produce the low surrogate and a return value of 4. Because *chPtr
 *	is used to remember whether the high surrogate is already produced, it
 *	is recommended to initialize the variable it points to as 0 before
 *	the first call to Tcl_UtfToUniChar is done.
 *
 * Results:
 *	*chPtr is filled with the Tcl_UniChar, and the return value is the
 *	number of bytes from the UTF-8 string that were consumed.







|


|
|







279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
 *	The caller must ensure that the source buffer is long enough that this
 *	routine does not run off the end and dereference non-existent memory
 *	looking for trail bytes. If the source buffer is known to be '\0'
 *	terminated, this cannot happen. Otherwise, the caller should call
 *	Tcl_UtfCharComplete() before calling this routine to ensure that
 *	enough bytes remain in the string.
 *
 *	Special handling of Surrogate pairs is handled as follows:
 *	For any UTF-8 string containing a character outside of the BMP, the
 *	first call to this function will fill *chPtr with the high surrogate
 *	and generate a return value of 1. Calling Tcl_UtfToUniChar again
 *	will produce the low surrogate and a return value of 3. Because *chPtr
 *	is used to remember whether the high surrogate is already produced, it
 *	is recommended to initialize the variable it points to as 0 before
 *	the first call to Tcl_UtfToUniChar is done.
 *
 * Results:
 *	*chPtr is filled with the Tcl_UniChar, and the return value is the
 *	number of bytes from the UTF-8 string that were consumed.