Tcl Source Code

Check-in [48c676b649]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Merge 8.7
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 48c676b6493310b4d73459b6f92b79b3cd3a7c9717abeaa94dafa64715020602
User & Date: jan.nijtmans 2019-03-18 22:34:14.192
Context
2019-03-20
22:51
Merge 8.7 check-in: bb9b52ab82 user: jan.nijtmans tags: trunk
2019-03-18
22:45
Merge trunk check-in: 8fe25e01d3 user: jan.nijtmans tags: novem
22:34
Merge 8.7 check-in: 48c676b649 user: jan.nijtmans tags: trunk
22:32
Comment Comment Tcl_UniCharToUtf() better, what happens handling surrogates. Add type cast in tclUtf... check-in: b02df08680 user: jan.nijtmans tags: core-8-branch
22:17
Merge 8.7 check-in: b9ad5fe740 user: jan.nijtmans tags: trunk
Changes
Unified Diff Ignore Whitespace Patch
Changes to doc/Utf.3.
126
127
128
129
130
131
132
133
134
135

136
137
138
139
140
141
142
.PP
\fBTCL_UTF_MAX\fR is the maximum number of bytes that it takes to
represent one Unicode character in the UTF-8 representation.
.PP
\fBTcl_UniCharToUtf\fR stores the character \fIch\fR as a UTF-8 string
in starting at \fIbuf\fR.  The return value is the number of bytes stored
in \fIbuf\fR. If ch is a high surrogate (range U+D800 - U+DBFF), then
the return value will be 0 and nothing will be stored. If you still
want to produce UTF-8 output for it (even though knowing it's an illegal
code-point on its own), just call \fBTcl_UniCharToUtf\fR again using ch = -1.

.PP
\fBTcl_UtfToUniChar\fR reads one UTF-8 character starting at \fIsrc\fR
and stores it as a Tcl_UniChar in \fI*chPtr\fR.  The return value is the
number of bytes read from \fIsrc\fR.  The caller must ensure that the
source buffer is long enough such that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen.  If the input is







|
|
|
>







126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
.PP
\fBTCL_UTF_MAX\fR is the maximum number of bytes that it takes to
represent one Unicode character in the UTF-8 representation.
.PP
\fBTcl_UniCharToUtf\fR stores the character \fIch\fR as a UTF-8 string
in starting at \fIbuf\fR.  The return value is the number of bytes stored
in \fIbuf\fR. If ch is a high surrogate (range U+D800 - U+DBFF), then
the return value will be 1 and a single byte in the range 0xF0 - 0xF4
will be stored. If you still want to produce UTF-8 output for it (even
though knowing it's an illegal code-point on its own), just call
\fBTcl_UniCharToUtf\fR again specifying ch = -1.
.PP
\fBTcl_UtfToUniChar\fR reads one UTF-8 character starting at \fIsrc\fR
and stores it as a Tcl_UniChar in \fI*chPtr\fR.  The return value is the
number of bytes read from \fIsrc\fR.  The caller must ensure that the
source buffer is long enough such that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen.  If the input is
Changes to generic/tclScan.c.
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
				 * required. */
{
    int gotXpg, gotSequential, value, i, flags;
    char *end;
    Tcl_UniChar ch = 0;
    int objIndex, xpgSize, nspace = numVars;
    int *nassign = TclStackAlloc(interp, nspace * sizeof(int));
    char buf[TCL_UTF_MAX+1] = "";
    Tcl_Obj *errorMsg;		/* Place to build an error messages. Note that
				 * these are messy operations because we do
				 * not want to use the formatting engine;
				 * we're inside there! */

    /*
     * Initialize an array that records the number of times a variable is







|







257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
				 * required. */
{
    int gotXpg, gotSequential, value, i, flags;
    char *end;
    Tcl_UniChar ch = 0;
    int objIndex, xpgSize, nspace = numVars;
    int *nassign = TclStackAlloc(interp, nspace * sizeof(int));
    char buf[TCL_UTF_MAX + 1] = "";
    Tcl_Obj *errorMsg;		/* Place to build an error messages. Note that
				 * these are messy operations because we do
				 * not want to use the formatting engine;
				 * we're inside there! */

    /*
     * Initialize an array that records the number of times a variable is
Changes to generic/tclUtf.c.
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
		/* produce high surrogate, advance source pointer */
		*chPtr = 0xD800 + high;
		return 1;
	    }
#else
	    *chPtr = (((byte & 0x07) << 18) | ((src[1] & 0x3F) << 12)
		    | ((src[2] & 0x3F) << 6) | (src[3] & 0x3F));
	    if ((*chPtr - 0x10000) <= 0xFFFFF) {
		return 4;
	    }
#endif
	}

	/*
	 * A four-byte-character lead-byte not followed by two trail-bytes







|







443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
		/* produce high surrogate, advance source pointer */
		*chPtr = 0xD800 + high;
		return 1;
	    }
#else
	    *chPtr = (((byte & 0x07) << 18) | ((src[1] & 0x3F) << 12)
		    | ((src[2] & 0x3F) << 6) | (src[3] & 0x3F));
	    if ((unsigned)(*chPtr - 0x10000) <= 0xFFFFF) {
		return 4;
	    }
#endif
	}

	/*
	 * A four-byte-character lead-byte not followed by two trail-bytes