Tcl Source Code

Changes On Branch encodings-with-flags
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Changes In Branch encodings-with-flags Excluding Merge-Ins

This is equivalent to a diff from 59b0904ac7 to b3471f0625

2022-03-17
09:51
TIP #601: Make "encoding convertto/convertfrom" throw exceptions check-in: fb919be8a3 user: jan.nijtmans tags: core-8-branch
2022-03-14
18:06
TIP605 encoding failindex: prepare TCL 9.0 main branch with branch "encodings-with-flags" merged. check-in: c8fc04b93b user: oehhar tags: tip607-encoding-failindex
15:46
TIP607 encoding failindex: start implementation check-in: a611c76903 user: oehhar tags: tip601-encoding-failindex, tip607-encoding-failindex
2022-03-05
21:56
-nothrow -> -nocomplain Closed-Leaf check-in: b3471f0625 user: jan.nijtmans tags: encodings-with-flags
10:32
Add "const" to Tcl_SetNotifier() argument. Should have been part of TIP #27, looooooong ago. This si... check-in: a993dee980 user: jan.nijtmans tags: core-8-branch
2022-03-04
16:21
Merge 8.7 check-in: 791cc676b8 user: jan.nijtmans tags: encodings-with-flags
16:19
Merge 8.7 check-in: 1e892e4df9 user: jan.nijtmans tags: trunk, main
16:18
Merge 8.6 check-in: 59b0904ac7 user: jan.nijtmans tags: core-8-branch
16:15
Fix [8d8bb39962]: incorrect notifier initialization in tclXtNotify.c check-in: 362f6d184c user: jan.nijtmans tags: core-8-6-branch
2022-02-24
22:20
3 more files with TCL_UTF_MAX checks check-in: 70eb0efb69 user: jan.nijtmans tags: core-8-branch

Changes to doc/Encoding.3.

21
22
23
24
25
26
27



28
29
30



31
32
33
34
35
36
37
.sp
int
\fBTcl_GetEncodingFromObj\fR(\fIinterp, objPtr, encodingPtr\fR)
.sp
char *
\fBTcl_ExternalToUtfDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp



char *
\fBTcl_UtfToExternalDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp



int
\fBTcl_ExternalToUtf\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
.sp
int
\fBTcl_UtfToExternal\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)







>
>
>



>
>
>







21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
.sp
int
\fBTcl_GetEncodingFromObj\fR(\fIinterp, objPtr, encodingPtr\fR)
.sp
char *
\fBTcl_ExternalToUtfDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp
size_t
\fBTcl_ExternalToUtfDStringEx\fR(\fIencoding, src, srcLen, flags, dstPtr\fR)
.sp
char *
\fBTcl_UtfToExternalDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp
size_t
\fBTcl_UtfToExternalDStringEx\fR(\fIencoding, src, srcLen, flags, dstPtr\fR)
.sp
int
\fBTcl_ExternalToUtf\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
.sp
int
\fBTcl_UtfToExternal\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
104
105
106
107
108
109
110
111



112
113
114
115
116
117
118
converted. \fBTCL_ENCODING_END\fR signifies that the source buffer is the last
block in a (potentially multi-block) input stream, telling the conversion
routine to perform any finalization that needs to occur after the last
byte is converted and then to reset to an initial state.
\fBTCL_ENCODING_STOPONERROR\fR signifies that the conversion routine should
return immediately upon reading a source character that does not exist in
the target encoding; otherwise a default fallback character will
automatically be substituted.



.AP Tcl_EncodingState *statePtr in/out
Used when converting a (generally long or indefinite length) byte stream
in a piece-by-piece fashion.  The conversion routine stores its current
state in \fI*statePtr\fR after \fIsrc\fR (the buffer containing the
current piece) has been converted; that state information must be passed
back when converting the next piece of the stream so the conversion
routine knows what state it was in when it left off at the end of the







|
>
>
>







110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
converted. \fBTCL_ENCODING_END\fR signifies that the source buffer is the last
block in a (potentially multi-block) input stream, telling the conversion
routine to perform any finalization that needs to occur after the last
byte is converted and then to reset to an initial state.
\fBTCL_ENCODING_STOPONERROR\fR signifies that the conversion routine should
return immediately upon reading a source character that does not exist in
the target encoding; otherwise a default fallback character will
automatically be substituted. The flag \fBTCL_ENCODING_NOCOMPLAIN\fR has
no effect, it is reserved for Tcl 9.0. The flag \fBTCL_ENCODING_MODIFIED\fR makes
\fBTcl_UtfToExternalDStringEx\fR and \fBTcl_UtfToExternal\fR produce the
byte sequence \exC0\ex80 in stead of \ex00, for the utf-8/cesu-8 encoders.
.AP Tcl_EncodingState *statePtr in/out
Used when converting a (generally long or indefinite length) byte stream
in a piece-by-piece fashion.  The conversion routine stores its current
state in \fI*statePtr\fR after \fIsrc\fR (the buffer containing the
current piece) has been converted; that state information must be passed
back when converting the next piece of the stream so the conversion
routine knows what state it was in when it left off at the end of the
203
204
205
206
207
208
209





210
211
212
213
214
215
216
\fBTcl_ExternalToUtfDString\fR converts a source buffer \fIsrc\fR from the
specified \fIencoding\fR into UTF-8.  The converted bytes are stored in
\fIdstPtr\fR, which is then null-terminated.  The caller should eventually
call \fBTcl_DStringFree\fR to free any information stored in \fIdstPtr\fR.
When converting, if any of the characters in the source buffer cannot be
represented in the target encoding, a default fallback character will be
used.  The return value is a pointer to the value stored in the DString.





.PP
\fBTcl_ExternalToUtf\fR converts a source buffer \fIsrc\fR from the specified
\fIencoding\fR into UTF-8.  Up to \fIsrcLen\fR bytes are converted from the
source buffer and up to \fIdstLen\fR converted bytes are stored in \fIdst\fR.
In all cases, \fI*srcReadPtr\fR is filled with the number of bytes that were
successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR is filled with
the corresponding number of bytes that were stored in \fIdst\fR.  The return







>
>
>
>
>







212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
\fBTcl_ExternalToUtfDString\fR converts a source buffer \fIsrc\fR from the
specified \fIencoding\fR into UTF-8.  The converted bytes are stored in
\fIdstPtr\fR, which is then null-terminated.  The caller should eventually
call \fBTcl_DStringFree\fR to free any information stored in \fIdstPtr\fR.
When converting, if any of the characters in the source buffer cannot be
represented in the target encoding, a default fallback character will be
used.  The return value is a pointer to the value stored in the DString.
.PP
\fBTcl_ExternalToUtfDStringEx\fR is the same as \fBTcl_ExternalToUtfDString\fR,
but it has an additional flags parameter.  The return value is the index of
the first byte in the input string causing a conversion error.
Or (size_t)-1 if all is OK.
.PP
\fBTcl_ExternalToUtf\fR converts a source buffer \fIsrc\fR from the specified
\fIencoding\fR into UTF-8.  Up to \fIsrcLen\fR bytes are converted from the
source buffer and up to \fIdstLen\fR converted bytes are stored in \fIdst\fR.
In all cases, \fI*srcReadPtr\fR is filled with the number of bytes that were
successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR is filled with
the corresponding number of bytes that were stored in \fIdst\fR.  The return
241
242
243
244
245
246
247





248
249
250
251
252
253
254
into the specified \fIencoding\fR.  The converted bytes are stored in
\fIdstPtr\fR, which is then terminated with the appropriate encoding-specific
null.  The caller should eventually call \fBTcl_DStringFree\fR to free any
information stored in \fIdstPtr\fR.  When converting, if any of the
characters in the source buffer cannot be represented in the target
encoding, a default fallback character will be used.  The return value is
a pointer to the value stored in the DString.





.PP
\fBTcl_UtfToExternal\fR converts a source buffer \fIsrc\fR from UTF-8 into
the specified \fIencoding\fR.  Up to \fIsrcLen\fR bytes are converted from
the source buffer and up to \fIdstLen\fR converted bytes are stored in
\fIdst\fR.  In all cases, \fI*srcReadPtr\fR is filled with the number of
bytes that were successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR
is filled with the corresponding number of bytes that were stored in







>
>
>
>
>







255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
into the specified \fIencoding\fR.  The converted bytes are stored in
\fIdstPtr\fR, which is then terminated with the appropriate encoding-specific
null.  The caller should eventually call \fBTcl_DStringFree\fR to free any
information stored in \fIdstPtr\fR.  When converting, if any of the
characters in the source buffer cannot be represented in the target
encoding, a default fallback character will be used.  The return value is
a pointer to the value stored in the DString.
.PP
\fBTcl_UtfToExternalDStringEx\fR is the same as \fBTcl_UtfToExternalDString\fR,
but it has an additional flags parameter.  The return value is the index of
the first byte of an utf-8 byte-sequence in the input string causing a
conversion error. Or (size_t)-1 if all is OK.
.PP
\fBTcl_UtfToExternal\fR converts a source buffer \fIsrc\fR from UTF-8 into
the specified \fIencoding\fR.  Up to \fIsrcLen\fR bytes are converted from
the source buffer and up to \fIdstLen\fR converted bytes are stored in
\fIdst\fR.  In all cases, \fI*srcReadPtr\fR is filled with the number of
bytes that were successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR
is filled with the corresponding number of bytes that were stored in

Changes to generic/tcl.decls.

2432
2433
2434
2435
2436
2437
2438








2439
2440
2441
2442
2443
2444
2445
}
declare 656 {
    const char *Tcl_UtfPrev(const char *src, const char *start)
}
declare 657 {
    int Tcl_UniCharIsUnicode(int ch)
}









# TIP #511
declare 660 {
    int Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async, int sigNumber)
}

# TIP #617







>
>
>
>
>
>
>
>







2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
2448
2449
2450
2451
2452
2453
}
declare 656 {
    const char *Tcl_UtfPrev(const char *src, const char *start)
}
declare 657 {
    int Tcl_UniCharIsUnicode(int ch)
}
declare 658 {
    size_t Tcl_ExternalToUtfDStringEx(Tcl_Encoding encoding,
	    const char *src, int srcLen, int flags, Tcl_DString *dsPtr)
}
declare 659 {
    size_t Tcl_UtfToExternalDStringEx(Tcl_Encoding encoding,
	    const char *src, int srcLen, int flags, Tcl_DString *dsPtr)
}

# TIP #511
declare 660 {
    int Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async, int sigNumber)
}

# TIP #617

Changes to generic/tcl.h.

2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
2070
2071
2072
2073
2074
2075
2076
2077
2078
2079












2080
2081
2082
2083
2084
2085
2086


2087
2088
2089
2090
2091
2092
2093
 *				reset to an initial state. If the source
 *				buffer contains the entire input stream to be
 *				converted, this flag should be set.
 * TCL_ENCODING_STOPONERROR -	If set, the converter returns immediately upon
 *				encountering an invalid byte sequence or a
 *				source character that has no mapping in the
 *				target encoding. If clear, the converter
 *				substitues the problematic character(s) with
 *				one or more "close" characters in the
 *				destination buffer and then continues to
 *				convert the source.
 * TCL_ENCODING_NO_TERMINATE - 	If set, Tcl_ExternalToUtf does not append a
 *				terminating NUL byte.  Since it does not need
 *				an extra byte for a terminating NUL, it fills
 *				all dstLen bytes with encoded UTF-8 content if
 *				needed.  If clear, a byte is reserved in the
 *				dst space for NUL termination, and a
 *				terminating NUL is appended.
 * TCL_ENCODING_CHAR_LIMIT -	If set and dstCharsPtr is not NULL, then
 *				Tcl_ExternalToUtf takes the initial value of
 *				*dstCharsPtr as a limit of the maximum number
 *				of chars to produce in the encoded UTF-8
 *				content.  Otherwise, the number of chars
 *				produced is controlled only by other limiting
 *				factors.












 */

#define TCL_ENCODING_START		0x01
#define TCL_ENCODING_END		0x02
#define TCL_ENCODING_STOPONERROR	0x04
#define TCL_ENCODING_NO_TERMINATE	0x08
#define TCL_ENCODING_CHAR_LIMIT		0x10



/*
 * The following definitions are the error codes returned by the conversion
 * routines:
 *
 * TCL_OK -			All characters were converted.
 * TCL_CONVERT_NOSPACE -	The output buffer would not have been large







|


|














>
>
>
>
>
>
>
>
>
>
>
>







>
>







2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
2070
2071
2072
2073
2074
2075
2076
2077
2078
2079
2080
2081
2082
2083
2084
2085
2086
2087
2088
2089
2090
2091
2092
2093
2094
2095
2096
2097
2098
2099
2100
2101
2102
2103
2104
2105
2106
2107
 *				reset to an initial state. If the source
 *				buffer contains the entire input stream to be
 *				converted, this flag should be set.
 * TCL_ENCODING_STOPONERROR -	If set, the converter returns immediately upon
 *				encountering an invalid byte sequence or a
 *				source character that has no mapping in the
 *				target encoding. If clear, the converter
 *				substitutes the problematic character(s) with
 *				one or more "close" characters in the
 *				destination buffer and then continues to
 *				convert the source. Only for Tcl 8.x.
 * TCL_ENCODING_NO_TERMINATE - 	If set, Tcl_ExternalToUtf does not append a
 *				terminating NUL byte.  Since it does not need
 *				an extra byte for a terminating NUL, it fills
 *				all dstLen bytes with encoded UTF-8 content if
 *				needed.  If clear, a byte is reserved in the
 *				dst space for NUL termination, and a
 *				terminating NUL is appended.
 * TCL_ENCODING_CHAR_LIMIT -	If set and dstCharsPtr is not NULL, then
 *				Tcl_ExternalToUtf takes the initial value of
 *				*dstCharsPtr as a limit of the maximum number
 *				of chars to produce in the encoded UTF-8
 *				content.  Otherwise, the number of chars
 *				produced is controlled only by other limiting
 *				factors.
 * TCL_ENCODING_MODIFIED -	Convert NULL bytes to \xC0\x80 in stead of
 *				0x00. Only valid for "utf-8" and "cesu-8".
 *				This flag is implicit for external -> internal conversions,
 *				optional for internal -> external conversions.
 * TCL_ENCODING_NOCOMPLAIN -	If set, the converter
 *				substitutes the problematic character(s) with
 *				one or more "close" characters in the
 *				destination buffer and then continues to
 *				convert the source. If clear, the converter returns
 *				immediately upon encountering an invalid byte sequence
 *				or a source character that has no mapping in the
 *				target encoding. Only for Tcl 9.x.
 */

#define TCL_ENCODING_START		0x01
#define TCL_ENCODING_END		0x02
#define TCL_ENCODING_STOPONERROR	0x04
#define TCL_ENCODING_NO_TERMINATE	0x08
#define TCL_ENCODING_CHAR_LIMIT		0x10
#define TCL_ENCODING_MODIFIED		0x20
#define TCL_ENCODING_NOCOMPLAIN		0x40

/*
 * The following definitions are the error codes returned by the conversion
 * routines:
 *
 * TCL_OK -			All characters were converted.
 * TCL_CONVERT_NOSPACE -	The output buffer would not have been large

Changes to generic/tclCmdAH.c.

509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
 */

Tcl_Command
TclInitEncodingCmd(
    Tcl_Interp* interp)		/* Tcl interpreter */
{
    static const EnsembleImplMap encodingImplMap[] = {
	{"convertfrom", EncodingConvertfromObjCmd, TclCompileBasic1Or2ArgCmd, NULL, NULL, 0},
	{"convertto",   EncodingConverttoObjCmd,   TclCompileBasic1Or2ArgCmd, NULL, NULL, 0},
	{"dirs",        EncodingDirsObjCmd,        TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{"names",       EncodingNamesObjCmd,       TclCompileBasic0ArgCmd,    NULL, NULL, 0},
	{"system",      EncodingSystemObjCmd,      TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{NULL,          NULL,                      NULL,                      NULL, NULL, 0}
    };

    return TclMakeEnsemble(interp, "encoding", encodingImplMap);







|
|







509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
 */

Tcl_Command
TclInitEncodingCmd(
    Tcl_Interp* interp)		/* Tcl interpreter */
{
    static const EnsembleImplMap encodingImplMap[] = {
	{"convertfrom", EncodingConvertfromObjCmd, TclCompileBasic1To3ArgCmd, NULL, NULL, 0},
	{"convertto",   EncodingConverttoObjCmd,   TclCompileBasic1To3ArgCmd, NULL, NULL, 0},
	{"dirs",        EncodingDirsObjCmd,        TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{"names",       EncodingNamesObjCmd,       TclCompileBasic0ArgCmd,    NULL, NULL, 0},
	{"system",      EncodingSystemObjCmd,      TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{NULL,          NULL,                      NULL,                      NULL, NULL, 0}
    };

    return TclMakeEnsemble(interp, "encoding", encodingImplMap);
546
547
548
549
550
551
552






553
554
555
556






557
558
559
560




561




562

563
564
565
566

567
568
569


570






571











572
573
574
575
576
577
578
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* Byte array to convert */
    Tcl_DString ds;		/* Buffer to hold the string */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the byte array being converted */
    const char *bytesPtr;	/* Pointer to the first byte of the array */







    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];






    } else if (objc == 3) {
	if (Tcl_GetEncodingFromObj(interp, objv[1], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}




	data = objv[2];




    } else {

	Tcl_WrongNumArgs(interp, 1, objv, "?encoding? data");
	return TCL_ERROR;
    }


    /*
     * Convert the string into a byte array in 'ds'
     */


    bytesPtr = (char *) Tcl_GetByteArrayFromObj(data, &length);






    Tcl_ExternalToUtfDString(encoding, bytesPtr, length, &ds);












    /*
     * Note that we cannot use Tcl_DStringResult here because it will
     * truncate the string at the first null byte.
     */

    Tcl_SetObjResult(interp, TclDStringToObj(&ds));







>
>
>
>
>
>




>
>
>
>
>
>
|
|
|
|
>
>
>
>
|
>
>
>
>

>
|



>



>
>
|
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>







546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* Byte array to convert */
    Tcl_DString ds;		/* Buffer to hold the string */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the byte array being converted */
    const char *bytesPtr;	/* Pointer to the first byte of the array */
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NOCOMPLAIN;
#endif
    size_t result;

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if ((unsigned)(objc - 2) < 3) {
	data = objv[objc - 1];
	bytesPtr = Tcl_GetString(objv[1]);
	if (bytesPtr[0] == '-' && bytesPtr[1] == 'n'
		&& !strncmp(bytesPtr, "-nocomplain", strlen(bytesPtr))) {
	    flags = TCL_ENCODING_NOCOMPLAIN;
	} else if (objc < 4) {
	    if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		return TCL_ERROR;
	    }
	    goto encConvFromOK;
	} else {
	    goto encConvFromError;
	}
	if (objc < 4) {
	    encoding = Tcl_GetEncoding(interp, NULL);
	} else if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}
    } else {
    encConvFromError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nocomplain? ?encoding? data");
	return TCL_ERROR;
    }

encConvFromOK:
    /*
     * Convert the string into a byte array in 'ds'
     */
#if !defined(TCL_NO_DEPRECATED) && (TCL_MAJOR_VERSION < 9)
    if (!(flags & TCL_ENCODING_STOPONERROR)) {
	bytesPtr = (char *) Tcl_GetByteArrayFromObj(data, &length);
    } else
#endif
    bytesPtr = (char *) TclGetBytesFromObj(interp, data, &length);
    if (bytesPtr == NULL) {
	return TCL_ERROR;
    }
    result = Tcl_ExternalToUtfDStringEx(encoding, bytesPtr, length,
	    flags, &ds);
    if ((flags & TCL_ENCODING_STOPONERROR) && (result != (size_t)-1)) {
	char buf[TCL_INTEGER_SPACE];
	sprintf(buf, "%" TCL_Z_MODIFIER "u", result);
	Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected byte sequence starting at index %"
		TCL_Z_MODIFIER "u: '\\x%X'", result, UCHAR(bytesPtr[result])));
	Tcl_SetErrorCode(interp, "TCL", "ENCODING", "ILLEGALSEQUENCE",
		buf, NULL);
	Tcl_DStringFree(&ds);
	return TCL_ERROR;
    }

    /*
     * Note that we cannot use Tcl_DStringResult here because it will
     * truncate the string at the first null byte.
     */

    Tcl_SetObjResult(interp, TclDStringToObj(&ds));
608
609
610
611
612
613
614






615
616
617
618






619
620
621
622




623




624

625
626
627
628

629
630
631
632
633
634














635
636
637
638
639
640
641
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* String to convert */
    Tcl_DString ds;		/* Buffer to hold the byte array */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the string being converted */
    const char *stringPtr;	/* Pointer to the first byte of the string */







    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];






    } else if (objc == 3) {
	if (Tcl_GetEncodingFromObj(interp, objv[1], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}




	data = objv[2];




    } else {

	Tcl_WrongNumArgs(interp, 1, objv, "?encoding? data");
	return TCL_ERROR;
    }


    /*
     * Convert the string to a byte array in 'ds'
     */

    stringPtr = TclGetStringFromObj(data, &length);
    Tcl_UtfToExternalDString(encoding, stringPtr, length, &ds);














    Tcl_SetObjResult(interp,
		     Tcl_NewByteArrayObj((unsigned char*) Tcl_DStringValue(&ds),
					 Tcl_DStringLength(&ds)));
    Tcl_DStringFree(&ds);

    /*
     * We're done with the encoding







>
>
>
>
>
>




>
>
>
>
>
>
|
|
|
|
>
>
>
>
|
>
>
>
>

>
|



>





|
>
>
>
>
>
>
>
>
>
>
>
>
>
>







649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* String to convert */
    Tcl_DString ds;		/* Buffer to hold the byte array */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the string being converted */
    const char *stringPtr;	/* Pointer to the first byte of the string */
    size_t result;
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NOCOMPLAIN;
#endif

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if ((unsigned)(objc - 2) < 3) {
	data = objv[objc - 1];
	stringPtr = Tcl_GetString(objv[1]);
	if (stringPtr[0] == '-' && stringPtr[1] == 'n'
		&& !strncmp(stringPtr, "-nocomplain", strlen(stringPtr))) {
	    flags = TCL_ENCODING_NOCOMPLAIN;
	} else if (objc < 4) {
	    if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		return TCL_ERROR;
	    }
	    goto encConvToOK;
	} else {
	    goto encConvToError;
	}
	if (objc < 4) {
	    encoding = Tcl_GetEncoding(interp, NULL);
	} else if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}
    } else {
    encConvToError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nocomplain? ?encoding? data");
	return TCL_ERROR;
    }

encConvToOK:
    /*
     * Convert the string to a byte array in 'ds'
     */

    stringPtr = TclGetStringFromObj(data, &length);
    result = Tcl_UtfToExternalDStringEx(encoding, stringPtr, length,
	    flags, &ds);
    if ((flags & TCL_ENCODING_STOPONERROR) && (result != (size_t)-1)) {
	size_t pos = Tcl_NumUtfChars(stringPtr, result);
	int ucs4;
	char buf[TCL_INTEGER_SPACE];
	TclUtfToUCS4(&stringPtr[result], &ucs4);
	sprintf(buf, "%" TCL_Z_MODIFIER "u", result);
	Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected character at index %"
		TCL_Z_MODIFIER "u: 'U+%06X'", pos, ucs4));
	Tcl_SetErrorCode(interp, "TCL", "ENCODING", "ILLEGALSEQUENCE",
		buf, NULL);
	Tcl_DStringFree(&ds);
	return TCL_ERROR;
    }
    Tcl_SetObjResult(interp,
		     Tcl_NewByteArrayObj((unsigned char*) Tcl_DStringValue(&ds),
					 Tcl_DStringLength(&ds)));
    Tcl_DStringFree(&ds);

    /*
     * We're done with the encoding

Changes to generic/tclDecls.h.

1938
1939
1940
1941
1942
1943
1944
1945



1946



1947
1948
1949
1950
1951
1952
1953
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 655 */
EXTERN const char *	Tcl_UtfNext(const char *src);
/* 656 */
EXTERN const char *	Tcl_UtfPrev(const char *src, const char *start);
/* 657 */
EXTERN int		Tcl_UniCharIsUnicode(int ch);
/* Slot 658 is reserved */



/* Slot 659 is reserved */



/* 660 */
EXTERN int		Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async,
				int sigNumber);
/* Slot 661 is reserved */
/* Slot 662 is reserved */
/* Slot 663 is reserved */
/* Slot 664 is reserved */







|
>
>
>
|
>
>
>







1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 655 */
EXTERN const char *	Tcl_UtfNext(const char *src);
/* 656 */
EXTERN const char *	Tcl_UtfPrev(const char *src, const char *start);
/* 657 */
EXTERN int		Tcl_UniCharIsUnicode(int ch);
/* 658 */
EXTERN size_t		Tcl_ExternalToUtfDStringEx(Tcl_Encoding encoding,
				const char *src, int srcLen, int flags,
				Tcl_DString *dsPtr);
/* 659 */
EXTERN size_t		Tcl_UtfToExternalDStringEx(Tcl_Encoding encoding,
				const char *src, int srcLen, int flags,
				Tcl_DString *dsPtr);
/* 660 */
EXTERN int		Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async,
				int sigNumber);
/* Slot 661 is reserved */
/* Slot 662 is reserved */
/* Slot 663 is reserved */
/* Slot 664 is reserved */
2645
2646
2647
2648
2649
2650
2651
2652
2653
2654
2655
2656
2657
2658
2659
2660
    char * (*tclGetStringFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 651 */
    Tcl_UniChar * (*tclGetUnicodeFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 652 */
    unsigned char * (*tclGetByteArrayFromObj) (Tcl_Obj *objPtr, size_t *numBytesPtr); /* 653 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 654 */
    const char * (*tcl_UtfNext) (const char *src); /* 655 */
    const char * (*tcl_UtfPrev) (const char *src, const char *start); /* 656 */
    int (*tcl_UniCharIsUnicode) (int ch); /* 657 */
    void (*reserved658)(void);
    void (*reserved659)(void);
    int (*tcl_AsyncMarkFromSignal) (Tcl_AsyncHandler async, int sigNumber); /* 660 */
    void (*reserved661)(void);
    void (*reserved662)(void);
    void (*reserved663)(void);
    void (*reserved664)(void);
    void (*reserved665)(void);
    void (*reserved666)(void);







|
|







2651
2652
2653
2654
2655
2656
2657
2658
2659
2660
2661
2662
2663
2664
2665
2666
    char * (*tclGetStringFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 651 */
    Tcl_UniChar * (*tclGetUnicodeFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 652 */
    unsigned char * (*tclGetByteArrayFromObj) (Tcl_Obj *objPtr, size_t *numBytesPtr); /* 653 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 654 */
    const char * (*tcl_UtfNext) (const char *src); /* 655 */
    const char * (*tcl_UtfPrev) (const char *src, const char *start); /* 656 */
    int (*tcl_UniCharIsUnicode) (int ch); /* 657 */
    size_t (*tcl_ExternalToUtfDStringEx) (Tcl_Encoding encoding, const char *src, int srcLen, int flags, Tcl_DString *dsPtr); /* 658 */
    size_t (*tcl_UtfToExternalDStringEx) (Tcl_Encoding encoding, const char *src, int srcLen, int flags, Tcl_DString *dsPtr); /* 659 */
    int (*tcl_AsyncMarkFromSignal) (Tcl_AsyncHandler async, int sigNumber); /* 660 */
    void (*reserved661)(void);
    void (*reserved662)(void);
    void (*reserved663)(void);
    void (*reserved664)(void);
    void (*reserved665)(void);
    void (*reserved666)(void);
4002
4003
4004
4005
4006
4007
4008

4009

4010
4011
4012
4013
4014
4015
4016
4017
	(tclStubsPtr->tcl_UtfCharComplete) /* 654 */
#define Tcl_UtfNext \
	(tclStubsPtr->tcl_UtfNext) /* 655 */
#define Tcl_UtfPrev \
	(tclStubsPtr->tcl_UtfPrev) /* 656 */
#define Tcl_UniCharIsUnicode \
	(tclStubsPtr->tcl_UniCharIsUnicode) /* 657 */

/* Slot 658 is reserved */

/* Slot 659 is reserved */
#define Tcl_AsyncMarkFromSignal \
	(tclStubsPtr->tcl_AsyncMarkFromSignal) /* 660 */
/* Slot 661 is reserved */
/* Slot 662 is reserved */
/* Slot 663 is reserved */
/* Slot 664 is reserved */
/* Slot 665 is reserved */







>
|
>
|







4008
4009
4010
4011
4012
4013
4014
4015
4016
4017
4018
4019
4020
4021
4022
4023
4024
4025
	(tclStubsPtr->tcl_UtfCharComplete) /* 654 */
#define Tcl_UtfNext \
	(tclStubsPtr->tcl_UtfNext) /* 655 */
#define Tcl_UtfPrev \
	(tclStubsPtr->tcl_UtfPrev) /* 656 */
#define Tcl_UniCharIsUnicode \
	(tclStubsPtr->tcl_UniCharIsUnicode) /* 657 */
#define Tcl_ExternalToUtfDStringEx \
	(tclStubsPtr->tcl_ExternalToUtfDStringEx) /* 658 */
#define Tcl_UtfToExternalDStringEx \
	(tclStubsPtr->tcl_UtfToExternalDStringEx) /* 659 */
#define Tcl_AsyncMarkFromSignal \
	(tclStubsPtr->tcl_AsyncMarkFromSignal) /* 660 */
/* Slot 661 is reserved */
/* Slot 662 is reserved */
/* Slot 663 is reserved */
/* Slot 664 is reserved */
/* Slot 665 is reserved */

Changes to generic/tclEncoding.c.

511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
 *
 * Side effects:
 *	Depends on the memory, object, and IO subsystems.
 *
 *---------------------------------------------------------------------------
 */

/* Those flags must not conflict with other TCL_ENCODING_* flags in tcl.h */
/* Since TCL_ENCODING_MODIFIED is only used for utf-8/cesu-8 and
 * TCL_ENCODING_LE is only used for  utf-16/utf-32/ucs-2. re-use the same value */
#define TCL_ENCODING_MODIFIED	0x20	/* Converting NULL bytes to 0xC0 0x80 */
#define TCL_ENCODING_LE		TCL_ENCODING_MODIFIED	/* Little-endian encoding */
#define TCL_ENCODING_UTF	0x200	/* For UTF-8 encoding, allow 4-byte output sequences */

void
TclInitEncodingSubsystem(void)
{
    Tcl_EncodingType type;







<


<







511
512
513
514
515
516
517

518
519

520
521
522
523
524
525
526
 *
 * Side effects:
 *	Depends on the memory, object, and IO subsystems.
 *
 *---------------------------------------------------------------------------
 */


/* Since TCL_ENCODING_MODIFIED is only used for utf-8/cesu-8 and
 * TCL_ENCODING_LE is only used for  utf-16/utf-32/ucs-2. re-use the same value */

#define TCL_ENCODING_LE		TCL_ENCODING_MODIFIED	/* Little-endian encoding */
#define TCL_ENCODING_UTF	0x200	/* For UTF-8 encoding, allow 4-byte output sequences */

void
TclInitEncodingSubsystem(void)
{
    Tcl_EncodingType type;
1140
1141
1142
1143
1144
1145
1146













































1147
1148
1149
1150

1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
				 * for the default system encoding. */
    const char *src,		/* Source string in specified encoding. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * encoding-specific string length. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{













































    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int flags, dstLen, result, soFar, srcRead, dstWrote, dstChars;


    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
    }
    encodingPtr = (Encoding *) encoding;

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = encodingPtr->lengthProc(src);
    }

    flags = TCL_ENCODING_START | TCL_ENCODING_END;
    if (encodingPtr->toUtfProc == UtfToUtfProc) {
	flags |= TCL_ENCODING_MODIFIED | TCL_ENCODING_UTF;
    }

    while (1) {
	result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen,
		flags, &state, dst, dstLen, &srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return Tcl_DStringValue(dstPtr);
	}
	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
	Tcl_DStringSetLength(dstPtr, 2 * Tcl_DStringLength(dstPtr) + 1);







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



|
>
















|












|







1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
				 * for the default system encoding. */
    const char *src,		/* Source string in specified encoding. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * encoding-specific string length. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    Tcl_ExternalToUtfDStringEx(encoding, src, srcLen, 0, dstPtr);
    return Tcl_DStringValue(dstPtr);
}


/*
 *-------------------------------------------------------------------------
 *
 * Tcl_ExternalToUtfDStringEx --
 *
 *	Convert a source buffer from the specified encoding into UTF-8.
*	The parameter flags controls the behavior, if any of the bytes in
 *	the source buffer are invalid or cannot be represented in utf-8.
 *	Possible flags values:
 *	TCL_ENCODING_STOPONERROR: don't replace invalid characters/bytes but
 *	return the first error position (Default in Tcl 9.0).
 *	TCL_ENCODING_NOCOMPLAIN: replace invalid characters/bytes by a default
 *	fallback character. Always return -1 (Default in Tcl 8.7).
 *	TCL_ENCODING_MODIFIED: convert NULL bytes to \xC0\x80 in stead of 0x00.
 *	Only valid for "utf-8" and "cesu-8". This flag may be used together
 *	with the other flags.
 *
 * Results:
 *	The converted bytes are stored in the DString, which is then NULL
 *	terminated in an encoding-specific manner. The return value is
 *	the error position in the source string or -1 if no conversion error
 *	is reported.
  *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */

size_t
Tcl_ExternalToUtfDStringEx(
    Tcl_Encoding encoding,	/* The encoding for the source string, or NULL
				 * for the default system encoding. */
    const char *src,		/* Source string in specified encoding. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * encoding-specific string length. */
    int flags,			/* Conversion control flags. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int dstLen, result, soFar, srcRead, dstWrote, dstChars;
    const char *srcStart = src;

    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
    }
    encodingPtr = (Encoding *) encoding;

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = encodingPtr->lengthProc(src);
    }

    flags |= TCL_ENCODING_START | TCL_ENCODING_END;
    if (encodingPtr->toUtfProc == UtfToUtfProc) {
	flags |= TCL_ENCODING_MODIFIED | TCL_ENCODING_UTF;
    }

    while (1) {
	result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen,
		flags, &state, dst, dstLen, &srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return (result == TCL_OK) ? (size_t)-1 : (size_t)(src - srcStart);
	}
	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
	Tcl_DStringSetLength(dstPtr, 2 * Tcl_DStringLength(dstPtr) + 1);
1332
1333
1334
1335
1336
1337
1338














































1339
1340
1341
1342

1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
				 * NULL for the default system encoding. */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * strlen(). */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{














































    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int flags, dstLen, result, soFar, srcRead, dstWrote, dstChars;


    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
    }
    encodingPtr = (Encoding *) encoding;

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = strlen(src);
    }
    flags = TCL_ENCODING_START | TCL_ENCODING_END;
    while (1) {
	result = encodingPtr->fromUtfProc(encodingPtr->clientData, src,
		srcLen, flags, &state, dst, dstLen,
		&srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    int i = soFar + encodingPtr->nullSize - 1;
	    while (i >= soFar) {
		Tcl_DStringSetLength(dstPtr, i--);
	    }
	    return Tcl_DStringValue(dstPtr);
	}

	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



|
>















|












|







1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
				 * NULL for the default system encoding. */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * strlen(). */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    Tcl_UtfToExternalDStringEx(encoding, src, srcLen, 0, dstPtr);
    return Tcl_DStringValue(dstPtr);
}


/*
 *-------------------------------------------------------------------------
 *
 * Tcl_UtfToExternalDStringEx --
 *
 *	Convert a source buffer from UTF-8 to the specified encoding.
 *	The parameter flags controls the behavior, if any of the bytes in
 *	the source buffer are invalid or cannot be represented in the
 *	target encoding.
 *	Possible flags values:
 *	TCL_ENCODING_STOPONERROR: don't replace invalid characters/bytes but
 *	return the first error position (Default in Tcl 9.0).
 *	TCL_ENCODING_NOCOMPLAIN: replace invalid characters/bytes by a default
 *	fallback character. Always return -1 (Default in Tcl 8.7).
 *	TCL_ENCODING_MODIFIED: convert NULL bytes to \xC0\x80 in stead of 0x00.
 *	Only valid for "utf-8" and "cesu-8". This flag may be used together
 *	with the other flags.
 *
 * Results:
 *	The converted bytes are stored in the DString, which is then NULL
 *	terminated in an encoding-specific manner. The return value is
 *	the error position in the source string or -1 if no conversion error
 *	is reported.
 *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */

size_t
Tcl_UtfToExternalDStringEx(
    Tcl_Encoding encoding,	/* The encoding for the converted string, or
				 * NULL for the default system encoding. */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * strlen(). */
    int flags,	/* Conversion control flags. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int dstLen, result, soFar, srcRead, dstWrote, dstChars;
    const char *srcStart = src;

    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
    }
    encodingPtr = (Encoding *) encoding;

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = strlen(src);
    }
    flags |= TCL_ENCODING_START | TCL_ENCODING_END;
    while (1) {
	result = encodingPtr->fromUtfProc(encodingPtr->clientData, src,
		srcLen, flags, &state, dst, dstLen,
		&srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    int i = soFar + encodingPtr->nullSize - 1;
	    while (i >= soFar) {
		Tcl_DStringSetLength(dstPtr, i--);
	    }
	    return (result == TCL_OK) ? (size_t)-1 : (size_t)(src - srcStart);
	}

	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
2191
2192
2193
2194
2195
2196
2197






2198
2199
2200
2201
2202
2203
2204
 *	Returns TCL_OK if conversion was successful.
 *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */







static int
UtfToUtfProc(
    ClientData clientData,	/* additional flags, e.g. TCL_ENCODING_MODIFIED */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes. */
    int flags,			/* Conversion control flags. */







>
>
>
>
>
>







2282
2283
2284
2285
2286
2287
2288
2289
2290
2291
2292
2293
2294
2295
2296
2297
2298
2299
2300
2301
 *	Returns TCL_OK if conversion was successful.
 *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */

#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
#   define STOPONERROR !(flags & TCL_ENCODING_NOCOMPLAIN)
#else
#   define STOPONERROR (flags & TCL_ENCODING_STOPONERROR)
#endif

static int
UtfToUtfProc(
    ClientData clientData,	/* additional flags, e.g. TCL_ENCODING_MODIFIED */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes. */
    int flags,			/* Conversion control flags. */
2274
2275
2276
2277
2278
2279
2280
2281
2282
2283
2284
2285
2286
2287
2288
2289
2290
2291
2292
2293
2294
2295
2296
2297
2298
2299
2300
2301
2302
2303
	     * Always check before using TclUtfToUCS4. Not doing can so
	     * cause it run beyond the end of the buffer! If we happen such an
	     * incomplete char its bytes are made to represent themselves
	     * unless the user has explicitly asked to be told.
	     */

	    if (flags & TCL_ENCODING_MODIFIED) {
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_MULTIBYTE;
		    break;
		}
		ch = UCHAR(*src++);
	    } else {
		char chbuf[2];
		chbuf[0] = UCHAR(*src++); chbuf[1] = 0;
		TclUtfToUCS4(chbuf, &ch);
	    }
	    dst += Tcl_UniCharToUtf(ch, dst);
	} else {
	    int low;
	    const char *saveSrc = src;
	    size_t len = TclUtfToUCS4(src, &ch);
	    if ((len < 2) && (ch != 0) && (flags & TCL_ENCODING_STOPONERROR)
		    && (flags & TCL_ENCODING_MODIFIED)) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    src += len;
	    if (!(flags & TCL_ENCODING_UTF) && (ch > 0x3FF)) {
		if (ch > 0xFFFF) {







|














|







2371
2372
2373
2374
2375
2376
2377
2378
2379
2380
2381
2382
2383
2384
2385
2386
2387
2388
2389
2390
2391
2392
2393
2394
2395
2396
2397
2398
2399
2400
	     * Always check before using TclUtfToUCS4. Not doing can so
	     * cause it run beyond the end of the buffer! If we happen such an
	     * incomplete char its bytes are made to represent themselves
	     * unless the user has explicitly asked to be told.
	     */

	    if (flags & TCL_ENCODING_MODIFIED) {
		if (STOPONERROR) {
		    result = TCL_CONVERT_MULTIBYTE;
		    break;
		}
		ch = UCHAR(*src++);
	    } else {
		char chbuf[2];
		chbuf[0] = UCHAR(*src++); chbuf[1] = 0;
		TclUtfToUCS4(chbuf, &ch);
	    }
	    dst += Tcl_UniCharToUtf(ch, dst);
	} else {
	    int low;
	    const char *saveSrc = src;
	    size_t len = TclUtfToUCS4(src, &ch);
	    if ((len < 2) && (ch != 0) && STOPONERROR
		    && (flags & TCL_ENCODING_MODIFIED)) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    src += len;
	    if (!(flags & TCL_ENCODING_UTF) && (ch > 0x3FF)) {
		if (ch > 0xFFFF) {
2314
2315
2316
2317
2318
2319
2320

2321
2322
2323
2324
2325
2326
2327
2328
2329
2330
2331
2332
2333
2334
2335
2336
2337
2338
2339
2340
2341
2342
2343
		 * A surrogate character is detected, handle especially.
		 */

		low = ch;
		len = (src <= srcEnd-3) ? TclUtfToUCS4(src, &low) : 0;

		if (((low & ~0x3FF) != 0xDC00) || (ch & 0x400)) {

		    if (flags & TCL_ENCODING_STOPONERROR) {
			result = TCL_CONVERT_UNKNOWN;
			src = saveSrc;
			break;
		    }
		cesu8:
		    *dst++ = (char) (((ch >> 12) | 0xE0) & 0xEF);
		    *dst++ = (char) (((ch >> 6) | 0x80) & 0xBF);
		    *dst++ = (char) ((ch | 0x80) & 0xBF);
		    continue;
		}
		src += len;
		dst += Tcl_UniCharToUtf(ch, dst);
		ch = low;
	    } else if (!Tcl_UniCharIsUnicode(ch)) {
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    src = saveSrc;
		    break;
		}
		if (!(flags & TCL_ENCODING_MODIFIED)) {
		    ch = 0xFFFD;
		}







>
|














|







2411
2412
2413
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
		 * A surrogate character is detected, handle especially.
		 */

		low = ch;
		len = (src <= srcEnd-3) ? TclUtfToUCS4(src, &low) : 0;

		if (((low & ~0x3FF) != 0xDC00) || (ch & 0x400)) {

		    if (STOPONERROR) {
			result = TCL_CONVERT_UNKNOWN;
			src = saveSrc;
			break;
		    }
		cesu8:
		    *dst++ = (char) (((ch >> 12) | 0xE0) & 0xEF);
		    *dst++ = (char) (((ch >> 6) | 0x80) & 0xBF);
		    *dst++ = (char) ((ch | 0x80) & 0xBF);
		    continue;
		}
		src += len;
		dst += Tcl_UniCharToUtf(ch, dst);
		ch = low;
	    } else if (!Tcl_UniCharIsUnicode(ch)) {
		if (STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    src = saveSrc;
		    break;
		}
		if (!(flags & TCL_ENCODING_MODIFIED)) {
		    ch = 0xFFFD;
		}
2515
2516
2517
2518
2519
2520
2521
2522
2523
2524
2525
2526
2527
2528
2529
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {







|







2613
2614
2615
2616
2617
2618
2619
2620
2621
2622
2623
2624
2625
2626
2627
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {
2718
2719
2720
2721
2722
2723
2724
2725
2726
2727
2728
2729
2730
2731
2732
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {







|







2816
2817
2818
2819
2820
2821
2822
2823
2824
2825
2826
2827
2828
2829
2830
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {
2938
2939
2940
2941
2942
2943
2944
2945
2946
2947
2948
2949
2950
2951
2952
		break;
	    }
	    ch = toUnicode[byte][*((unsigned char *) src)];
	} else {
	    ch = pageZero[byte];
	}
	if ((ch == 0) && (byte != 0)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    if (prefixBytes[byte]) {
		src--;
	    }
	    ch = (Tcl_UniChar) byte;







|







3036
3037
3038
3039
3040
3041
3042
3043
3044
3045
3046
3047
3048
3049
3050
		break;
	    }
	    ch = toUnicode[byte][*((unsigned char *) src)];
	} else {
	    ch = pageZero[byte];
	}
	if ((ch == 0) && (byte != 0)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    if (prefixBytes[byte]) {
		src--;
	    }
	    ch = (Tcl_UniChar) byte;
3054
3055
3056
3057
3058
3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
	if (!len) {
	    word = 0;
	} else
#endif
	    word = fromUnicode[(ch >> 8)][ch & 0xFF];

	if ((word == 0) && (ch != 0)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    word = dataPtr->fallback;
	}
	if (prefixBytes[(word >> 8)] != 0) {
	    if (dst + 1 > dstEnd) {







|







3152
3153
3154
3155
3156
3157
3158
3159
3160
3161
3162
3163
3164
3165
3166
	if (!len) {
	    word = 0;
	} else
#endif
	    word = fromUnicode[(ch >> 8)][ch & 0xFF];

	if ((word == 0) && (ch != 0)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    word = dataPtr->fallback;
	}
	if (prefixBytes[(word >> 8)] != 0) {
	    if (dst + 1 > dstEnd) {
3242
3243
3244
3245
3246
3247
3248
3249
3250
3251
3252
3253
3254
3255
3256
	 */

	if (ch > 0xFF
#if TCL_UTF_MAX < 4
		|| ((ch >= 0xD800) && (len < 3))
#endif
		) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
#if TCL_UTF_MAX < 4
	    if ((ch >= 0xD800) && (len < 3)) {
		len = 4;
	    }







|







3340
3341
3342
3343
3344
3345
3346
3347
3348
3349
3350
3351
3352
3353
3354
	 */

	if (ch > 0xFF
#if TCL_UTF_MAX < 4
		|| ((ch >= 0xD800) && (len < 3))
#endif
		) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
#if TCL_UTF_MAX < 4
	    if ((ch >= 0xD800) && (len < 3)) {
		len = 4;
	    }
3469
3470
3471
3472
3473
3474
3475
3476
3477
3478
3479
3480
3481
3482
3483
	     * We have a split-up or unrecognized escape sequence. If we
	     * checked all the sequences, then it's a syntax error, otherwise
	     * we need more bytes to determine a match.
	     */

	    if ((checked == dataPtr->numSubTables + 2)
		    || (flags & TCL_ENCODING_END)) {
		if ((flags & TCL_ENCODING_STOPONERROR) == 0) {
		    /*
		     * Skip the unknown escape sequence.
		     */

		    src += longest;
		    continue;
		}







|







3567
3568
3569
3570
3571
3572
3573
3574
3575
3576
3577
3578
3579
3580
3581
	     * We have a split-up or unrecognized escape sequence. If we
	     * checked all the sequences, then it's a syntax error, otherwise
	     * we need more bytes to determine a match.
	     */

	    if ((checked == dataPtr->numSubTables + 2)
		    || (flags & TCL_ENCODING_END)) {
		if (!STOPONERROR) {
		    /*
		     * Skip the unknown escape sequence.
		     */

		    src += longest;
		    continue;
		}
3644
3645
3646
3647
3648
3649
3650
3651
3652
3653
3654
3655
3656
3657
3658
		if (word != 0) {
		    break;
		}
	    }

	    if (word == 0) {
		state = oldState;
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    break;
		}
		encodingPtr = GetTableEncoding(dataPtr, state);
		tableDataPtr = (const TableEncodingData *)encodingPtr->clientData;
		word = tableDataPtr->fallback;
	    }







|







3742
3743
3744
3745
3746
3747
3748
3749
3750
3751
3752
3753
3754
3755
3756
		if (word != 0) {
		    break;
		}
	    }

	    if (word == 0) {
		state = oldState;
		if (STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    break;
		}
		encodingPtr = GetTableEncoding(dataPtr, state);
		tableDataPtr = (const TableEncodingData *)encodingPtr->clientData;
		word = tableDataPtr->fallback;
	    }

Changes to generic/tclStubInit.c.

1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
    TclGetStringFromObj, /* 651 */
    TclGetUnicodeFromObj, /* 652 */
    TclGetByteArrayFromObj, /* 653 */
    Tcl_UtfCharComplete, /* 654 */
    Tcl_UtfNext, /* 655 */
    Tcl_UtfPrev, /* 656 */
    Tcl_UniCharIsUnicode, /* 657 */
    0, /* 658 */
    0, /* 659 */
    Tcl_AsyncMarkFromSignal, /* 660 */
    0, /* 661 */
    0, /* 662 */
    0, /* 663 */
    0, /* 664 */
    0, /* 665 */
    0, /* 666 */







|
|







1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
    TclGetStringFromObj, /* 651 */
    TclGetUnicodeFromObj, /* 652 */
    TclGetByteArrayFromObj, /* 653 */
    Tcl_UtfCharComplete, /* 654 */
    Tcl_UtfNext, /* 655 */
    Tcl_UtfPrev, /* 656 */
    Tcl_UniCharIsUnicode, /* 657 */
    Tcl_ExternalToUtfDStringEx, /* 658 */
    Tcl_UtfToExternalDStringEx, /* 659 */
    Tcl_AsyncMarkFromSignal, /* 660 */
    0, /* 661 */
    0, /* 662 */
    0, /* 663 */
    0, /* 664 */
    0, /* 665 */
    0, /* 666 */

Changes to tests/chanio.test.

245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
    set f [open $path(test1) w]
    chan configure $f -encoding ascii -buffering line -translation crlf
    chan puts -nonewline $f "\n12"
    contents $path(test1)
} -cleanup {
    chan close $f
} -result "\r\n12"
test chan-io-3.4 {WriteChars: loop over stage buffer} {
    # stage buffer maps to more than can be queued at once.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 16
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test chan-io-3.5 {WriteChars: saved != 0} {
    # Bytes produced by UtfToExternal from end of last channel buffer had to
    # be moved to beginning of next channel buffer to preserve requested
    # buffersize.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]







|








|







245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
    set f [open $path(test1) w]
    chan configure $f -encoding ascii -buffering line -translation crlf
    chan puts -nonewline $f "\n12"
    contents $path(test1)
} -cleanup {
    chan close $f
} -result "\r\n12"
test chan-io-3.4 {WriteChars: loop over stage buffer} deprecated {
    # stage buffer maps to more than can be queued at once.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 16
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test chan-io-3.5 {WriteChars: saved != 0} deprecated {
    # Bytes produced by UtfToExternal from end of last channel buffer had to
    # be moved to beginning of next channel buffer to preserve requested
    # buffersize.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
    set f [open $path(test1) w]
    chan configure $f -encoding shiftjis -buffersize 16
    chan puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test chan-io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} {
    # When translating UTF-8 to external, the produced bytes went past end of
    # the channel buffer. This is done on purpose - we then truncate the bytes
    # at the end of the partial character to preserve the requested blocksize
    # on flush. The truncated bytes are moved to the beginning of the next
    # channel buffer.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17







|







281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
    set f [open $path(test1) w]
    chan configure $f -encoding shiftjis -buffersize 16
    chan puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test chan-io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} deprecated {
    # When translating UTF-8 to external, the produced bytes went past end of
    # the channel buffer. This is done on purpose - we then truncate the bytes
    # at the end of the partial character to preserve the requested blocksize
    # on flush. The truncated bytes are moved to the beginning of the next
    # channel buffer.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17

Changes to tests/cmdAH.test.

174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
    encoding
} -result {wrong # args: should be "encoding subcommand ?arg ...?"}
test cmdAH-4.2 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding foo
} -result {unknown or ambiguous subcommand "foo": must be convertfrom, convertto, dirs, names, or system}
test cmdAH-4.3 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto
} -result {wrong # args: should be "encoding convertto ?encoding? data"}
test cmdAH-4.4 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.5 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208







|







174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
    encoding
} -result {wrong # args: should be "encoding subcommand ?arg ...?"}
test cmdAH-4.2 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding foo
} -result {unknown or ambiguous subcommand "foo": must be convertfrom, convertto, dirs, names, or system}
test cmdAH-4.3 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?encoding? data"}
test cmdAH-4.4 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.5 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
    encoding system iso8859-1
    encoding convertto jis0208 乎
} -cleanup {
    encoding system $system
} -result 8C
test cmdAH-4.7 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom
} -result {wrong # args: should be "encoding convertfrom ?encoding? data"}
test cmdAH-4.8 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.9 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208







|







196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
    encoding system iso8859-1
    encoding convertto jis0208 乎
} -cleanup {
    encoding system $system
} -result 8C
test cmdAH-4.7 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?encoding? data"}
test cmdAH-4.8 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.9 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208

Changes to tests/encoding.test.

18
19
20
21
22
23
24


25
26
27
28
29
30
31
    variable x

catch {
    ::tcltest::loadTestedCommands
    package require -exact tcl::test [info patchlevel]
}



proc toutf {args} {
    variable x
    lappend x "toutf $args"
}
proc fromutf {args} {
    variable x
    lappend x "fromutf $args"







>
>







18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
    variable x

catch {
    ::tcltest::loadTestedCommands
    package require -exact tcl::test [info patchlevel]
}

testConstraint deprecated [expr {![info exists tcl_precision]}]

proc toutf {args} {
    variable x
    lappend x "toutf $args"
}
proc fromutf {args} {
    variable x
    lappend x "fromutf $args"
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
test encoding-11.11 {encoding: extended Unicode UTF-32} {
    viewable [encoding convertto utf-32be 😹]
} "\x00\x01\xF69 (\\u0000\\u0001\\u00F69)"
# OpenEncodingFile is fully tested by the rest of the tests in this file.

test encoding-12.1 {LoadTableEncoding: normal encoding} {
    set x [encoding convertto iso8859-3 Ġ]
    append x [encoding convertto iso8859-3 Õ]
    append x [encoding convertfrom iso8859-3 Õ]
} "Õ?Ġ"
test encoding-12.2 {LoadTableEncoding: single-byte encoding} {
    set x [encoding convertto iso8859-3 abĠg]
    append x [encoding convertfrom iso8859-3 abÕg]
} "abÕgabĠg"
test encoding-12.3 {LoadTableEncoding: multi-byte encoding} {







|







295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
test encoding-11.11 {encoding: extended Unicode UTF-32} {
    viewable [encoding convertto utf-32be 😹]
} "\x00\x01\xF69 (\\u0000\\u0001\\u00F69)"
# OpenEncodingFile is fully tested by the rest of the tests in this file.

test encoding-12.1 {LoadTableEncoding: normal encoding} {
    set x [encoding convertto iso8859-3 Ġ]
    append x [encoding convertto -nocomplain iso8859-3 Õ]
    append x [encoding convertfrom iso8859-3 Õ]
} "Õ?Ġ"
test encoding-12.2 {LoadTableEncoding: single-byte encoding} {
    set x [encoding convertto iso8859-3 abĠg]
    append x [encoding convertfrom iso8859-3 abÕg]
} "abÕgabĠg"
test encoding-12.3 {LoadTableEncoding: multi-byte encoding} {
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
test encoding-15.5 {UtfToUtfProc emoji character input} {
    set x \xF0\x9F\x98\x82
    set y [encoding convertfrom utf-8 \xF0\x9F\x98\x82]
    list [string length $x] $y
} "4 😂"
test encoding-15.6 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uDE02\uD83D
    set y [encoding convertto utf-8 \uDE02\uD83D\uDE02\uD83D]
    binary scan $y H* z
    list [string length $y] $z
} {10 edb882f09f9882eda0bd}
test encoding-15.7 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uD83D
    set y [encoding convertto utf-8 \uDE02\uD83D\uD83D]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 9 edb882eda0bdeda0bd}
test encoding-15.8 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83Dé
    set y [encoding convertto utf-8 \uDE02\uD83Dé]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 8 edb882eda0bdc3a9}
test encoding-15.9 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83DX
    set y [encoding convertto utf-8 \uDE02\uD83DX]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 7 edb882eda0bd58}
test encoding-15.10 {UtfToUtfProc high surrogate character output} {
    set x \uDE02é
    set y [encoding convertto utf-8 \uDE02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 edb882c3a9}
test encoding-15.11 {UtfToUtfProc low surrogate character output} {
    set x \uDA02é
    set y [encoding convertto utf-8 \uDA02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 eda882c3a9}
test encoding-15.12 {UtfToUtfProc high surrogate character output} {
    set x \uDE02Y
    set y [encoding convertto utf-8 \uDE02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 edb88259}
test encoding-15.13 {UtfToUtfProc low surrogate character output} {
    set x \uDA02Y
    set y [encoding convertto utf-8 \uDA02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 eda88259}
test encoding-15.14 {UtfToUtfProc high surrogate character output} {
    set x \uDE02
    set y [encoding convertto utf-8 \uDE02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 edb882}
test encoding-15.15 {UtfToUtfProc low surrogate character output} {
    set x \uDA02
    set y [encoding convertto utf-8 \uDA02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 eda882}
test encoding-15.16 {UtfToUtfProc: Invalid 4-byte UTF-8, see [ed29806ba]} {
    set x \xF0\xA0\xA1\xC2
    set y [encoding convertfrom utf-8 \xF0\xA0\xA1\xC2]
    list [string length $x] $y
} "4 \xF0\xA0\xA1\xC2"
test encoding-15.17 {UtfToUtfProc emoji character output} {
    set x 😂
    set y [encoding convertto utf-8 😂]
    binary scan $y H* z
    list [string length $y] $z







|





|





|





|





|





|





|





|





|





|





|







344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
test encoding-15.5 {UtfToUtfProc emoji character input} {
    set x \xF0\x9F\x98\x82
    set y [encoding convertfrom utf-8 \xF0\x9F\x98\x82]
    list [string length $x] $y
} "4 😂"
test encoding-15.6 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uDE02\uD83D
    set y [encoding convertto -nocomplain utf-8 \uDE02\uD83D\uDE02\uD83D]
    binary scan $y H* z
    list [string length $y] $z
} {10 edb882f09f9882eda0bd}
test encoding-15.7 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uD83D
    set y [encoding convertto -nocomplain utf-8 \uDE02\uD83D\uD83D]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 9 edb882eda0bdeda0bd}
test encoding-15.8 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83Dé
    set y [encoding convertto -nocomplain utf-8 \uDE02\uD83Dé]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 8 edb882eda0bdc3a9}
test encoding-15.9 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83DX
    set y [encoding convertto -nocomplain utf-8 \uDE02\uD83DX]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 7 edb882eda0bd58}
test encoding-15.10 {UtfToUtfProc high surrogate character output} {
    set x \uDE02é
    set y [encoding convertto -nocomplain utf-8 \uDE02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 edb882c3a9}
test encoding-15.11 {UtfToUtfProc low surrogate character output} {
    set x \uDA02é
    set y [encoding convertto -nocomplain utf-8 \uDA02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 eda882c3a9}
test encoding-15.12 {UtfToUtfProc high surrogate character output} {
    set x \uDE02Y
    set y [encoding convertto -nocomplain utf-8 \uDE02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 edb88259}
test encoding-15.13 {UtfToUtfProc low surrogate character output} {
    set x \uDA02Y
    set y [encoding convertto -nocomplain utf-8 \uDA02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 eda88259}
test encoding-15.14 {UtfToUtfProc high surrogate character output} {
    set x \uDE02
    set y [encoding convertto -nocomplain utf-8 \uDE02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 edb882}
test encoding-15.15 {UtfToUtfProc low surrogate character output} {
    set x \uDA02
    set y [encoding convertto -nocomplain utf-8 \uDA02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 eda882}
test encoding-15.16 {UtfToUtfProc: Invalid 4-byte UTF-8, see [ed29806ba]} {
    set x \xF0\xA0\xA1\xC2
    set y [encoding convertfrom -nocomplain utf-8 \xF0\xA0\xA1\xC2]
    list [string length $x] $y
} "4 \xF0\xA0\xA1\xC2"
test encoding-15.17 {UtfToUtfProc emoji character output} {
    set x 😂
    set y [encoding convertto utf-8 😂]
    binary scan $y H* z
    list [string length $y] $z
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
test encoding-17.1 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\U460DC"
} -result "\xD8\xD8\xDC\xDC"
test encoding-17.2 {UtfToUcs2Proc} -body {
    encoding convertfrom utf-16 [encoding convertto ucs-2 "\U460DC"]
} -result "\uFFFD"
test encoding-17.3 {UtfToUtf16Proc} -body {
    encoding convertto utf-16be "\uDCDC"
} -result "\xFF\xFD"
test encoding-17.4 {UtfToUtf16Proc} -body {
    encoding convertto utf-16le "\uD8D8"
} -result "\xFD\xFF"
test encoding-17.5 {UtfToUtf16Proc} -body {
    encoding convertto utf-32le "\U460DC"
} -result "\xDC\x60\x04\x00"
test encoding-17.6 {UtfToUtf16Proc} -body {
    encoding convertto utf-32be "\U460DC"
} -result "\x00\x04\x60\xDC"







|


|







485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
test encoding-17.1 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\U460DC"
} -result "\xD8\xD8\xDC\xDC"
test encoding-17.2 {UtfToUcs2Proc} -body {
    encoding convertfrom utf-16 [encoding convertto ucs-2 "\U460DC"]
} -result "\uFFFD"
test encoding-17.3 {UtfToUtf16Proc} -body {
    encoding convertto -nocomplain utf-16be "\uDCDC"
} -result "\xFF\xFD"
test encoding-17.4 {UtfToUtf16Proc} -body {
    encoding convertto -nocomplain utf-16le "\uD8D8"
} -result "\xFD\xFF"
test encoding-17.5 {UtfToUtf16Proc} -body {
    encoding convertto utf-32le "\U460DC"
} -result "\xDC\x60\x04\x00"
test encoding-17.6 {UtfToUtf16Proc} -body {
    encoding convertto utf-32be "\U460DC"
} -result "\x00\x04\x60\xDC"
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635









636
637



























638
639
640
641
642
643
644
    list $count [viewable $line]
} [list 3 "乎乞也 (\\u4E4E\\u4E5E\\u4E5F)"]

test encoding-24.4 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC0\x80"]
} 1
test encoding-24.5 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC0\x81"]
} 2
test encoding-24.6 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC1\xBF"]
} 2
test encoding-24.7 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC2\x80"]
} 1
test encoding-24.8 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\x80\x80"]
} 3
test encoding-24.9 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\x9F\xBF"]
} 3
test encoding-24.10 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\xA0\x80"]
} 1
test encoding-24.11 {Parse valid or invalid utf-8} {









    string length [encoding convertfrom utf-8 "\xEF\xBF\xBF"]
} 1




























file delete [file join [temporaryDirectory] iso2022.txt]

#
# Begin jajp encoding round-trip conformity tests
#
proc foreach-jisx0208 {varName command} {







|


|





|


|





>
>
>
>
>
>
>
>
>
|

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
    list $count [viewable $line]
} [list 3 "乎乞也 (\\u4E4E\\u4E5E\\u4E5F)"]

test encoding-24.4 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC0\x80"]
} 1
test encoding-24.5 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nocomplain utf-8 "\xC0\x81"]
} 2
test encoding-24.6 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nocomplain utf-8 "\xC1\xBF"]
} 2
test encoding-24.7 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC2\x80"]
} 1
test encoding-24.8 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nocomplain utf-8 "\xE0\x80\x80"]
} 3
test encoding-24.9 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nocomplain utf-8 "\xE0\x9F\xBF"]
} 3
test encoding-24.10 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\xA0\x80"]
} 1
test encoding-24.11 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nocomplain utf-8 "\xEF\xBF\xBF"]
} 1
test encoding-24.12 {Parse valid or invalid utf-8} -constraints deprecated -body {
    encoding convertfrom utf-8 "\xC0\x81"
} -returnCodes 1 -result {unexpected byte sequence starting at index 0: '\xC0'}
test encoding-24.13 {Parse valid or invalid utf-8} -constraints deprecated -body {
    encoding convertfrom utf-8 "\xC1\xBF"
} -returnCodes 1 -result {unexpected byte sequence starting at index 0: '\xC1'}
test encoding-24.14 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC2\x80"]
} 1
test encoding-24.15 {Parse valid or invalid utf-8} -constraints deprecated -body {
    encoding convertfrom utf-8 "Z\xE0\x80"
} -returnCodes 1 -result {unexpected byte sequence starting at index 1: '\xE0'}
test encoding-24.16 {Parse valid or invalid utf-8} -constraints {testbytestring deprecated} -body {
    encoding convertto utf-8 [testbytestring "Z\u4343\x80"]
} -returnCodes 1 -result {expected byte sequence but character 1 was '䍃€' (U+004343)}
test encoding-24.17 {Parse valid or invalid utf-8} -constraints {testbytestring deprecated} -body {
    encoding convertto utf-8 [testbytestring "Z\xE0\x80"]
} -result "Z\xC3\xA0\xE2\x82\xAC"
test encoding-24.18 {Parse valid or invalid utf-8} -constraints {testbytestring deprecated} -body {
    encoding convertto utf-8 [testbytestring "Z\xE0\x80xxxxxx"]
} -result "Z\xC3\xA0\xE2\x82\xACxxxxxx"
test encoding-24.19 {Parse valid or invalid utf-8} -constraints deprecated -body {
    encoding convertto utf-8 "ZX\uD800"
} -returnCodes 1 -match glob -result "unexpected character at index 2: 'U+00D800'"
test encoding-24.20 {Parse with -nocomplain but without providing encoding} {
    string length [encoding convertfrom -nocomplain "\x20"]
} 1
test encoding-24.21 {Parse with -nocomplain but without providing encoding} {
    string length [encoding convertto -nocomplain "\x20"]
} 1
test encoding-24.22 {Syntax error, two encodings} -body {
    encoding convertfrom iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertfrom ?-nocomplain? ?encoding? data"}
test encoding-24.23 {Syntax error, two encodings} -body {
    encoding convertto iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertto ?-nocomplain? ?encoding? data"}

file delete [file join [temporaryDirectory] iso2022.txt]

#
# Begin jajp encoding round-trip conformity tests
#
proc foreach-jisx0208 {varName command} {
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
}


test encoding-28.0 {all encodings load} -body {
	set string hello
	foreach name [encoding names] {
		incr count
		encoding convertto $name $string

		# discard the cached internal representation of Tcl_Encoding
		# Unfortunately, without this, encoding 2-1 fails.
		llength $name
	}
	return $count
} -result [expr {[info exists ::tcl_precision] ? 92 : 91}]







|







824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
}


test encoding-28.0 {all encodings load} -body {
	set string hello
	foreach name [encoding names] {
		incr count
		encoding convertto -nocomplain $name $string

		# discard the cached internal representation of Tcl_Encoding
		# Unfortunately, without this, encoding 2-1 fails.
		llength $name
	}
	return $count
} -result [expr {[info exists ::tcl_precision] ? 92 : 91}]

Changes to tests/http.test.

657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
} -returnCodes error -body {
    # this would be reverting to http <=2.4 behavior
    http::config -urlencoding ""
    http::mapReply "∈"
} -cleanup {
    http::config -urlencoding $enc
} -result "can't read \"formMap(∈)\": no such element in array"
test http-7.4 {http::formatQuery} -setup {
    set enc [http::config -urlencoding]
} -body {
    # this would be reverting to http <=2.4 behavior w/o errors
    # (unknown chars become '?')
    http::config -urlencoding "iso8859-1"
    http::mapReply "∈"
} -cleanup {







|







657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
} -returnCodes error -body {
    # this would be reverting to http <=2.4 behavior
    http::config -urlencoding ""
    http::mapReply "∈"
} -cleanup {
    http::config -urlencoding $enc
} -result "can't read \"formMap(∈)\": no such element in array"
test http-7.4 {http::formatQuery} -constraints deprecated -setup {
    set enc [http::config -urlencoding]
} -body {
    # this would be reverting to http <=2.4 behavior w/o errors
    # (unknown chars become '?')
    http::config -urlencoding "iso8859-1"
    http::mapReply "∈"
} -cleanup {

Changes to tests/io.test.

264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
    set f [open $path(test1) w]
    fconfigure $f -encoding ascii -buffering line -translation crlf
    puts -nonewline $f "\n12"
    set x [contents $path(test1)]
    close $f
    set x
} "\r\n12"
test io-3.4 {WriteChars: loop over stage buffer} {
    # stage buffer maps to more than can be queued at once.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 16
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test io-3.5 {WriteChars: saved != 0} {
    # Bytes produced by UtfToExternal from end of last channel buffer
    # had to be moved to beginning of next channel buffer to preserve
    # requested buffersize.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 17
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"







|









|







264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
    set f [open $path(test1) w]
    fconfigure $f -encoding ascii -buffering line -translation crlf
    puts -nonewline $f "\n12"
    set x [contents $path(test1)]
    close $f
    set x
} "\r\n12"
test io-3.4 {WriteChars: loop over stage buffer} deprecated {
    # stage buffer maps to more than can be queued at once.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 16
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test io-3.5 {WriteChars: saved != 0} deprecated {
    # Bytes produced by UtfToExternal from end of last channel buffer
    # had to be moved to beginning of next channel buffer to preserve
    # requested buffersize.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 17
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
    set f [open $path(test1) w]
    fconfigure $f -encoding shiftjis -buffersize 16
    puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} {
    # When translating UTF-8 to external, the produced bytes went past end
    # of the channel buffer.  This is done purpose -- we then truncate the
    # bytes at the end of the partial character to preserve the requested
    # blocksize on flush.  The truncated bytes are moved to the beginning
    # of the next channel buffer.

    set f [open $path(test1) w]







|







303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
    set f [open $path(test1) w]
    fconfigure $f -encoding shiftjis -buffersize 16
    puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} deprecated {
    # When translating UTF-8 to external, the produced bytes went past end
    # of the channel buffer.  This is done purpose -- we then truncate the
    # bytes at the end of the partial character to preserve the requested
    # blocksize on flush.  The truncated bytes are moved to the beginning
    # of the next channel buffer.

    set f [open $path(test1) w]
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 160
test io-12.9 {ReadChars: multibyte chars split} {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 194
test io-12.10 {ReadChars: multibyte chars split} {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 11
    set in [read $f]







|










|







1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 160
test io-12.9 {ReadChars: multibyte chars split} deprecated {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 194
test io-12.10 {ReadChars: multibyte chars split} deprecated {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 11
    set in [read $f]

Changes to tests/main.test.

139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
	close $f
	removeFile script
    } -result [list script {} 0]\n1\n

    test Tcl_Main-1.8 {
	Tcl_Main: startup script - -encoding option - mismatched encodings
    } -constraints {
	stdio
    } -setup {
	set script [makeFile {} script]
	file delete $script
	set f [open $script w]
	chan configure $f -encoding utf-8
	puts $f {puts [list $argv0 $argv $tcl_interactive]}
	puts -nonewline $f {puts [string equal \u20ac }







|







139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
	close $f
	removeFile script
    } -result [list script {} 0]\n1\n

    test Tcl_Main-1.8 {
	Tcl_Main: startup script - -encoding option - mismatched encodings
    } -constraints {
	stdio deprecated
    } -setup {
	set script [makeFile {} script]
	file delete $script
	set f [open $script w]
	chan configure $f -encoding utf-8
	puts $f {puts [list $argv0 $argv $tcl_interactive]}
	puts -nonewline $f {puts [string equal \u20ac }

Changes to tests/safe.test.

1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
} -result foobar
test safe-11.7 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertfrom
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?encoding? data"}
test safe-11.7.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertfrom} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?encoding? data"
    while executing
"encoding convertfrom"
    invoked from within
"encoding convertfrom"
    invoked from within
"interp eval $i encoding convertfrom"}
test safe-11.8 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertto
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?encoding? data"}
test safe-11.8.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertto} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?encoding? data"
    while executing
"encoding convertto"
    invoked from within
"encoding convertto"
    invoked from within
"interp eval $i encoding convertto"}








|








|












|








|







1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
} -result foobar
test safe-11.7 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertfrom
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?encoding? data"}
test safe-11.7.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertfrom} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?encoding? data"
    while executing
"encoding convertfrom"
    invoked from within
"encoding convertfrom"
    invoked from within
"interp eval $i encoding convertfrom"}
test safe-11.8 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertto
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?encoding? data"}
test safe-11.8.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertto} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?encoding? data"
    while executing
"encoding convertto"
    invoked from within
"encoding convertto"
    invoked from within
"interp eval $i encoding convertto"}

Changes to tests/source.test.

271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
} -body {
    source -encoding utf-8 $sourcefile
    €
} -cleanup {
    removeFile source.file
    rename € {}
} -result foo
test source-7.6 {source -encoding: mismatch encoding error} -setup {
    set sourcefile [makeFile {} source.file]
    file delete $sourcefile
    set f [open $sourcefile w]
    fconfigure $f -encoding utf-8
    puts $f "proc € {} {return foo}"
    close $f
} -body {







|







271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
} -body {
    source -encoding utf-8 $sourcefile
    €
} -cleanup {
    removeFile source.file
    rename € {}
} -result foo
test source-7.6 {source -encoding: mismatch encoding error} -constraints deprecated -setup {
    set sourcefile [makeFile {} source.file]
    file delete $sourcefile
    set f [open $sourcefile w]
    fconfigure $f -encoding utf-8
    puts $f "proc € {} {return foo}"
    close $f
} -body {