Tcl Source Code

Changes On Branch encodings-with-flags
Login
Bounty program for improvements to Tcl and certain Tcl packages.

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Changes In Branch encodings-with-flags Excluding Merge-Ins

This is equivalent to a diff from ef3c9a8997 to b55880e06d

2021-05-27
12:13
Merge 8.6. Make c++ compile (with g++-11) warning-free check-in: 764e13e16d user: jan.nijtmans tags: core-8-branch
2021-05-26
14:10
More testcases regarding possible parse errors Leaf check-in: b55880e06d user: jan.nijtmans tags: encodings-with-flags
14:04
Handle the situation when there is "-nothrow" or "-stoponerror" but without providing encoding check-in: 5c1a147c9f user: jan.nijtmans tags: encodings-with-flags
13:23
Merge 8.7 and doc fix check-in: 70cb984279 user: jan.nijtmans tags: encodings-with-flags
11:27
Merge 8.7 check-in: 7f17627d6b user: jan.nijtmans tags: trunk, main
11:23
Fix [a73c79081e]: Doc fix in tcl.h. Recommend int32_t in stead of wchar_t check-in: ef3c9a8997 user: jan.nijtmans tags: core-8-branch
10:47
Fix [a73c79081e]: Doc fix in tcl.h, by not suggesting wchar_t any more for Tcl_UniChar. check-in: 8c52ca3059 user: jan.nijtmans tags: core-8-6-branch
10:01
One more macos-11.0 -> macos-11 check-in: b6da37acb0 user: jan.nijtmans tags: core-8-branch

Changes to doc/Encoding.3.

21
22
23
24
25
26
27



28
29
30



31
32
33
34
35
36
37
...
104
105
106
107
108
109
110
111



112
113
114
115
116
117
118
...
203
204
205
206
207
208
209





210
211
212
213
214
215
216
...
241
242
243
244
245
246
247





248
249
250
251
252
253
254
.sp
int
\fBTcl_GetEncodingFromObj\fR(\fIinterp, objPtr, encodingPtr\fR)
.sp
char *
\fBTcl_ExternalToUtfDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp



char *
\fBTcl_UtfToExternalDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp



int
\fBTcl_ExternalToUtf\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
.sp
int
\fBTcl_UtfToExternal\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
................................................................................
converted. \fBTCL_ENCODING_END\fR signifies that the source buffer is the last
block in a (potentially multi-block) input stream, telling the conversion
routine to perform any finalization that needs to occur after the last
byte is converted and then to reset to an initial state.
\fBTCL_ENCODING_STOPONERROR\fR signifies that the conversion routine should
return immediately upon reading a source character that does not exist in
the target encoding; otherwise a default fallback character will
automatically be substituted.



.AP Tcl_EncodingState *statePtr in/out
Used when converting a (generally long or indefinite length) byte stream
in a piece-by-piece fashion.  The conversion routine stores its current
state in \fI*statePtr\fR after \fIsrc\fR (the buffer containing the
current piece) has been converted; that state information must be passed
back when converting the next piece of the stream so the conversion
routine knows what state it was in when it left off at the end of the
................................................................................
\fBTcl_ExternalToUtfDString\fR converts a source buffer \fIsrc\fR from the
specified \fIencoding\fR into UTF-8.  The converted bytes are stored in
\fIdstPtr\fR, which is then null-terminated.  The caller should eventually
call \fBTcl_DStringFree\fR to free any information stored in \fIdstPtr\fR.
When converting, if any of the characters in the source buffer cannot be
represented in the target encoding, a default fallback character will be
used.  The return value is a pointer to the value stored in the DString.





.PP
\fBTcl_ExternalToUtf\fR converts a source buffer \fIsrc\fR from the specified
\fIencoding\fR into UTF-8.  Up to \fIsrcLen\fR bytes are converted from the
source buffer and up to \fIdstLen\fR converted bytes are stored in \fIdst\fR.
In all cases, \fI*srcReadPtr\fR is filled with the number of bytes that were
successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR is filled with
the corresponding number of bytes that were stored in \fIdst\fR.  The return
................................................................................
into the specified \fIencoding\fR.  The converted bytes are stored in
\fIdstPtr\fR, which is then terminated with the appropriate encoding-specific
null.  The caller should eventually call \fBTcl_DStringFree\fR to free any
information stored in \fIdstPtr\fR.  When converting, if any of the
characters in the source buffer cannot be represented in the target
encoding, a default fallback character will be used.  The return value is
a pointer to the value stored in the DString.





.PP
\fBTcl_UtfToExternal\fR converts a source buffer \fIsrc\fR from UTF-8 into
the specified \fIencoding\fR.  Up to \fIsrcLen\fR bytes are converted from
the source buffer and up to \fIdstLen\fR converted bytes are stored in
\fIdst\fR.  In all cases, \fI*srcReadPtr\fR is filled with the number of
bytes that were successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR
is filled with the corresponding number of bytes that were stored in






>
>
>



>
>
>







 







|
>
>
>







 







>
>
>
>
>







 







>
>
>
>
>







21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
...
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
...
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
...
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
.sp
int
\fBTcl_GetEncodingFromObj\fR(\fIinterp, objPtr, encodingPtr\fR)
.sp
char *
\fBTcl_ExternalToUtfDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp
size_t
\fBTcl_ExternalToUtfDStringEx\fR(\fIencoding, src, srcLen, flags, dstPtr\fR)
.sp
char *
\fBTcl_UtfToExternalDString\fR(\fIencoding, src, srcLen, dstPtr\fR)
.sp
size_t
\fBTcl_UtfToExternalDStringEx\fR(\fIencoding, src, srcLen, flags, dstPtr\fR)
.sp
int
\fBTcl_ExternalToUtf\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
.sp
int
\fBTcl_UtfToExternal\fR(\fIinterp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr\fR)
................................................................................
converted. \fBTCL_ENCODING_END\fR signifies that the source buffer is the last
block in a (potentially multi-block) input stream, telling the conversion
routine to perform any finalization that needs to occur after the last
byte is converted and then to reset to an initial state.
\fBTCL_ENCODING_STOPONERROR\fR signifies that the conversion routine should
return immediately upon reading a source character that does not exist in
the target encoding; otherwise a default fallback character will
automatically be substituted. The flag \fBTCL_ENCODING_NO_THROW\fR has
no effect, it is reserved for Tcl 9.0. The flag \fBTCL_ENCODING_MODIFIED\fR makes
\fBTcl_UtfToExternalDStringEx\fR and \fBTcl_UtfToExternal\fR produce the
byte sequence \exC0\ex80 in stead of \ex00, for the utf-8/cesu-8 encoders.
.AP Tcl_EncodingState *statePtr in/out
Used when converting a (generally long or indefinite length) byte stream
in a piece-by-piece fashion.  The conversion routine stores its current
state in \fI*statePtr\fR after \fIsrc\fR (the buffer containing the
current piece) has been converted; that state information must be passed
back when converting the next piece of the stream so the conversion
routine knows what state it was in when it left off at the end of the
................................................................................
\fBTcl_ExternalToUtfDString\fR converts a source buffer \fIsrc\fR from the
specified \fIencoding\fR into UTF-8.  The converted bytes are stored in
\fIdstPtr\fR, which is then null-terminated.  The caller should eventually
call \fBTcl_DStringFree\fR to free any information stored in \fIdstPtr\fR.
When converting, if any of the characters in the source buffer cannot be
represented in the target encoding, a default fallback character will be
used.  The return value is a pointer to the value stored in the DString.
.PP
\fBTcl_ExternalToUtfDStringEx\fR is the same as \fBTcl_ExternalToUtfDString\fR,
but it has an additional flags parameter.  The return value is the index of
the first byte in the input string causing a conversion error.
Or (size_t)-1 if all is OK.
.PP
\fBTcl_ExternalToUtf\fR converts a source buffer \fIsrc\fR from the specified
\fIencoding\fR into UTF-8.  Up to \fIsrcLen\fR bytes are converted from the
source buffer and up to \fIdstLen\fR converted bytes are stored in \fIdst\fR.
In all cases, \fI*srcReadPtr\fR is filled with the number of bytes that were
successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR is filled with
the corresponding number of bytes that were stored in \fIdst\fR.  The return
................................................................................
into the specified \fIencoding\fR.  The converted bytes are stored in
\fIdstPtr\fR, which is then terminated with the appropriate encoding-specific
null.  The caller should eventually call \fBTcl_DStringFree\fR to free any
information stored in \fIdstPtr\fR.  When converting, if any of the
characters in the source buffer cannot be represented in the target
encoding, a default fallback character will be used.  The return value is
a pointer to the value stored in the DString.
.PP
\fBTcl_UtfToExternalDStringEx\fR is the same as \fBTcl_UtfToExternalDString\fR,
but it has an additional flags parameter.  The return value is the index of
the first byte of an utf-8 byte-sequence in the input string causing a
conversion error. Or (size_t)-1 if all is OK.
.PP
\fBTcl_UtfToExternal\fR converts a source buffer \fIsrc\fR from UTF-8 into
the specified \fIencoding\fR.  Up to \fIsrcLen\fR bytes are converted from
the source buffer and up to \fIdstLen\fR converted bytes are stored in
\fIdst\fR.  In all cases, \fI*srcReadPtr\fR is filled with the number of
bytes that were successfully converted from \fIsrc\fR and \fI*dstWrotePtr\fR
is filled with the corresponding number of bytes that were stored in

Changes to generic/tcl.decls.

2422
2423
2424
2425
2426
2427
2428








2429
2430
2431
2432
2433
2434
2435
}
declare 656 {
    const char *Tcl_UtfPrev(const char *src, const char *start)
}
declare 657 {
    int Tcl_UniCharIsUnicode(int ch)
}









# ----- BASELINE -- FOR -- 8.7.0 ----- #

##############################################################################

# Define the platform specific public Tcl interface. These functions are only
# available on the designated platform.






>
>
>
>
>
>
>
>







2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
}
declare 656 {
    const char *Tcl_UtfPrev(const char *src, const char *start)
}
declare 657 {
    int Tcl_UniCharIsUnicode(int ch)
}
declare 658 {
    size_t Tcl_ExternalToUtfDStringEx(Tcl_Encoding encoding,
	    const char *src, int srcLen, int flags, Tcl_DString *dsPtr)
}
declare 659 {
    size_t Tcl_UtfToExternalDStringEx(Tcl_Encoding encoding,
	    const char *src, int srcLen, int flags, Tcl_DString *dsPtr)
}

# ----- BASELINE -- FOR -- 8.7.0 ----- #

##############################################################################

# Define the platform specific public Tcl interface. These functions are only
# available on the designated platform.

Changes to generic/tcl.h.

2052
2053
2054
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
....
2070
2071
2072
2073
2074
2075
2076












2077
2078
2079
2080
2081
2082
2083


2084
2085
2086
2087
2088
2089
2090
 *				reset to an initial state. If the source
 *				buffer contains the entire input stream to be
 *				converted, this flag should be set.
 * TCL_ENCODING_STOPONERROR -	If set, the converter returns immediately upon
 *				encountering an invalid byte sequence or a
 *				source character that has no mapping in the
 *				target encoding. If clear, the converter
 *				substitues the problematic character(s) with
 *				one or more "close" characters in the
 *				destination buffer and then continues to
 *				convert the source.
 * TCL_ENCODING_NO_TERMINATE - 	If set, Tcl_ExternalToUtf does not append a
 *				terminating NUL byte.  Since it does not need
 *				an extra byte for a terminating NUL, it fills
 *				all dstLen bytes with encoded UTF-8 content if
 *				needed.  If clear, a byte is reserved in the
 *				dst space for NUL termination, and a
 *				terminating NUL is appended.
................................................................................
 * TCL_ENCODING_CHAR_LIMIT -	If set and dstCharsPtr is not NULL, then
 *				Tcl_ExternalToUtf takes the initial value of
 *				*dstCharsPtr as a limit of the maximum number
 *				of chars to produce in the encoded UTF-8
 *				content.  Otherwise, the number of chars
 *				produced is controlled only by other limiting
 *				factors.












 */

#define TCL_ENCODING_START		0x01
#define TCL_ENCODING_END		0x02
#define TCL_ENCODING_STOPONERROR	0x04
#define TCL_ENCODING_NO_TERMINATE	0x08
#define TCL_ENCODING_CHAR_LIMIT		0x10



/*
 * The following definitions are the error codes returned by the conversion
 * routines:
 *
 * TCL_OK -			All characters were converted.
 * TCL_CONVERT_NOSPACE -	The output buffer would not have been large






|


|







 







>
>
>
>
>
>
>
>
>
>
>
>







>
>







2052
2053
2054
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
....
2070
2071
2072
2073
2074
2075
2076
2077
2078
2079
2080
2081
2082
2083
2084
2085
2086
2087
2088
2089
2090
2091
2092
2093
2094
2095
2096
2097
2098
2099
2100
2101
2102
2103
2104
 *				reset to an initial state. If the source
 *				buffer contains the entire input stream to be
 *				converted, this flag should be set.
 * TCL_ENCODING_STOPONERROR -	If set, the converter returns immediately upon
 *				encountering an invalid byte sequence or a
 *				source character that has no mapping in the
 *				target encoding. If clear, the converter
 *				substitutes the problematic character(s) with
 *				one or more "close" characters in the
 *				destination buffer and then continues to
 *				convert the source. Only for Tcl 8.x.
 * TCL_ENCODING_NO_TERMINATE - 	If set, Tcl_ExternalToUtf does not append a
 *				terminating NUL byte.  Since it does not need
 *				an extra byte for a terminating NUL, it fills
 *				all dstLen bytes with encoded UTF-8 content if
 *				needed.  If clear, a byte is reserved in the
 *				dst space for NUL termination, and a
 *				terminating NUL is appended.
................................................................................
 * TCL_ENCODING_CHAR_LIMIT -	If set and dstCharsPtr is not NULL, then
 *				Tcl_ExternalToUtf takes the initial value of
 *				*dstCharsPtr as a limit of the maximum number
 *				of chars to produce in the encoded UTF-8
 *				content.  Otherwise, the number of chars
 *				produced is controlled only by other limiting
 *				factors.
 * TCL_ENCODING_MODIFIED -	Convert NULL bytes to \xC0\x80 in stead of
 *				0x00. Only valid for "utf-8" and "cesu-8".
 *				This flag is implicit for external -> internal conversions,
 *				optional for internal -> external conversions.
 * TCL_ENCODING_NO_THROW -	If set, the converter
 *				substitutes the problematic character(s) with
 *				one or more "close" characters in the
 *				destination buffer and then continues to
 *				convert the source. If clear, the converter returns
 *				immediately upon encountering an invalid byte sequence
 *				or a source character that has no mapping in the
 *				target encoding. Only for Tcl 9.x.
 */

#define TCL_ENCODING_START		0x01
#define TCL_ENCODING_END		0x02
#define TCL_ENCODING_STOPONERROR	0x04
#define TCL_ENCODING_NO_TERMINATE	0x08
#define TCL_ENCODING_CHAR_LIMIT		0x10
#define TCL_ENCODING_MODIFIED		0x20
#define TCL_ENCODING_NO_THROW		0x40

/*
 * The following definitions are the error codes returned by the conversion
 * routines:
 *
 * TCL_OK -			All characters were converted.
 * TCL_CONVERT_NOSPACE -	The output buffer would not have been large

Changes to generic/tclCmdAH.c.

509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
...
546
547
548
549
550
551
552






553
554
555
556









557
558
559
560
561

562









563
564
565
566

567
568
569


570






571











572
573
574
575
576
577
578
...
608
609
610
611
612
613
614






615
616
617
618









619
620
621
622
623

624









625
626
627
628

629
630
631
632
633
634














635
636
637
638
639
640
641
 */

Tcl_Command
TclInitEncodingCmd(
    Tcl_Interp* interp)		/* Tcl interpreter */
{
    static const EnsembleImplMap encodingImplMap[] = {
	{"convertfrom", EncodingConvertfromObjCmd, TclCompileBasic1Or2ArgCmd, NULL, NULL, 0},
	{"convertto",   EncodingConverttoObjCmd,   TclCompileBasic1Or2ArgCmd, NULL, NULL, 0},
	{"dirs",        EncodingDirsObjCmd,        TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{"names",       EncodingNamesObjCmd,       TclCompileBasic0ArgCmd,    NULL, NULL, 0},
	{"system",      EncodingSystemObjCmd,      TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{NULL,          NULL,                      NULL,                      NULL, NULL, 0}
    };

    return TclMakeEnsemble(interp, "encoding", encodingImplMap);
................................................................................
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* Byte array to convert */
    Tcl_DString ds;		/* Buffer to hold the string */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the byte array being converted */
    const char *bytesPtr;	/* Pointer to the first byte of the array */







    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];









    } else if (objc == 3) {
	if (Tcl_GetEncodingFromObj(interp, objv[1], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}
	data = objv[2];

    } else {









	Tcl_WrongNumArgs(interp, 1, objv, "?encoding? data");
	return TCL_ERROR;
    }


    /*
     * Convert the string into a byte array in 'ds'
     */


    bytesPtr = (char *) Tcl_GetByteArrayFromObj(data, &length);






    Tcl_ExternalToUtfDString(encoding, bytesPtr, length, &ds);












    /*
     * Note that we cannot use Tcl_DStringResult here because it will
     * truncate the string at the first null byte.
     */

    Tcl_SetObjResult(interp, TclDStringToObj(&ds));
................................................................................
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* String to convert */
    Tcl_DString ds;		/* Buffer to hold the byte array */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the string being converted */
    const char *stringPtr;	/* Pointer to the first byte of the string */







    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];









    } else if (objc == 3) {
	if (Tcl_GetEncodingFromObj(interp, objv[1], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}
	data = objv[2];

    } else {









	Tcl_WrongNumArgs(interp, 1, objv, "?encoding? data");
	return TCL_ERROR;
    }


    /*
     * Convert the string to a byte array in 'ds'
     */

    stringPtr = TclGetStringFromObj(data, &length);
    Tcl_UtfToExternalDString(encoding, stringPtr, length, &ds);














    Tcl_SetObjResult(interp,
		     Tcl_NewByteArrayObj((unsigned char*) Tcl_DStringValue(&ds),
					 Tcl_DStringLength(&ds)));
    Tcl_DStringFree(&ds);

    /*
     * We're done with the encoding






|
|







 







>
>
>
>
>
>




>
>
>
>
>
>
>
>
>
|
|
|
|
<
>
|
>
>
>
>
>
>
>
>
>
|



>



>
>
|
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>







 







>
>
>
>
>
>




>
>
>
>
>
>
>
>
>
|
|
|
|
<
>
|
>
>
>
>
>
>
>
>
>
|



>





|
>
>
>
>
>
>
>
>
>
>
>
>
>
>







509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
...
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575

576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
...
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681

682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
 */

Tcl_Command
TclInitEncodingCmd(
    Tcl_Interp* interp)		/* Tcl interpreter */
{
    static const EnsembleImplMap encodingImplMap[] = {
	{"convertfrom", EncodingConvertfromObjCmd, TclCompileBasic1To3ArgCmd, NULL, NULL, 0},
	{"convertto",   EncodingConverttoObjCmd,   TclCompileBasic1To3ArgCmd, NULL, NULL, 0},
	{"dirs",        EncodingDirsObjCmd,        TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{"names",       EncodingNamesObjCmd,       TclCompileBasic0ArgCmd,    NULL, NULL, 0},
	{"system",      EncodingSystemObjCmd,      TclCompileBasic0Or1ArgCmd, NULL, NULL, 1},
	{NULL,          NULL,                      NULL,                      NULL, NULL, 0}
    };

    return TclMakeEnsemble(interp, "encoding", encodingImplMap);
................................................................................
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* Byte array to convert */
    Tcl_DString ds;		/* Buffer to hold the string */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the byte array being converted */
    const char *bytesPtr;	/* Pointer to the first byte of the array */
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NO_THROW;
#endif
    size_t result;

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if ((unsigned)(objc - 2) < 3) {
	data = objv[objc - 1];
	bytesPtr = Tcl_GetString(objv[1]);
	if (bytesPtr[0] == '-' && bytesPtr[1] == 'n'
		&& !strncmp(bytesPtr, "-nothrow", strlen(bytesPtr))) {
	    flags = TCL_ENCODING_NO_THROW;
	} else if (bytesPtr[0] == '-' && bytesPtr[1] == 's'
		&& !strncmp(bytesPtr, "-stoponerror", strlen(bytesPtr))) {
	    flags = TCL_ENCODING_STOPONERROR;
	} else if (objc < 4) {
	    if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		return TCL_ERROR;
	    }

	    goto encConvFromOK;
	} else {
	    goto encConvFromError;
	}
	if (objc < 4) {
	    encoding = Tcl_GetEncoding(interp, NULL);
	} else if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}
    } else {
    encConvFromError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nothrow|-stoponerror? ?encoding? data");
	return TCL_ERROR;
    }

encConvFromOK:
    /*
     * Convert the string into a byte array in 'ds'
     */
#if !defined(TCL_NO_DEPRECATED) && (TCL_MAJOR_VERSION < 9)
    if (!(flags & TCL_ENCODING_STOPONERROR)) {
	bytesPtr = (char *) Tcl_GetByteArrayFromObj(data, &length);
    } else
#endif
    bytesPtr = (char *) TclGetBytesFromObj(interp, data, &length);
    if (bytesPtr == NULL) {
	return TCL_ERROR;
    }
    result = Tcl_ExternalToUtfDStringEx(encoding, bytesPtr, length,
	    flags, &ds);
    if ((flags & TCL_ENCODING_STOPONERROR) && (result != (size_t)-1)) {
	char buf[TCL_INTEGER_SPACE];
	sprintf(buf, "%" TCL_Z_MODIFIER "u", result);
	Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected byte sequence starting at index %"
		TCL_Z_MODIFIER "u: '\\x%X'", result, UCHAR(bytesPtr[result])));
	Tcl_SetErrorCode(interp, "TCL", "ENCODING", "STOPONERROR",
		buf, NULL);
	Tcl_DStringFree(&ds);
	return TCL_ERROR;
    }

    /*
     * Note that we cannot use Tcl_DStringResult here because it will
     * truncate the string at the first null byte.
     */

    Tcl_SetObjResult(interp, TclDStringToObj(&ds));
................................................................................
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    Tcl_Obj *data;		/* String to convert */
    Tcl_DString ds;		/* Buffer to hold the byte array */
    Tcl_Encoding encoding;	/* Encoding to use */
    int length;			/* Length of the string being converted */
    const char *stringPtr;	/* Pointer to the first byte of the string */
    size_t result;
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NO_THROW;
#endif

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if ((unsigned)(objc - 2) < 3) {
	data = objv[objc - 1];
	stringPtr = Tcl_GetString(objv[1]);
	if (stringPtr[0] == '-' && stringPtr[1] == 'n'
		&& !strncmp(stringPtr, "-nothrow", strlen(stringPtr))) {
	    flags = TCL_ENCODING_NO_THROW;
	} else if (stringPtr[0] == '-' && stringPtr[1] == 's'
		&& !strncmp(stringPtr, "-stoponerror", strlen(stringPtr))) {
	    flags = TCL_ENCODING_STOPONERROR;
	} else if (objc < 4) {
	    if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		return TCL_ERROR;
	    }

	    goto encConvToOK;
	} else {
	    goto encConvToError;
	}
	if (objc < 4) {
	    encoding = Tcl_GetEncoding(interp, NULL);
	} else if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
	    return TCL_ERROR;
	}
    } else {
    encConvToError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nothrow|-stoponerror? ?encoding? data");
	return TCL_ERROR;
    }

encConvToOK:
    /*
     * Convert the string to a byte array in 'ds'
     */

    stringPtr = TclGetStringFromObj(data, &length);
    result = Tcl_UtfToExternalDStringEx(encoding, stringPtr, length,
	    flags, &ds);
    if ((flags & TCL_ENCODING_STOPONERROR) && (result != (size_t)-1)) {
	size_t pos = Tcl_NumUtfChars(stringPtr, result);
	int ucs4;
	char buf[TCL_INTEGER_SPACE];
	TclUtfToUCS4(&stringPtr[result], &ucs4);
	sprintf(buf, "%" TCL_Z_MODIFIER "u", result);
	Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected character at index %"
		TCL_Z_MODIFIER "u: 'U+%06X'", pos, ucs4));
	Tcl_SetErrorCode(interp, "TCL", "ENCODING", "STOPONERROR",
		buf, NULL);
	Tcl_DStringFree(&ds);
	return TCL_ERROR;
    }
    Tcl_SetObjResult(interp,
		     Tcl_NewByteArrayObj((unsigned char*) Tcl_DStringValue(&ds),
					 Tcl_DStringLength(&ds)));
    Tcl_DStringFree(&ds);

    /*
     * We're done with the encoding

Changes to generic/tclDecls.h.

1935
1936
1937
1938
1939
1940
1941








1942
1943
1944
1945
1946
1947
1948
....
2628
2629
2630
2631
2632
2633
2634


2635
2636
2637
2638
2639
2640
2641
....
3972
3973
3974
3975
3976
3977
3978




3979
3980
3981
3982
3983
3984
3985
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 655 */
EXTERN const char *	Tcl_UtfNext(const char *src);
/* 656 */
EXTERN const char *	Tcl_UtfPrev(const char *src, const char *start);
/* 657 */
EXTERN int		Tcl_UniCharIsUnicode(int ch);









typedef struct {
    const struct TclPlatStubs *tclPlatStubs;
    const struct TclIntStubs *tclIntStubs;
    const struct TclIntPlatStubs *tclIntPlatStubs;
} TclStubHooks;

................................................................................
    char * (*tclGetStringFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 651 */
    Tcl_UniChar * (*tclGetUnicodeFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 652 */
    unsigned char * (*tclGetByteArrayFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 653 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 654 */
    const char * (*tcl_UtfNext) (const char *src); /* 655 */
    const char * (*tcl_UtfPrev) (const char *src, const char *start); /* 656 */
    int (*tcl_UniCharIsUnicode) (int ch); /* 657 */


} TclStubs;

extern const TclStubs *tclStubsPtr;

#ifdef __cplusplus
}
#endif
................................................................................
	(tclStubsPtr->tcl_UtfCharComplete) /* 654 */
#define Tcl_UtfNext \
	(tclStubsPtr->tcl_UtfNext) /* 655 */
#define Tcl_UtfPrev \
	(tclStubsPtr->tcl_UtfPrev) /* 656 */
#define Tcl_UniCharIsUnicode \
	(tclStubsPtr->tcl_UniCharIsUnicode) /* 657 */





#endif /* defined(USE_TCL_STUBS) */

/* !END!: Do not edit above this line. */

#undef TclUnusedStubEntry
#if defined(USE_TCL_STUBS)






>
>
>
>
>
>
>
>







 







>
>







 







>
>
>
>







1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
....
2636
2637
2638
2639
2640
2641
2642
2643
2644
2645
2646
2647
2648
2649
2650
2651
....
3982
3983
3984
3985
3986
3987
3988
3989
3990
3991
3992
3993
3994
3995
3996
3997
3998
3999
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 655 */
EXTERN const char *	Tcl_UtfNext(const char *src);
/* 656 */
EXTERN const char *	Tcl_UtfPrev(const char *src, const char *start);
/* 657 */
EXTERN int		Tcl_UniCharIsUnicode(int ch);
/* 658 */
EXTERN size_t		Tcl_ExternalToUtfDStringEx(Tcl_Encoding encoding,
				const char *src, int srcLen, int flags,
				Tcl_DString *dsPtr);
/* 659 */
EXTERN size_t		Tcl_UtfToExternalDStringEx(Tcl_Encoding encoding,
				const char *src, int srcLen, int flags,
				Tcl_DString *dsPtr);

typedef struct {
    const struct TclPlatStubs *tclPlatStubs;
    const struct TclIntStubs *tclIntStubs;
    const struct TclIntPlatStubs *tclIntPlatStubs;
} TclStubHooks;

................................................................................
    char * (*tclGetStringFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 651 */
    Tcl_UniChar * (*tclGetUnicodeFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 652 */
    unsigned char * (*tclGetByteArrayFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 653 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 654 */
    const char * (*tcl_UtfNext) (const char *src); /* 655 */
    const char * (*tcl_UtfPrev) (const char *src, const char *start); /* 656 */
    int (*tcl_UniCharIsUnicode) (int ch); /* 657 */
    size_t (*tcl_ExternalToUtfDStringEx) (Tcl_Encoding encoding, const char *src, int srcLen, int flags, Tcl_DString *dsPtr); /* 658 */
    size_t (*tcl_UtfToExternalDStringEx) (Tcl_Encoding encoding, const char *src, int srcLen, int flags, Tcl_DString *dsPtr); /* 659 */
} TclStubs;

extern const TclStubs *tclStubsPtr;

#ifdef __cplusplus
}
#endif
................................................................................
	(tclStubsPtr->tcl_UtfCharComplete) /* 654 */
#define Tcl_UtfNext \
	(tclStubsPtr->tcl_UtfNext) /* 655 */
#define Tcl_UtfPrev \
	(tclStubsPtr->tcl_UtfPrev) /* 656 */
#define Tcl_UniCharIsUnicode \
	(tclStubsPtr->tcl_UniCharIsUnicode) /* 657 */
#define Tcl_ExternalToUtfDStringEx \
	(tclStubsPtr->tcl_ExternalToUtfDStringEx) /* 658 */
#define Tcl_UtfToExternalDStringEx \
	(tclStubsPtr->tcl_UtfToExternalDStringEx) /* 659 */

#endif /* defined(USE_TCL_STUBS) */

/* !END!: Do not edit above this line. */

#undef TclUnusedStubEntry
#if defined(USE_TCL_STUBS)

Changes to generic/tclEncoding.c.

506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
....
1119
1120
1121
1122
1123
1124
1125













































1126
1127
1128
1129

1130
1131
1132
1133
1134
1135
1136
....
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
....
1311
1312
1313
1314
1315
1316
1317














































1318
1319
1320
1321

1322
1323
1324
1325
1326
1327
1328
....
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
....
2172
2173
2174
2175
2176
2177
2178






2179
2180
2181
2182
2183
2184
2185
....
2255
2256
2257
2258
2259
2260
2261
2262
2263
2264
2265
2266
2267
2268
2269
....
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
2280
2281
2282
2283
2284
....
2295
2296
2297
2298
2299
2300
2301

2302
2303
2304
2305
2306
2307
2308
2309
....
2313
2314
2315
2316
2317
2318
2319
2320
2321
2322
2323
2324
2325
2326
2327
....
2509
2510
2511
2512
2513
2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
....
2729
2730
2731
2732
2733
2734
2735
2736
2737
2738
2739
2740
2741
2742
2743
....
2849
2850
2851
2852
2853
2854
2855
2856
2857
2858
2859
2860
2861
2862
2863
....
3037
3038
3039
3040
3041
3042
3043
3044
3045
3046
3047
3048
3049
3050
3051
....
3264
3265
3266
3267
3268
3269
3270
3271
3272
3273
3274
3275
3276
3277
3278
....
3439
3440
3441
3442
3443
3444
3445
3446
3447
3448
3449
3450
3451
3452
3453
 *
 * Side effects:
 *	Depends on the memory, object, and IO subsystems.
 *
 *---------------------------------------------------------------------------
 */

/* Those flags must not conflict with other TCL_ENCODING_* flags in tcl.h */
/* Since TCL_ENCODING_MODIFIED is only used for utf-8/cesu-8 and
 * TCL_ENCODING_LE is only used for  utf-16/ucs-2. re-use the same value */
#define TCL_ENCODING_MODIFIED	0x20	/* Converting NULL bytes to 0xC0 0x80 */
#define TCL_ENCODING_LE		TCL_ENCODING_MODIFIED	/* Little-endian encoding */
#define TCL_ENCODING_UTF	0x200	/* For UTF-8 encoding, allow 4-byte output sequences */

void
TclInitEncodingSubsystem(void)
{
    Tcl_EncodingType type;
................................................................................
				 * for the default system encoding. */
    const char *src,		/* Source string in specified encoding. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * encoding-specific string length. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{













































    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int flags, dstLen, result, soFar, srcRead, dstWrote, dstChars;


    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
................................................................................

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = encodingPtr->lengthProc(src);
    }

    flags = TCL_ENCODING_START | TCL_ENCODING_END;
    if (encodingPtr->toUtfProc == UtfToUtfProc) {
	flags |= TCL_ENCODING_MODIFIED | TCL_ENCODING_UTF;
    }

    while (1) {
	result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen,
		flags, &state, dst, dstLen, &srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return Tcl_DStringValue(dstPtr);
	}
	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
	Tcl_DStringSetLength(dstPtr, 2 * Tcl_DStringLength(dstPtr) + 1);
................................................................................
				 * NULL for the default system encoding. */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * strlen(). */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{














































    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int flags, dstLen, result, soFar, srcRead, dstWrote, dstChars;


    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
................................................................................
    encodingPtr = (Encoding *) encoding;

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = strlen(src);
    }
    flags = TCL_ENCODING_START | TCL_ENCODING_END;
    while (1) {
	result = encodingPtr->fromUtfProc(encodingPtr->clientData, src,
		srcLen, flags, &state, dst, dstLen,
		&srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    if (encodingPtr->nullSize == 2) {
		Tcl_DStringSetLength(dstPtr, soFar + 1);
	    }
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return Tcl_DStringValue(dstPtr);
	}

	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
................................................................................
 *	Returns TCL_OK if conversion was successful.
 *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */







static int
UtfToUtfProc(
    ClientData clientData,	/* additional flags, e.g. TCL_ENCODING_MODIFIED */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes. */
    int flags,			/* Conversion control flags. */
................................................................................
	     * Always check before using TclUtfToUCS4. Not doing can so
	     * cause it run beyond the end of the buffer! If we happen such an
	     * incomplete char its bytes are made to represent themselves
	     * unless the user has explicitly asked to be told.
	     */

	    if (flags & TCL_ENCODING_MODIFIED) {
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_MULTIBYTE;
		    break;
		}
		ch = UCHAR(*src++);
	    } else {
		char chbuf[2];
		chbuf[0] = UCHAR(*src++); chbuf[1] = 0;
................................................................................
		TclUtfToUCS4(chbuf, &ch);
	    }
	    dst += Tcl_UniCharToUtf(ch, dst);
	} else {
	    int low;
	    const char *saveSrc = src;
	    size_t len = TclUtfToUCS4(src, &ch);
	    if ((len < 2) && (ch != 0) && (flags & TCL_ENCODING_STOPONERROR)
		    && (flags & TCL_ENCODING_MODIFIED)) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    src += len;
	    if (!(flags & TCL_ENCODING_UTF)) {
		if (ch > 0xFFFF) {
................................................................................
		 * A surrogate character is detected, handle especially.
		 */

		low = ch;
		len = (src <= srcEnd-3) ? TclUtfToUCS4(src, &low) : 0;

		if (((low & ~0x3FF) != 0xDC00) || (ch & 0x400)) {

		    if (flags & TCL_ENCODING_STOPONERROR) {
			result = TCL_CONVERT_UNKNOWN;
			src = saveSrc;
			break;
		    }
		    if (!(flags & TCL_ENCODING_MODIFIED)) {
			ch = 0xFFFD;
		    }
................................................................................
		    *dst++ = (char) ((ch | 0x80) & 0xBF);
		    continue;
		}
		src += len;
		dst += Tcl_UniCharToUtf(ch, dst);
		ch = low;
	    } else if (!Tcl_UniCharIsUnicode(ch)) {
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    src = saveSrc;
		    break;
		}
		if (!(flags & TCL_ENCODING_MODIFIED)) {
		    ch = 0xFFFD;
		}
................................................................................
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {
................................................................................
		break;
	    }
	    ch = toUnicode[byte][*((unsigned char *) src)];
	} else {
	    ch = pageZero[byte];
	}
	if ((ch == 0) && (byte != 0)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    if (prefixBytes[byte]) {
		src--;
	    }
	    ch = (Tcl_UniChar) byte;
................................................................................
	if (!len) {
	    word = 0;
	} else
#endif
	    word = fromUnicode[(ch >> 8)][ch & 0xFF];

	if ((word == 0) && (ch != 0)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    word = dataPtr->fallback;
	}
	if (prefixBytes[(word >> 8)] != 0) {
	    if (dst + 1 > dstEnd) {
................................................................................
	 */

	if (ch > 0xFF
#if TCL_UTF_MAX <= 3
		|| ((ch >= 0xD800) && (len < 3))
#endif
		) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
#if TCL_UTF_MAX <= 3
	    if ((ch >= 0xD800) && (len < 3)) {
		len = 4;
	    }
................................................................................
	     * We have a split-up or unrecognized escape sequence. If we
	     * checked all the sequences, then it's a syntax error, otherwise
	     * we need more bytes to determine a match.
	     */

	    if ((checked == dataPtr->numSubTables + 2)
		    || (flags & TCL_ENCODING_END)) {
		if ((flags & TCL_ENCODING_STOPONERROR) == 0) {
		    /*
		     * Skip the unknown escape sequence.
		     */

		    src += longest;
		    continue;
		}
................................................................................
		if (word != 0) {
		    break;
		}
	    }

	    if (word == 0) {
		state = oldState;
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    break;
		}
		encodingPtr = GetTableEncoding(dataPtr, state);
		tableDataPtr = (const TableEncodingData *)encodingPtr->clientData;
		word = tableDataPtr->fallback;
	    }






<


<







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



|
>







 







|












|







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



|
>







 







|












|







 







>
>
>
>
>
>







 







|







 







|







 







>
|







 







|







 







|







 







|







 







|







 







|







 







|







 







|







506
507
508
509
510
511
512

513
514

515
516
517
518
519
520
521
....
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
....
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
....
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
....
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
....
2263
2264
2265
2266
2267
2268
2269
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
2280
2281
2282
....
2352
2353
2354
2355
2356
2357
2358
2359
2360
2361
2362
2363
2364
2365
2366
....
2367
2368
2369
2370
2371
2372
2373
2374
2375
2376
2377
2378
2379
2380
2381
....
2392
2393
2394
2395
2396
2397
2398
2399
2400
2401
2402
2403
2404
2405
2406
2407
....
2411
2412
2413
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
....
2607
2608
2609
2610
2611
2612
2613
2614
2615
2616
2617
2618
2619
2620
2621
....
2827
2828
2829
2830
2831
2832
2833
2834
2835
2836
2837
2838
2839
2840
2841
....
2947
2948
2949
2950
2951
2952
2953
2954
2955
2956
2957
2958
2959
2960
2961
....
3135
3136
3137
3138
3139
3140
3141
3142
3143
3144
3145
3146
3147
3148
3149
....
3362
3363
3364
3365
3366
3367
3368
3369
3370
3371
3372
3373
3374
3375
3376
....
3537
3538
3539
3540
3541
3542
3543
3544
3545
3546
3547
3548
3549
3550
3551
 *
 * Side effects:
 *	Depends on the memory, object, and IO subsystems.
 *
 *---------------------------------------------------------------------------
 */


/* Since TCL_ENCODING_MODIFIED is only used for utf-8/cesu-8 and
 * TCL_ENCODING_LE is only used for  utf-16/ucs-2. re-use the same value */

#define TCL_ENCODING_LE		TCL_ENCODING_MODIFIED	/* Little-endian encoding */
#define TCL_ENCODING_UTF	0x200	/* For UTF-8 encoding, allow 4-byte output sequences */

void
TclInitEncodingSubsystem(void)
{
    Tcl_EncodingType type;
................................................................................
				 * for the default system encoding. */
    const char *src,		/* Source string in specified encoding. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * encoding-specific string length. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    Tcl_ExternalToUtfDStringEx(encoding, src, srcLen, 0, dstPtr);
    return Tcl_DStringValue(dstPtr);
}

 
/*
 *-------------------------------------------------------------------------
 *
 * Tcl_ExternalToUtfDStringEx --
 *
 *	Convert a source buffer from the specified encoding into UTF-8.
*	The parameter flags controls the behavior, if any of the bytes in
 *	the source buffer are invalid or cannot be represented in utf-8.
 *	Possible flags values:
 *	TCL_ENCODING_STOPONERROR: don't replace invalid characters/bytes but
 *	return the first error position (Default in Tcl 9.0).
 *	TCL_ENCODING_NO_THROW: replace invalid characters/bytes by a default
 *	fallback character. Always return -1 (Default in Tcl 8.7).
 *	TCL_ENCODING_MODIFIED: convert NULL bytes to \xC0\x80 in stead of 0x00.
 *	Only valid for "utf-8" and "cesu-8". This flag may be used together
 *	with the other flags.
 *
 * Results:
 *	The converted bytes are stored in the DString, which is then NULL
 *	terminated in an encoding-specific manner. The return value is
 *	the error position in the source string or -1 if no conversion error
 *	is reported.
  *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */

size_t
Tcl_ExternalToUtfDStringEx(
    Tcl_Encoding encoding,	/* The encoding for the source string, or NULL
				 * for the default system encoding. */
    const char *src,		/* Source string in specified encoding. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * encoding-specific string length. */
    int flags,			/* Conversion control flags. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int dstLen, result, soFar, srcRead, dstWrote, dstChars;
    const char *srcStart = src;

    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
................................................................................

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = encodingPtr->lengthProc(src);
    }

    flags |= TCL_ENCODING_START | TCL_ENCODING_END;
    if (encodingPtr->toUtfProc == UtfToUtfProc) {
	flags |= TCL_ENCODING_MODIFIED | TCL_ENCODING_UTF;
    }

    while (1) {
	result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen,
		flags, &state, dst, dstLen, &srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return (result == TCL_OK) ? (size_t)-1 : (size_t)(src - srcStart);
	}
	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
	Tcl_DStringSetLength(dstPtr, 2 * Tcl_DStringLength(dstPtr) + 1);
................................................................................
				 * NULL for the default system encoding. */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * strlen(). */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    Tcl_UtfToExternalDStringEx(encoding, src, srcLen, 0, dstPtr);
    return Tcl_DStringValue(dstPtr);
}

 
/*
 *-------------------------------------------------------------------------
 *
 * Tcl_UtfToExternalDStringEx --
 *
 *	Convert a source buffer from UTF-8 to the specified encoding.
 *	The parameter flags controls the behavior, if any of the bytes in
 *	the source buffer are invalid or cannot be represented in the
 *	target encoding.
 *	Possible flags values:
 *	TCL_ENCODING_STOPONERROR: don't replace invalid characters/bytes but
 *	return the first error position (Default in Tcl 9.0).
 *	TCL_ENCODING_NO_THROW: replace invalid characters/bytes by a default
 *	fallback character. Always return -1 (Default in Tcl 8.7).
 *	TCL_ENCODING_MODIFIED: convert NULL bytes to \xC0\x80 in stead of 0x00.
 *	Only valid for "utf-8" and "cesu-8". This flag may be used together
 *	with the other flags.
 *
 * Results:
 *	The converted bytes are stored in the DString, which is then NULL
 *	terminated in an encoding-specific manner. The return value is
 *	the error position in the source string or -1 if no conversion error
 *	is reported.
 *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */

size_t
Tcl_UtfToExternalDStringEx(
    Tcl_Encoding encoding,	/* The encoding for the converted string, or
				 * NULL for the default system encoding. */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes, or < 0 for
				 * strlen(). */
    int flags,	/* Conversion control flags. */
    Tcl_DString *dstPtr)	/* Uninitialized or free DString in which the
				 * converted string is stored. */
{
    char *dst;
    Tcl_EncodingState state;
    const Encoding *encodingPtr;
    int dstLen, result, soFar, srcRead, dstWrote, dstChars;
    const char *srcStart = src;

    Tcl_DStringInit(dstPtr);
    dst = Tcl_DStringValue(dstPtr);
    dstLen = dstPtr->spaceAvl - 1;

    if (encoding == NULL) {
	encoding = systemEncoding;
................................................................................
    encodingPtr = (Encoding *) encoding;

    if (src == NULL) {
	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = strlen(src);
    }
    flags |= TCL_ENCODING_START | TCL_ENCODING_END;
    while (1) {
	result = encodingPtr->fromUtfProc(encodingPtr->clientData, src,
		srcLen, flags, &state, dst, dstLen,
		&srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    if (encodingPtr->nullSize == 2) {
		Tcl_DStringSetLength(dstPtr, soFar + 1);
	    }
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return (result == TCL_OK) ? (size_t)-1 : (size_t)(src - srcStart);
	}

	flags &= ~TCL_ENCODING_START;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
................................................................................
 *	Returns TCL_OK if conversion was successful.
 *
 * Side effects:
 *	None.
 *
 *-------------------------------------------------------------------------
 */

#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
#   define STOPONERROR !(flags & TCL_ENCODING_NO_THROW)
#else
#   define STOPONERROR (flags & TCL_ENCODING_STOPONERROR)
#endif

static int
UtfToUtfProc(
    ClientData clientData,	/* additional flags, e.g. TCL_ENCODING_MODIFIED */
    const char *src,		/* Source string in UTF-8. */
    int srcLen,			/* Source string length in bytes. */
    int flags,			/* Conversion control flags. */
................................................................................
	     * Always check before using TclUtfToUCS4. Not doing can so
	     * cause it run beyond the end of the buffer! If we happen such an
	     * incomplete char its bytes are made to represent themselves
	     * unless the user has explicitly asked to be told.
	     */

	    if (flags & TCL_ENCODING_MODIFIED) {
		if (STOPONERROR) {
		    result = TCL_CONVERT_MULTIBYTE;
		    break;
		}
		ch = UCHAR(*src++);
	    } else {
		char chbuf[2];
		chbuf[0] = UCHAR(*src++); chbuf[1] = 0;
................................................................................
		TclUtfToUCS4(chbuf, &ch);
	    }
	    dst += Tcl_UniCharToUtf(ch, dst);
	} else {
	    int low;
	    const char *saveSrc = src;
	    size_t len = TclUtfToUCS4(src, &ch);
	    if ((len < 2) && (ch != 0) && STOPONERROR
		    && (flags & TCL_ENCODING_MODIFIED)) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    src += len;
	    if (!(flags & TCL_ENCODING_UTF)) {
		if (ch > 0xFFFF) {
................................................................................
		 * A surrogate character is detected, handle especially.
		 */

		low = ch;
		len = (src <= srcEnd-3) ? TclUtfToUCS4(src, &low) : 0;

		if (((low & ~0x3FF) != 0xDC00) || (ch & 0x400)) {

		    if (STOPONERROR) {
			result = TCL_CONVERT_UNKNOWN;
			src = saveSrc;
			break;
		    }
		    if (!(flags & TCL_ENCODING_MODIFIED)) {
			ch = 0xFFFD;
		    }
................................................................................
		    *dst++ = (char) ((ch | 0x80) & 0xBF);
		    continue;
		}
		src += len;
		dst += Tcl_UniCharToUtf(ch, dst);
		ch = low;
	    } else if (!Tcl_UniCharIsUnicode(ch)) {
		if (STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    src = saveSrc;
		    break;
		}
		if (!(flags & TCL_ENCODING_MODIFIED)) {
		    ch = 0xFFFD;
		}
................................................................................
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {
................................................................................
		break;
	    }
	    ch = toUnicode[byte][*((unsigned char *) src)];
	} else {
	    ch = pageZero[byte];
	}
	if ((ch == 0) && (byte != 0)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    if (prefixBytes[byte]) {
		src--;
	    }
	    ch = (Tcl_UniChar) byte;
................................................................................
	if (!len) {
	    word = 0;
	} else
#endif
	    word = fromUnicode[(ch >> 8)][ch & 0xFF];

	if ((word == 0) && (ch != 0)) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    word = dataPtr->fallback;
	}
	if (prefixBytes[(word >> 8)] != 0) {
	    if (dst + 1 > dstEnd) {
................................................................................
	 */

	if (ch > 0xFF
#if TCL_UTF_MAX <= 3
		|| ((ch >= 0xD800) && (len < 3))
#endif
		) {
	    if (STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
#if TCL_UTF_MAX <= 3
	    if ((ch >= 0xD800) && (len < 3)) {
		len = 4;
	    }
................................................................................
	     * We have a split-up or unrecognized escape sequence. If we
	     * checked all the sequences, then it's a syntax error, otherwise
	     * we need more bytes to determine a match.
	     */

	    if ((checked == dataPtr->numSubTables + 2)
		    || (flags & TCL_ENCODING_END)) {
		if (!STOPONERROR) {
		    /*
		     * Skip the unknown escape sequence.
		     */

		    src += longest;
		    continue;
		}
................................................................................
		if (word != 0) {
		    break;
		}
	    }

	    if (word == 0) {
		state = oldState;
		if (STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    break;
		}
		encodingPtr = GetTableEncoding(dataPtr, state);
		tableDataPtr = (const TableEncodingData *)encodingPtr->clientData;
		word = tableDataPtr->fallback;
	    }

Changes to generic/tclStubInit.c.

1938
1939
1940
1941
1942
1943
1944


1945
1946
1947
    TclGetStringFromObj, /* 651 */
    TclGetUnicodeFromObj, /* 652 */
    TclGetByteArrayFromObj, /* 653 */
    Tcl_UtfCharComplete, /* 654 */
    Tcl_UtfNext, /* 655 */
    Tcl_UtfPrev, /* 656 */
    Tcl_UniCharIsUnicode, /* 657 */


};

/* !END!: Do not edit above this line. */






>
>



1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
    TclGetStringFromObj, /* 651 */
    TclGetUnicodeFromObj, /* 652 */
    TclGetByteArrayFromObj, /* 653 */
    Tcl_UtfCharComplete, /* 654 */
    Tcl_UtfNext, /* 655 */
    Tcl_UtfPrev, /* 656 */
    Tcl_UniCharIsUnicode, /* 657 */
    Tcl_ExternalToUtfDStringEx, /* 658 */
    Tcl_UtfToExternalDStringEx, /* 659 */
};

/* !END!: Do not edit above this line. */

Changes to tests/chanio.test.

14
15
16
17
18
19
20


21
22
23
24
25
26
27
...
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
...
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
# this file, and for a DISCLAIMER OF ALL WARRANTIES.

if {"::tcltest" ni [namespace children]} {
    package require tcltest 2.5
    namespace import -force ::tcltest::*
}



namespace eval ::tcl::test::io {

    if {"::tcltest" ni [namespace children]} {
	package require tcltest 2.5
	namespace import -force ::tcltest::*
    }

................................................................................
    set f [open $path(test1) w]
    chan configure $f -encoding ascii -buffering line -translation crlf
    chan puts -nonewline $f "\n12"
    contents $path(test1)
} -cleanup {
    chan close $f
} -result "\r\n12"
test chan-io-3.4 {WriteChars: loop over stage buffer} {
    # stage buffer maps to more than can be queued at once.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 16
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test chan-io-3.5 {WriteChars: saved != 0} {
    # Bytes produced by UtfToExternal from end of last channel buffer had to
    # be moved to beginning of next channel buffer to preserve requested
    # buffersize.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
................................................................................
    set f [open $path(test1) w]
    chan configure $f -encoding shiftjis -buffersize 16
    chan puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test chan-io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} {
    # When translating UTF-8 to external, the produced bytes went past end of
    # the channel buffer. This is done on purpose - we then truncate the bytes
    # at the end of the partial character to preserve the requested blocksize
    # on flush. The truncated bytes are moved to the beginning of the next
    # channel buffer.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17






>
>







 







|








|







 







|







14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
...
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
...
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
# this file, and for a DISCLAIMER OF ALL WARRANTIES.

if {"::tcltest" ni [namespace children]} {
    package require tcltest 2.5
    namespace import -force ::tcltest::*
}

testConstraint nodep [info exists tcl_precision]

namespace eval ::tcl::test::io {

    if {"::tcltest" ni [namespace children]} {
	package require tcltest 2.5
	namespace import -force ::tcltest::*
    }

................................................................................
    set f [open $path(test1) w]
    chan configure $f -encoding ascii -buffering line -translation crlf
    chan puts -nonewline $f "\n12"
    contents $path(test1)
} -cleanup {
    chan close $f
} -result "\r\n12"
test chan-io-3.4 {WriteChars: loop over stage buffer} nodep {
    # stage buffer maps to more than can be queued at once.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 16
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test chan-io-3.5 {WriteChars: saved != 0} nodep {
    # Bytes produced by UtfToExternal from end of last channel buffer had to
    # be moved to beginning of next channel buffer to preserve requested
    # buffersize.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17
    chan puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
................................................................................
    set f [open $path(test1) w]
    chan configure $f -encoding shiftjis -buffersize 16
    chan puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    chan close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test chan-io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} nodep {
    # When translating UTF-8 to external, the produced bytes went past end of
    # the channel buffer. This is done on purpose - we then truncate the bytes
    # at the end of the partial character to preserve the requested blocksize
    # on flush. The truncated bytes are moved to the beginning of the next
    # channel buffer.
    set f [open $path(test1) w]
    chan configure $f -encoding jis0208 -buffersize 17

Changes to tests/cmdAH.test.

174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
...
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
    encoding
} -result {wrong # args: should be "encoding subcommand ?arg ...?"}
test cmdAH-4.2 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding foo
} -result {unknown or ambiguous subcommand "foo": must be convertfrom, convertto, dirs, names, or system}
test cmdAH-4.3 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto
} -result {wrong # args: should be "encoding convertto ?encoding? data"}
test cmdAH-4.4 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.5 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208
................................................................................
    encoding system iso8859-1
    encoding convertto jis0208 乎
} -cleanup {
    encoding system $system
} -result 8C
test cmdAH-4.7 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom
} -result {wrong # args: should be "encoding convertfrom ?encoding? data"}
test cmdAH-4.8 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.9 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208






|







 







|







174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
...
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
    encoding
} -result {wrong # args: should be "encoding subcommand ?arg ...?"}
test cmdAH-4.2 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding foo
} -result {unknown or ambiguous subcommand "foo": must be convertfrom, convertto, dirs, names, or system}
test cmdAH-4.3 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto
} -result {wrong # args: should be "encoding convertto ?-nothrow|-stoponerror? ?encoding? data"}
test cmdAH-4.4 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.5 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208
................................................................................
    encoding system iso8859-1
    encoding convertto jis0208 乎
} -cleanup {
    encoding system $system
} -result 8C
test cmdAH-4.7 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom
} -result {wrong # args: should be "encoding convertfrom ?-nothrow|-stoponerror? ?encoding? data"}
test cmdAH-4.8 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.9 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208

Changes to tests/encoding.test.

287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
...
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
...
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
...
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600






























601
602















603
604
605
606
607
608
609
...
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
test encoding-11.9 {encoding: extended Unicode UTF-16} {
    viewable [encoding convertto utf-16be 😹]
} {Ø=Þ9 (\u00D8=\u00DE9)}
# OpenEncodingFile is fully tested by the rest of the tests in this file.

test encoding-12.1 {LoadTableEncoding: normal encoding} {
    set x [encoding convertto iso8859-3 Ġ]
    append x [encoding convertto iso8859-3 Õ]
    append x [encoding convertfrom iso8859-3 Õ]
} "Õ?Ġ"
test encoding-12.2 {LoadTableEncoding: single-byte encoding} {
    set x [encoding convertto iso8859-3 abĠg]
    append x [encoding convertfrom iso8859-3 abÕg]
} "abÕgabĠg"
test encoding-12.3 {LoadTableEncoding: multi-byte encoding} {
................................................................................
test encoding-15.5 {UtfToUtfProc emoji character input} {
    set x \xF0\x9F\x98\x82
    set y [encoding convertfrom utf-8 \xF0\x9F\x98\x82]
    list [string length $x] $y
} "4 😂"
test encoding-15.6 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uDE02\uD83D
    set y [encoding convertto utf-8 \uDE02\uD83D\uDE02\uD83D]
    binary scan $y H* z
    list [string length $y] $z
} {10 efbfbdf09f9882efbfbd}
test encoding-15.7 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uD83D
    set y [encoding convertto utf-8 \uDE02\uD83D\uD83D]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 9 efbfbdefbfbdefbfbd}
test encoding-15.8 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83Dé
    set y [encoding convertto utf-8 \uDE02\uD83Dé]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 8 efbfbdefbfbdc3a9}
test encoding-15.9 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83DX
    set y [encoding convertto utf-8 \uDE02\uD83DX]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 7 efbfbdefbfbd58}
test encoding-15.10 {UtfToUtfProc high surrogate character output} {
    set x \uDE02é
    set y [encoding convertto utf-8 \uDE02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 efbfbdc3a9}
test encoding-15.11 {UtfToUtfProc low surrogate character output} {
    set x \uDA02é
    set y [encoding convertto utf-8 \uDA02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 efbfbdc3a9}
test encoding-15.12 {UtfToUtfProc high surrogate character output} {
    set x \uDE02Y
    set y [encoding convertto utf-8 \uDE02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 efbfbd59}
test encoding-15.13 {UtfToUtfProc low surrogate character output} {
    set x \uDA02Y
    set y [encoding convertto utf-8 \uDA02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 efbfbd59}
test encoding-15.14 {UtfToUtfProc high surrogate character output} {
    set x \uDE02
    set y [encoding convertto utf-8 \uDE02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 efbfbd}
test encoding-15.15 {UtfToUtfProc low surrogate character output} {
    set x \uDA02
    set y [encoding convertto utf-8 \uDA02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 efbfbd}
test encoding-15.16 {UtfToUtfProc: Invalid 4-byte UTF-8, see [ed29806ba]} {
    set x \xF0\xA0\xA1\xC2
    set y [encoding convertfrom utf-8 \xF0\xA0\xA1\xC2]
    list [string length $x] $y
} "4 \xF0\xA0\xA1\xC2"
test encoding-15.17 {UtfToUtfProc emoji character output} {
    set x 😂
    set y [encoding convertto utf-8 😂]
    binary scan $y H* z
    list [string length $y] $z
................................................................................
test encoding-17.1 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\U460DC"
} -result "\xD8\xD8\xDC\xDC"
test encoding-17.2 {UtfToUcs2Proc} -body {
    encoding convertfrom utf-16 [encoding convertto ucs-2 "\U460DC"]
} -result "\uFFFD"
test encoding-17.3 {UtfToUtf16Proc} -body {
    encoding convertto utf-16be "\uDCDC"
} -result "\xFF\xFD"
test encoding-17.4 {UtfToUtf16Proc} -body {
    encoding convertto utf-16le "\uD8D8"
} -result "\xFD\xFF"

test encoding-18.1 {TableToUtfProc} {
} {}

test encoding-19.1 {TableFromUtfProc} {
} {}
................................................................................
    list $count [viewable $line]
} [list 3 "乎乞也 (\\u4E4E\\u4E5E\\u4E5F)"]

test encoding-24.4 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC0\x80"]
} 1
test encoding-24.5 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC0\x81"]
} 2
test encoding-24.6 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC1\xBF"]
} 2
test encoding-24.7 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC2\x80"]
} 1
test encoding-24.8 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\x80\x80"]
} 3
test encoding-24.9 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\x9F\xBF"]
} 3
test encoding-24.10 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\xA0\x80"]
} 1
test encoding-24.11 {Parse valid or invalid utf-8} {






























    string length [encoding convertfrom utf-8 "\xEF\xBF\xBF"]
} 1
















file delete [file join [temporaryDirectory] iso2022.txt]

#
# Begin jajp encoding round-trip conformity tests
#
proc foreach-jisx0208 {varName command} {
................................................................................
}


test encoding-28.0 {all encodings load} -body {
	set string hello
	foreach name [encoding names] {
		incr count
		encoding convertto $name $string

		# discard the cached internal representation of Tcl_Encoding
		# Unfortunately, without this, encoding 2-1 fails.
		llength $name
	}
	return $count
} -result [expr {[info exists ::tcl_precision] ? 87 : 86}]






|







 







|





|





|





|





|





|





|





|





|





|





|







 







|


|







 







|


|





|


|





>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|







287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
...
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
...
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
...
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
...
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
test encoding-11.9 {encoding: extended Unicode UTF-16} {
    viewable [encoding convertto utf-16be 😹]
} {Ø=Þ9 (\u00D8=\u00DE9)}
# OpenEncodingFile is fully tested by the rest of the tests in this file.

test encoding-12.1 {LoadTableEncoding: normal encoding} {
    set x [encoding convertto iso8859-3 Ġ]
    append x [encoding convertto -nothrow iso8859-3 Õ]
    append x [encoding convertfrom iso8859-3 Õ]
} "Õ?Ġ"
test encoding-12.2 {LoadTableEncoding: single-byte encoding} {
    set x [encoding convertto iso8859-3 abĠg]
    append x [encoding convertfrom iso8859-3 abÕg]
} "abÕgabĠg"
test encoding-12.3 {LoadTableEncoding: multi-byte encoding} {
................................................................................
test encoding-15.5 {UtfToUtfProc emoji character input} {
    set x \xF0\x9F\x98\x82
    set y [encoding convertfrom utf-8 \xF0\x9F\x98\x82]
    list [string length $x] $y
} "4 😂"
test encoding-15.6 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uDE02\uD83D
    set y [encoding convertto -nothrow utf-8 \uDE02\uD83D\uDE02\uD83D]
    binary scan $y H* z
    list [string length $y] $z
} {10 efbfbdf09f9882efbfbd}
test encoding-15.7 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uD83D
    set y [encoding convertto -nothrow utf-8 \uDE02\uD83D\uD83D]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 9 efbfbdefbfbdefbfbd}
test encoding-15.8 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83Dé
    set y [encoding convertto -nothrow utf-8 \uDE02\uD83Dé]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 8 efbfbdefbfbdc3a9}
test encoding-15.9 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83DX
    set y [encoding convertto -nothrow utf-8 \uDE02\uD83DX]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 7 efbfbdefbfbd58}
test encoding-15.10 {UtfToUtfProc high surrogate character output} {
    set x \uDE02é
    set y [encoding convertto -nothrow utf-8 \uDE02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 efbfbdc3a9}
test encoding-15.11 {UtfToUtfProc low surrogate character output} {
    set x \uDA02é
    set y [encoding convertto -nothrow utf-8 \uDA02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 efbfbdc3a9}
test encoding-15.12 {UtfToUtfProc high surrogate character output} {
    set x \uDE02Y
    set y [encoding convertto -nothrow utf-8 \uDE02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 efbfbd59}
test encoding-15.13 {UtfToUtfProc low surrogate character output} {
    set x \uDA02Y
    set y [encoding convertto -nothrow utf-8 \uDA02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 efbfbd59}
test encoding-15.14 {UtfToUtfProc high surrogate character output} {
    set x \uDE02
    set y [encoding convertto -nothrow utf-8 \uDE02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 efbfbd}
test encoding-15.15 {UtfToUtfProc low surrogate character output} {
    set x \uDA02
    set y [encoding convertto -nothrow utf-8 \uDA02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 efbfbd}
test encoding-15.16 {UtfToUtfProc: Invalid 4-byte UTF-8, see [ed29806ba]} {
    set x \xF0\xA0\xA1\xC2
    set y [encoding convertfrom -nothrow utf-8 \xF0\xA0\xA1\xC2]
    list [string length $x] $y
} "4 \xF0\xA0\xA1\xC2"
test encoding-15.17 {UtfToUtfProc emoji character output} {
    set x 😂
    set y [encoding convertto utf-8 😂]
    binary scan $y H* z
    list [string length $y] $z
................................................................................
test encoding-17.1 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\U460DC"
} -result "\xD8\xD8\xDC\xDC"
test encoding-17.2 {UtfToUcs2Proc} -body {
    encoding convertfrom utf-16 [encoding convertto ucs-2 "\U460DC"]
} -result "\uFFFD"
test encoding-17.3 {UtfToUtf16Proc} -body {
    encoding convertto -nothrow utf-16be "\uDCDC"
} -result "\xFF\xFD"
test encoding-17.4 {UtfToUtf16Proc} -body {
    encoding convertto -nothrow utf-16le "\uD8D8"
} -result "\xFD\xFF"

test encoding-18.1 {TableToUtfProc} {
} {}

test encoding-19.1 {TableFromUtfProc} {
} {}
................................................................................
    list $count [viewable $line]
} [list 3 "乎乞也 (\\u4E4E\\u4E5E\\u4E5F)"]

test encoding-24.4 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC0\x80"]
} 1
test encoding-24.5 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nothrow utf-8 "\xC0\x81"]
} 2
test encoding-24.6 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nothrow utf-8 "\xC1\xBF"]
} 2
test encoding-24.7 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xC2\x80"]
} 1
test encoding-24.8 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nothrow utf-8 "\xE0\x80\x80"]
} 3
test encoding-24.9 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nothrow utf-8 "\xE0\x9F\xBF"]
} 3
test encoding-24.10 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom utf-8 "\xE0\xA0\x80"]
} 1
test encoding-24.11 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -nothrow utf-8 "\xEF\xBF\xBF"]
} 1
test encoding-24.12 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -stoponerror utf-8 "\xC0\x80"]
} 1
test encoding-24.13 {Parse valid or invalid utf-8} -body {
    encoding convertfrom -stoponerror utf-8 "\xC0\x81"
} -returnCodes 1 -result {unexpected byte sequence starting at index 0: '\xC0'}
test encoding-24.14 {Parse valid or invalid utf-8} -body {
    encoding convertfrom -stoponerror utf-8 "\xC1\xBF"
} -returnCodes 1 -result {unexpected byte sequence starting at index 0: '\xC1'}
test encoding-24.15 {Parse valid or invalid utf-8} {
    string length [encoding convertfrom -stoponerror utf-8 "\xC2\x80"]
} 1
test encoding-24.16 {Parse valid or invalid utf-8} -body {
    encoding convertfrom -stoponerror utf-8 "Z\xE0\x80"
} -returnCodes 1 -result {unexpected byte sequence starting at index 1: '\xE0'}
test encoding-24.17 {Parse valid or invalid utf-8} -constraints testbytestring -body {
    encoding convertto -stoponerror utf-8 [testbytestring "Z\u4343\x80"]
} -returnCodes 1 -result {expected byte sequence but character 1 was '䍃€' (U+004343)}
test encoding-24.18 {Parse valid or invalid utf-8} -constraints testbytestring -body {
    encoding convertto -stoponerror utf-8 [testbytestring "Z\xE0\x80"]
} -result "Z\xC3\xA0\xE2\x82\xAC"
test encoding-24.19 {Parse valid or invalid utf-8} -constraints testbytestring -body {
    encoding convertto -stoponerror utf-8 [testbytestring "Z\xE0\x80xxxxxx"]
} -result "Z\xC3\xA0\xE2\x82\xACxxxxxx"
test encoding-24.20 {Parse valid or invalid utf-8} -body {
    encoding convertto -stoponerror utf-8 "ZX\uD800"
} -returnCodes 1 -match glob -result "unexpected character at index 2: 'U+00D800'"
test encoding-24.21 {Parse with -nothrow but without providing encoding} {
    string length [encoding convertfrom -nothrow "\x20"]
} 1
test encoding-24.22 {Parse with -nothrow but without providing encoding} {
    string length [encoding convertto -nothrow "\x20"]
} 1
test encoding-24.23 {Syntax error, two encodings} -body {
    encoding convertfrom iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertfrom ?-nothrow|-stoponerror? ?encoding? data"}
test encoding-24.24 {Syntax error, two encodings} -body {
    encoding convertto iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertto ?-nothrow|-stoponerror? ?encoding? data"}
test encoding-24.25 {Syntax error, two options} -body {
    encoding convertfrom -nothrow -stoponerror "ZX\uD800"
} -returnCodes 1 -result {unknown encoding "-stoponerror"}
test encoding-24.26 {Syntax error, two options} -body {
    encoding convertto -nothrow -stoponerror "ZX\uD800"
} -returnCodes 1 -result {unknown encoding "-stoponerror"}

file delete [file join [temporaryDirectory] iso2022.txt]

#
# Begin jajp encoding round-trip conformity tests
#
proc foreach-jisx0208 {varName command} {
................................................................................
}


test encoding-28.0 {all encodings load} -body {
	set string hello
	foreach name [encoding names] {
		incr count
		encoding convertto -nothrow $name $string

		# discard the cached internal representation of Tcl_Encoding
		# Unfortunately, without this, encoding 2-1 fails.
		llength $name
	}
	return $count
} -result [expr {[info exists ::tcl_precision] ? 87 : 86}]

Changes to tests/http.test.

27
28
29
30
31
32
33


34
35
36
37
38
39
40
...
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
	$interp eval [list set argv $argv]
	$interp eval [list source [info script]]
	interp delete $interp
	return
    }
}



proc bgerror {args} {
    global errorInfo
    puts stderr "http.test bgerror"
    puts stderr [join $args]
    puts stderr $errorInfo
}

................................................................................
} -returnCodes error -body {
    # this would be reverting to http <=2.4 behavior
    http::config -urlencoding ""
    http::mapReply "∈"
} -cleanup {
    http::config -urlencoding $enc
} -result "can't read \"formMap(∈)\": no such element in array"
test http-7.4 {http::formatQuery} -setup {
    set enc [http::config -urlencoding]
} -body {
    # this would be reverting to http <=2.4 behavior w/o errors
    # (unknown chars become '?')
    http::config -urlencoding "iso8859-1"
    http::mapReply "∈"
} -cleanup {






>
>







 







|







27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
...
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
	$interp eval [list set argv $argv]
	$interp eval [list source [info script]]
	interp delete $interp
	return
    }
}

testConstraint nodep [info exists tcl_precision]

proc bgerror {args} {
    global errorInfo
    puts stderr "http.test bgerror"
    puts stderr [join $args]
    puts stderr $errorInfo
}

................................................................................
} -returnCodes error -body {
    # this would be reverting to http <=2.4 behavior
    http::config -urlencoding ""
    http::mapReply "∈"
} -cleanup {
    http::config -urlencoding $enc
} -result "can't read \"formMap(∈)\": no such element in array"
test http-7.4 {http::formatQuery} -constraints nodep -setup {
    set enc [http::config -urlencoding]
} -body {
    # this would be reverting to http <=2.4 behavior w/o errors
    # (unknown chars become '?')
    http::config -urlencoding "iso8859-1"
    http::mapReply "∈"
} -cleanup {

Changes to tests/io.test.

44
45
46
47
48
49
50

51
52
53
54
55
56
57
...
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
...
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
....
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
testConstraint testobj		[llength [info commands testobj]]
testConstraint testservicemode  [llength [info commands testservicemode]]
# Some things fail under Windows in Continuous Integration systems for subtle
# reasons such as CI often running with elevated privileges in a container.
testConstraint notWinCI [expr {
    $::tcl_platform(platform) ne "windows" || ![info exists ::env(CI)]}]
testConstraint notOSX [expr {$::tcl_platform(os) ne "Darwin"}]


# You need a *very* special environment to do some tests.  In
# particular, many file systems do not support large-files...
testConstraint largefileSupport [expr {$::tcl_platform(os) ne "Darwin"}]

# some tests can only be run is umask is 2
# if "umask" cannot be run, the tests will be skipped.
................................................................................
    set f [open $path(test1) w]
    fconfigure $f -encoding ascii -buffering line -translation crlf
    puts -nonewline $f "\n12"
    set x [contents $path(test1)]
    close $f
    set x
} "\r\n12"
test io-3.4 {WriteChars: loop over stage buffer} {
    # stage buffer maps to more than can be queued at once.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 16
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test io-3.5 {WriteChars: saved != 0} {
    # Bytes produced by UtfToExternal from end of last channel buffer
    # had to be moved to beginning of next channel buffer to preserve
    # requested buffersize.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 17
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
................................................................................
    set f [open $path(test1) w]
    fconfigure $f -encoding shiftjis -buffersize 16
    puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} {
    # When translating UTF-8 to external, the produced bytes went past end
    # of the channel buffer.  This is done purpose -- we then truncate the
    # bytes at the end of the partial character to preserve the requested
    # blocksize on flush.  The truncated bytes are moved to the beginning
    # of the next channel buffer.

    set f [open $path(test1) w]
................................................................................
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 160
test io-12.9 {ReadChars: multibyte chars split} {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 194
test io-12.10 {ReadChars: multibyte chars split} {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 11
    set in [read $f]






>







 







|









|







 







|







 







|










|







44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
...
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
...
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
....
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
testConstraint testobj		[llength [info commands testobj]]
testConstraint testservicemode  [llength [info commands testservicemode]]
# Some things fail under Windows in Continuous Integration systems for subtle
# reasons such as CI often running with elevated privileges in a container.
testConstraint notWinCI [expr {
    $::tcl_platform(platform) ne "windows" || ![info exists ::env(CI)]}]
testConstraint notOSX [expr {$::tcl_platform(os) ne "Darwin"}]
testConstraint nodep [info exists tcl_precision]

# You need a *very* special environment to do some tests.  In
# particular, many file systems do not support large-files...
testConstraint largefileSupport [expr {$::tcl_platform(os) ne "Darwin"}]

# some tests can only be run is umask is 2
# if "umask" cannot be run, the tests will be skipped.
................................................................................
    set f [open $path(test1) w]
    fconfigure $f -encoding ascii -buffering line -translation crlf
    puts -nonewline $f "\n12"
    set x [contents $path(test1)]
    close $f
    set x
} "\r\n12"
test io-3.4 {WriteChars: loop over stage buffer} nodep {
    # stage buffer maps to more than can be queued at once.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 16
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "!)!)!)!)!)!)!)!)" "!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)"]
test io-3.5 {WriteChars: saved != 0} nodep {
    # Bytes produced by UtfToExternal from end of last channel buffer
    # had to be moved to beginning of next channel buffer to preserve
    # requested buffersize.

    set f [open $path(test1) w]
    fconfigure $f -encoding jis0208 -buffersize 17
    puts -nonewline $f "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
................................................................................
    set f [open $path(test1) w]
    fconfigure $f -encoding shiftjis -buffersize 16
    puts -nonewline $f "12345678901234AB"
    set x [list [contents $path(test1)]]
    close $f
    lappend x [contents $path(test1)]
} [list "12345678901234\x82\x60" "12345678901234\x82\x60\x82\x61"]
test io-3.7 {WriteChars: (bufPtr->nextAdded > bufPtr->length)} nodep {
    # When translating UTF-8 to external, the produced bytes went past end
    # of the channel buffer.  This is done purpose -- we then truncate the
    # bytes at the end of the partial character to preserve the requested
    # blocksize on flush.  The truncated bytes are moved to the beginning
    # of the next channel buffer.

    set f [open $path(test1) w]
................................................................................
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 160
test io-12.9 {ReadChars: multibyte chars split} nodep {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 10
    set in [read $f]
    close $f
    scan [string index $in end] %c
} 194
test io-12.10 {ReadChars: multibyte chars split} nodep {
    set f [open $path(test1) w]
    fconfigure $f -translation binary
    puts -nonewline $f [string repeat a 9]\xC2
    close $f
    set f [open $path(test1)]
    fconfigure $f -encoding utf-8 -buffersize 11
    set in [read $f]

Changes to tests/main.test.

1
2
3
4
5
6
7


8
9
10
11
12
13
14
...
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
# This file contains a collection of tests for generic/tclMain.c.

if {"::tcltest" ni [namespace children]} {
    package require tcltest 2.5
    namespace import -force ::tcltest::*
}



namespace eval ::tcl::test::main {
    namespace import ::tcltest::*

    # Is [exec] defined?
    testConstraint exec [llength [info commands exec]]

    # Is the tcl::test package loaded?
................................................................................
	close $f
	removeFile script
    } -result [list script {} 0]\n1\n

    test Tcl_Main-1.8 {
	Tcl_Main: startup script - -encoding option - mismatched encodings
    } -constraints {
	stdio
    } -setup {
	set script [makeFile {} script]
	file delete $script
	set f [open $script w]
	chan configure $f -encoding utf-8
	puts $f {puts [list $argv0 $argv $tcl_interactive]}
	puts -nonewline $f {puts [string equal \u20ac }






>
>







 







|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
...
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
# This file contains a collection of tests for generic/tclMain.c.

if {"::tcltest" ni [namespace children]} {
    package require tcltest 2.5
    namespace import -force ::tcltest::*
}

testConstraint nodep [info exists tcl_precision]

namespace eval ::tcl::test::main {
    namespace import ::tcltest::*

    # Is [exec] defined?
    testConstraint exec [llength [info commands exec]]

    # Is the tcl::test package loaded?
................................................................................
	close $f
	removeFile script
    } -result [list script {} 0]\n1\n

    test Tcl_Main-1.8 {
	Tcl_Main: startup script - -encoding option - mismatched encodings
    } -constraints {
	stdio nodep
    } -setup {
	set script [makeFile {} script]
	file delete $script
	set f [open $script w]
	chan configure $f -encoding utf-8
	puts $f {puts [list $argv0 $argv $tcl_interactive]}
	puts -nonewline $f {puts [string equal \u20ac }

Changes to tests/safe.test.

1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
} -result foobar
test safe-11.7 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertfrom
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?encoding? data"}
test safe-11.7.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertfrom} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?encoding? data"
    while executing
"encoding convertfrom"
    invoked from within
"encoding convertfrom"
    invoked from within
"interp eval $i encoding convertfrom"}
test safe-11.8 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertto
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?encoding? data"}
test safe-11.8.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertto} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?encoding? data"
    while executing
"encoding convertto"
    invoked from within
"encoding convertto"
    invoked from within
"interp eval $i encoding convertto"}







|








|












|








|







1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
} -result foobar
test safe-11.7 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertfrom
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nothrow|-stoponerror? ?encoding? data"}
test safe-11.7.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertfrom} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nothrow|-stoponerror? ?encoding? data"
    while executing
"encoding convertfrom"
    invoked from within
"encoding convertfrom"
    invoked from within
"interp eval $i encoding convertfrom"}
test safe-11.8 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertto
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nothrow|-stoponerror? ?encoding? data"}
test safe-11.8.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertto} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nothrow|-stoponerror? ?encoding? data"
    while executing
"encoding convertto"
    invoked from within
"encoding convertto"
    invoked from within
"interp eval $i encoding convertto"}

Changes to tests/source.test.

16
17
18
19
20
21
22


23
24
25
26
27
28
29
...
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
    puts stderr "Skipping tests in [info script]. tcltest 2.5 required."
    return
}

namespace eval ::tcl::test::source {
    namespace import ::tcltest::*
 


test source-1.1 {source command} -setup {
    set x "old x value"
    set y "old y value"
    set z "old z value"
    set sourcefile [makeFile {
	set x 22
	set y 33
................................................................................
} -body {
    source -encoding utf-8 $sourcefile
    €
} -cleanup {
    removeFile source.file
    rename € {}
} -result foo
test source-7.6 {source -encoding: mismatch encoding error} -setup {
    set sourcefile [makeFile {} source.file]
    file delete $sourcefile
    set f [open $sourcefile w]
    fconfigure $f -encoding utf-8
    puts $f "proc € {} {return foo}"
    close $f
} -body {






>
>







 







|







16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
...
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
    puts stderr "Skipping tests in [info script]. tcltest 2.5 required."
    return
}

namespace eval ::tcl::test::source {
    namespace import ::tcltest::*
 
testConstraint nodep [info exists tcl_precision]

test source-1.1 {source command} -setup {
    set x "old x value"
    set y "old y value"
    set z "old z value"
    set sourcefile [makeFile {
	set x 22
	set y 33
................................................................................
} -body {
    source -encoding utf-8 $sourcefile
    €
} -cleanup {
    removeFile source.file
    rename € {}
} -result foo
test source-7.6 {source -encoding: mismatch encoding error} -constraints nodep -setup {
    set sourcefile [makeFile {} source.file]
    file delete $sourcefile
    set f [open $sourcefile w]
    fconfigure $f -encoding utf-8
    puts $f "proc € {} {return foo}"
    close $f
} -body {