Tcl Source Code

Changes On Branch tip607-encoding-failindex

Changes In Branch tip607-encoding-failindex Excluding Merge-Ins

This is equivalent to a diff from e79940819f to 61b6543b0f

2022-07-02
19:39 TIP #607: -failindex option for encoding convertto/convertfrom check-in: 2c4329d1a2 user: jan.nijtmans tags: core-8-branch
2022-06-19
20:23 Fix TIP #312 documentation. Backported from [9c4dcc7347e8ef51] check-in: eee192b777 user: jan.nijtmans tags: core-8-branch
05:42 Create new branch named "win-console" check-in: c24c8ab617 user: apnadkarni tags: win-console
2022-06-15
20:49 Merge 8.7 Closed-Leaf check-in: 61b6543b0f user: jan.nijtmans tags: tip607-encoding-failindex
15:12 (experimental) TclOO > 2**31 args check-in: 39a1dece8e user: jan.nijtmans tags: tcloo-64bit
12:24 Merge 8.7 check-in: 339b37bab3 user: jan.nijtmans tags: tip-627
12:17 Merge-mark check-in: 57690dd1e0 user: jan.nijtmans tags: trunk, main
12:15 Add comments in tcl.decls check-in: e79940819f user: jan.nijtmans tags: core-8-branch
10:04 Backport tclStubLib.c from Tcl 9.0 (they should be equal in 8.7 and 9.0) check-in: df78bed29c user: jan.nijtmans tags: core-8-branch
2022-03-21
14:42 Merge 8.7. Renumber testcases in cmdAH.test. check-in: affc202d27 user: jan.nijtmans tags: tip607-encoding-failindex

Changes to doc/encoding.n.

Old version (lines 10-60):
.SH NAME
encoding \- Manipulate encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE
.SH INTRODUCTION
.PP
Strings in Tcl are logically a sequence of 16-bit Unicode characters.
These strings are represented in memory as a sequence of bytes that
may be in one of several encodings: modified UTF\-8 (which uses 1 to 3
bytes per character), 16-bit
.QW Unicode
(which uses 2 bytes per character, with an endianness that is
dependent on the host architecture), and binary (which uses a single
byte per character but only handles a restricted range of characters).
Tcl does not guarantee to always use the same encoding for the same
string.
.PP
Different operating system interfaces or applications may generate
strings in other encodings such as Shift\-JIS.  The \fBencoding\fR
command helps to bridge the gap between Unicode and these other
formats.
.SH DESCRIPTION
.PP
Performs one of several encoding related operations, depending on
\fIoption\fR.  The legal \fIoption\fRs are:
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR
.
Convert \fIdata\fR to Unicode from the specified \fIencoding\fR.  The
characters in \fIdata\fR are treated as binary data where the lower
8-bits of each character is taken as a single byte.  The resulting
sequence of bytes is treated as a string in the specified
\fIencoding\fR.  If \fIencoding\fR is not specified, the current
system encoding is used.
.TP
\fBencoding convertto\fR ?\fIencoding\fR? \fIstring\fR
.
Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
The result is a sequence of bytes that represents the converted
string.  Each byte is stored in the lower 8-bits of a Unicode
character (indeed, the resulting string is a binary string as far as
Tcl is concerned, at least initially).  If \fIencoding\fR is not
specified, the current system encoding is used.
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with. This command sets the search
path for \fB*.enc\fR encoding data files to the list of directories
\fIdirectoryList\fR. If \fIdirectoryList\fR is omitted then the







New version (lines 10-83):
.SH NAME
encoding \- Manipulate encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE
.SH INTRODUCTION
.PP
Strings in Tcl are logically a sequence of Unicode characters.
These strings are represented in memory as a sequence of bytes that
may be in one of several encodings: modified UTF\-8 (which uses 1 to 4
bytes per character), or a custom encoding that starts as 8-bit binary data.
.PP
Different operating system interfaces or applications may generate
strings in other encodings such as Shift\-JIS.  The \fBencoding\fR
command helps to bridge the gap between Unicode and these other
formats.
.SH DESCRIPTION
.PP
Performs one of several encoding related operations, depending on
\fIoption\fR.  The legal \fIoption\fRs are:
.TP
\fBencoding convertfrom\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR?
?\fIencoding\fR? \fIdata\fR
.
Convert \fIdata\fR to a Unicode string from the specified \fIencoding\fR.  The
characters in \fIdata\fR are 8-bit binary data.  The resulting
string is created by applying the given \fIencoding\fR
to the data. If \fIencoding\fR is not specified, the current
system encoding is used.
.
The call fails on conversion errors, such as an incomplete UTF-8 sequence.
The option \fB-failindex\fR is followed by a variable name. The variable
is set to \fI-1\fR if no conversion error occurred. Otherwise it is set to the
first error location in \fIdata\fR, and all data
up to that location is converted and returned. This option may not
be used together with \fB-nocomplain\fR.
.
The call does not fail on conversion errors if the option
\fB-nocomplain\fR is given. In this case, any error locations are replaced
by \fB?\fR. Incomplete sequences are written verbatim to the output string.
This switch exists for compatibility with prior versions of Tcl.
It is not recommended for any other use.
.TP
\fBencoding convertto\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR?
?\fIencoding\fR? \fIstring\fR
.
Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
The result is a sequence of bytes that represents the converted
string.  Each byte is stored in the lower 8-bits of a Unicode
character (indeed, the resulting string is a binary string as far as
Tcl is concerned, at least initially).  If \fIencoding\fR is not
specified, the current system encoding is used.
.
The call fails on conversion errors, such as a Unicode character that is not
representable in the given \fIencoding\fR.
.
The option \fB-failindex\fR is followed by a variable name. The variable
is set to \fI-1\fR if no conversion error occurred. Otherwise it is set to the
first error location in \fIstring\fR, and all data
up to that location is converted and returned. This option may not
be used together with \fB-nocomplain\fR.
.
The call does not fail on conversion errors if the option
\fB-nocomplain\fR is given. In this case, any error locations are replaced
by \fB?\fR. Incomplete sequences are written verbatim to the output string.
This switch exists for compatibility with prior versions of Tcl.
It is not recommended for any other use.
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with. This command sets the search
path for \fB*.enc\fR encoding data files to the list of directories
\fIdirectoryList\fR. If \fIdirectoryList\fR is omitted then the
Old version (lines 86-99):
.CS
set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
.CE
.PP
The result is the unicode codepoint:
.QW "\eu306F" ,
which is the Hiragana letter HA.
.SH "SEE ALSO"
Tcl_GetEncoding(3)
.SH KEYWORDS
encoding, unicode
.\" Local Variables:
.\" mode: nroff
.\" End:







New version (lines 109-142):
.CS
set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
.CE
.PP
The result is the unicode codepoint:
.QW "\eu306F" ,
which is the Hiragana letter HA.
.PP
The following example detects the error location in an incomplete UTF-8 sequence:
.PP
.CS
% set s [\fBencoding convertfrom\fR -failindex i utf-8 "A\exc3"]
A
% set i
1
.CE
.PP
The following example detects the error location while transforming to ISO8859-1
(ISO-Latin 1):
.PP
.CS
% set s [\fBencoding convertto\fR -failindex i iso8859-1 "A\eu0141"]
A
% set i
1
.CE
.PP
.SH "SEE ALSO"
Tcl_GetEncoding(3)
.SH KEYWORDS
encoding, unicode
.\" Local Variables:
.\" mode: nroff
.\" End:
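
The man page text above describes the value stored by -failindex for convertfrom: -1 when everything converts, otherwise the index of the first byte that could not be converted. The following is only a minimal standalone sketch of that idea for UTF-8 input, not Tcl's converter. The function name utf8_fail_index is hypothetical, and the check is purely structural (lead byte plus the expected number of continuation bytes); overlong forms, surrogates, and Tcl's modified UTF-8 rules are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Tcl): return the index of the first byte
 * that does not start a structurally complete UTF-8 sequence, or -1 if the
 * whole buffer is well formed. Overlong forms and surrogates are not checked.
 */
static long utf8_fail_index(const unsigned char *data, size_t len)
{
    size_t i = 0;

    assert(data != NULL || len == 0);
    while (i < len) {
        unsigned char lead = data[i];
        size_t need;    /* number of continuation bytes expected */
        size_t k;

        if (lead < 0x80) {
            need = 0;
        } else if ((lead & 0xE0) == 0xC0) {
            need = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            need = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            need = 3;
        } else {
            return (long) i;    /* stray continuation byte or invalid lead */
        }
        if (len - i - 1 < need) {
            return (long) i;    /* sequence is cut off by the end of input */
        }
        for (k = 1; k <= need; k++) {
            if ((data[i + k] & 0xC0) != 0x80) {
                return (long) i;    /* expected a continuation byte here */
            }
        }
        i += need + 1;
    }
    return -1;
}
```

For the two bytes of "A\xc3" used in the man page example, this sketch reports index 1, and everything before that index ("A") is the part a caller could still use.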

Changes to generic/tclCmdAH.c.

Old version (lines 552-618):
    const char *bytesPtr;	/* Pointer to the first byte of the array */
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NOCOMPLAIN;
#endif
    int result;

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if ((unsigned)(objc - 2) < 3) {

	data = objv[objc - 1];
	bytesPtr = Tcl_GetString(objv[1]);
	if (bytesPtr[0] == '-' && bytesPtr[1] == 'n'
		&& !strncmp(bytesPtr, "-nocomplain", strlen(bytesPtr))) {
	    flags = TCL_ENCODING_NOCOMPLAIN;




	} else if (objc < 4) {








	    if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		return TCL_ERROR;
	    }
	    goto encConvFromOK;

	} else {
	    goto encConvFromError;
	}
	if (objc < 4) {
	    encoding = Tcl_GetEncoding(interp, NULL);
	} else if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
	    return TCL_ERROR;


	}
    } else {
    encConvFromError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nocomplain? ?encoding? data");
	return TCL_ERROR;
    }

encConvFromOK:
    /*
     * Convert the string into a byte array in 'ds'
     */
#if !defined(TCL_NO_DEPRECATED) && (TCL_MAJOR_VERSION < 9)
    if (!(flags & TCL_ENCODING_STOPONERROR)) {
	bytesPtr = (char *) Tcl_GetByteArrayFromObj(data, &length);
    } else
#endif
    bytesPtr = (char *) TclGetBytesFromObj(interp, data, &length);
    if (bytesPtr == NULL) {
	return TCL_ERROR;
    }
    result = Tcl_ExternalToUtfDStringEx(encoding, bytesPtr, length,
	    flags, &ds);
    if (!(flags & TCL_ENCODING_NOCOMPLAIN) && (result != TCL_INDEX_NONE)) {





	char buf[TCL_INTEGER_SPACE];
	sprintf(buf, "%u", result);
	Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected byte sequence starting at index %"
		"u: '\\x%X'", result, UCHAR(bytesPtr[result])));
	Tcl_SetErrorCode(interp, "TCL", "ENCODING", "ILLEGALSEQUENCE",
		buf, NULL);
	Tcl_DStringFree(&ds);
	return TCL_ERROR;





    }

    /*
     * Note that we cannot use Tcl_DStringResult here because it will
     * truncate the string at the first null byte.
     */








New version (lines 552-649):
    const char *bytesPtr;	/* Pointer to the first byte of the array */
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NOCOMPLAIN;
#endif
    int result;
    Tcl_Obj *failVarObj = NULL;
    /*
     * Decode parameters:
     * Possible combinations:
     * 1) data						-> objc = 2
     * 2) encoding data					-> objc = 3
     * 3) -nocomplain data				-> objc = 3
     * 4) -nocomplain encoding data			-> objc = 4
     * 5) -failindex val data				-> objc = 4
     * 6) -failindex val encoding data			-> objc = 5
     */

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if (objc > 2 && objc < 6) {
	int objcUnprocessed = objc;
	data = objv[objc - 1];
	bytesPtr = Tcl_GetString(objv[1]);
	if (bytesPtr[0] == '-' && bytesPtr[1] == 'n'
		&& !strncmp(bytesPtr, "-nocomplain", strlen(bytesPtr))) {
	    flags = TCL_ENCODING_NOCOMPLAIN;
	    objcUnprocessed--;
	} else if (bytesPtr[0] == '-' && bytesPtr[1] == 'f'
		&& !strncmp(bytesPtr, "-failindex", strlen(bytesPtr))) {
	    /* at least two additional arguments needed */
	    if (objc < 4) {
		goto encConvFromError;
	    }
	    failVarObj = objv[2];
	    flags = TCL_ENCODING_STOPONERROR;
	    objcUnprocessed -= 2;
	}
	switch (objcUnprocessed) {
	    case 3:
		if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		    return TCL_ERROR;
		}

		break;
	    case 2:



		encoding = Tcl_GetEncoding(interp, NULL);

		break;
	    default:
		goto encConvFromError;
	}
    } else {
    encConvFromError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nocomplain? ?-failindex var? ?encoding? data");
	return TCL_ERROR;
    }


    /*
     * Convert the string into a byte array in 'ds'
     */
#if !defined(TCL_NO_DEPRECATED) && (TCL_MAJOR_VERSION < 9)
    if (!(flags & TCL_ENCODING_STOPONERROR)) {
	bytesPtr = (char *) Tcl_GetByteArrayFromObj(data, &length);
    } else
#endif
    bytesPtr = (char *) TclGetBytesFromObj(interp, data, &length);
    if (bytesPtr == NULL) {
	return TCL_ERROR;
    }
    result = Tcl_ExternalToUtfDStringEx(encoding, bytesPtr, length,
	    flags, &ds);
    if (!(flags & TCL_ENCODING_NOCOMPLAIN) && (result != TCL_INDEX_NONE)) {
	if (failVarObj != NULL) {
	    if (Tcl_ObjSetVar2(interp, failVarObj, NULL, Tcl_NewWideIntObj(result), TCL_LEAVE_ERR_MSG) == NULL) {
		/* Release the partial conversion result before bailing out. */
		Tcl_DStringFree(&ds);
		return TCL_ERROR;
	    }
	} else {
	    char buf[TCL_INTEGER_SPACE];
	    sprintf(buf, "%u", result);
	    Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected byte sequence starting at index %"
		    "u: '\\x%X'", result, UCHAR(bytesPtr[result])));
	    Tcl_SetErrorCode(interp, "TCL", "ENCODING", "ILLEGALSEQUENCE",
		    buf, NULL);
	    Tcl_DStringFree(&ds);
	    return TCL_ERROR;
	}
    } else if (failVarObj != NULL) {
	if (Tcl_ObjSetVar2(interp, failVarObj, NULL, Tcl_NewIntObj(-1), TCL_LEAVE_ERR_MSG) == NULL) {
	    /* Release the conversion result before bailing out. */
	    Tcl_DStringFree(&ds);
	    return TCL_ERROR;
	}
    }

    /*
     * Note that we cannot use Tcl_DStringResult here because it will
     * truncate the string at the first null byte.
     */

Old version (lines 655-717):
    const char *stringPtr;	/* Pointer to the first byte of the string */
    int result;
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NOCOMPLAIN;
#endif

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if ((unsigned)(objc - 2) < 3) {

	data = objv[objc - 1];
	stringPtr = Tcl_GetString(objv[1]);
	if (stringPtr[0] == '-' && stringPtr[1] == 'n'
		&& !strncmp(stringPtr, "-nocomplain", strlen(stringPtr))) {
	    flags = TCL_ENCODING_NOCOMPLAIN;




	} else if (objc < 4) {








	    if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		return TCL_ERROR;
	    }
	    goto encConvToOK;

	} else {
	    goto encConvToError;
	}
	if (objc < 4) {
	    encoding = Tcl_GetEncoding(interp, NULL);
	} else if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
	    return TCL_ERROR;


	}
    } else {
    encConvToError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nocomplain? ?encoding? data");
	return TCL_ERROR;
    }

encConvToOK:
    /*
     * Convert the string to a byte array in 'ds'
     */

    stringPtr = TclGetStringFromObj(data, &length);
    result = Tcl_UtfToExternalDStringEx(encoding, stringPtr, length,
	    flags, &ds);
    if (!(flags & TCL_ENCODING_NOCOMPLAIN) && (result != TCL_INDEX_NONE)) {






	int pos = Tcl_NumUtfChars(stringPtr, result);
	int ucs4;
	char buf[TCL_INTEGER_SPACE];
	TclUtfToUCS4(&stringPtr[result], &ucs4);
	sprintf(buf, "%u", result);
	Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected character at index %"
		"u: 'U+%06X'", pos, ucs4));
	Tcl_SetErrorCode(interp, "TCL", "ENCODING", "ILLEGALSEQUENCE",
		buf, NULL);
	Tcl_DStringFree(&ds);
	return TCL_ERROR;





    }
    Tcl_SetObjResult(interp,
		     Tcl_NewByteArrayObj((unsigned char*) Tcl_DStringValue(&ds),
					 Tcl_DStringLength(&ds)));
    Tcl_DStringFree(&ds);

    /*







New version (lines 686-781):
    const char *stringPtr;	/* Pointer to the first byte of the string */
    int result;
#if TCL_MAJOR_VERSION > 8 || defined(TCL_NO_DEPRECATED)
    int flags = TCL_ENCODING_STOPONERROR;
#else
    int flags = TCL_ENCODING_NOCOMPLAIN;
#endif
    Tcl_Obj *failVarObj = NULL;

    /*
     * Decode parameters:
     * Possible combinations:
     * 1) data						-> objc = 2
     * 2) encoding data					-> objc = 3
     * 3) -nocomplain data				-> objc = 3
     * 4) -nocomplain encoding data			-> objc = 4
     * 5) -failindex val data				-> objc = 4
     * 6) -failindex val encoding data			-> objc = 5
     */

    if (objc == 2) {
	encoding = Tcl_GetEncoding(interp, NULL);
	data = objv[1];
    } else if (objc > 2 && objc < 6) {
	int objcUnprocessed = objc;
	data = objv[objc - 1];
	stringPtr = Tcl_GetString(objv[1]);
	if (stringPtr[0] == '-' && stringPtr[1] == 'n'
		&& !strncmp(stringPtr, "-nocomplain", strlen(stringPtr))) {
	    flags = TCL_ENCODING_NOCOMPLAIN;
	    objcUnprocessed--;
	} else if (stringPtr[0] == '-' && stringPtr[1] == 'f'
		&& !strncmp(stringPtr, "-failindex", strlen(stringPtr))) {
	    /* at least two additional arguments needed */
	    if (objc < 4) {
		goto encConvToError;
	    }
	    failVarObj = objv[2];
	    flags = TCL_ENCODING_STOPONERROR;
	    objcUnprocessed -= 2;
	}
	switch (objcUnprocessed) {
	    case 3:
		if (Tcl_GetEncodingFromObj(interp, objv[objc - 2], &encoding) != TCL_OK) {
		    return TCL_ERROR;
		}

		break;
	    case 2:



		encoding = Tcl_GetEncoding(interp, NULL);

		break;
	    default:
		goto encConvToError;
	}
    } else {
    encConvToError:
	Tcl_WrongNumArgs(interp, 1, objv, "?-nocomplain? ?-failindex var? ?encoding? data");
	return TCL_ERROR;
    }


    /*
     * Convert the string to a byte array in 'ds'
     */

    stringPtr = TclGetStringFromObj(data, &length);
    result = Tcl_UtfToExternalDStringEx(encoding, stringPtr, length,
	    flags, &ds);
    if (!(flags & TCL_ENCODING_NOCOMPLAIN) && (result != TCL_INDEX_NONE)) {
	if (failVarObj != NULL) {
	    /* A wide int is assumed to be large enough to hold the index. */
	    if (Tcl_ObjSetVar2(interp, failVarObj, NULL, Tcl_NewWideIntObj(result), TCL_LEAVE_ERR_MSG) == NULL) {
		/* Release the partial conversion result before bailing out. */
		Tcl_DStringFree(&ds);
		return TCL_ERROR;
	    }
	} else {
	    size_t pos = Tcl_NumUtfChars(stringPtr, result);
	    int ucs4;
	    char buf[TCL_INTEGER_SPACE];
	    TclUtfToUCS4(&stringPtr[result], &ucs4);
	    sprintf(buf, "%u", result);
	    Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected character at index %"
		    TCL_Z_MODIFIER "u: 'U+%06X'", pos, ucs4));
	    Tcl_SetErrorCode(interp, "TCL", "ENCODING", "ILLEGALSEQUENCE",
		    buf, NULL);
	    Tcl_DStringFree(&ds);
	    return TCL_ERROR;
	}
    } else if (failVarObj != NULL) {
	if (Tcl_ObjSetVar2(interp, failVarObj, NULL, Tcl_NewIntObj(-1), TCL_LEAVE_ERR_MSG) == NULL) {
	    /* Release the conversion result before bailing out. */
	    Tcl_DStringFree(&ds);
	    return TCL_ERROR;
	}
    }
    Tcl_SetObjResult(interp,
		     Tcl_NewByteArrayObj((unsigned char*) Tcl_DStringValue(&ds),
					 Tcl_DStringLength(&ds)));
    Tcl_DStringFree(&ds);

    /*
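
The argument decoding added to both subcommands above (count the arguments, subtract what a leading option consumes, then switch on what is left) can be tried outside of Tcl. The following is a minimal standalone sketch of that logic on plain C strings, under the assumption that it mirrors the six combinations listed in the "Decode parameters" comment. The names conv_args, is_option_prefix, and decode_conv_args are hypothetical and do not exist in the Tcl sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum conv_mode { MODE_DEFAULT, MODE_NOCOMPLAIN, MODE_FAILINDEX };

struct conv_args {
    enum conv_mode mode;
    const char *fail_var;   /* variable name after -failindex, or NULL */
    const char *encoding;   /* encoding name, or NULL for the system encoding */
    const char *data;       /* always the last argument */
};

/* True if arg is a prefix of option that is at least two characters long,
 * which is what the "-n"/"-f" checks plus strncmp in the diff accept. */
static int is_option_prefix(const char *arg, const char *option)
{
    size_t n = strlen(arg);
    return n >= 2 && strncmp(arg, option, n) == 0;
}

/* argv[0] is the subcommand name. Returns 0 on success and -1 where the
 * C code in the diff would report "wrong # args". */
static int decode_conv_args(int argc, const char *const *argv,
                            struct conv_args *out)
{
    int unprocessed = argc;

    assert(out != NULL);
    out->mode = MODE_DEFAULT;
    out->fail_var = NULL;
    out->encoding = NULL;
    out->data = NULL;
    if (argc < 2 || argc > 5) {
        return -1;
    }
    out->data = argv[argc - 1];
    if (argc > 2) {
        if (is_option_prefix(argv[1], "-nocomplain")) {
            out->mode = MODE_NOCOMPLAIN;
            unprocessed--;
        } else if (is_option_prefix(argv[1], "-failindex")) {
            if (argc < 4) {
                return -1;      /* -failindex needs a variable and data */
            }
            out->mode = MODE_FAILINDEX;
            out->fail_var = argv[2];
            unprocessed -= 2;
        }
    }
    switch (unprocessed) {
    case 3:                     /* subcommand, encoding, data */
        out->encoding = argv[argc - 2];
        return 0;
    case 2:                     /* subcommand, data */
        return 0;
    default:
        return -1;
    }
}
```

Because only the first argument is examined for an option, "-failindex 2 -nocomplain ABC" decodes with "-nocomplain" taken as the encoding name, which is consistent with the "unknown encoding" result expected by tests cmdAH-4.15.1 and cmdAH-4.15.2.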

Changes to tests/cmdAH.test.

Old version (lines 174-188):
    encoding
} -result {wrong # args: should be "encoding subcommand ?arg ...?"}
test cmdAH-4.2 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding foo
} -result {unknown or ambiguous subcommand "foo": must be convertfrom, convertto, dirs, names, or system}
test cmdAH-4.3 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?encoding? data"}
test cmdAH-4.4 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.5 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208







New version (lines 174-188):
    encoding
} -result {wrong # args: should be "encoding subcommand ?arg ...?"}
test cmdAH-4.2 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding foo
} -result {unknown or ambiguous subcommand "foo": must be convertfrom, convertto, dirs, names, or system}
test cmdAH-4.3 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.4 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertto foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.5 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208
Old version (lines 196-210):
    encoding system iso8859-1
    encoding convertto jis0208 乎
} -cleanup {
    encoding system $system
} -result 8C
test cmdAH-4.7 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?encoding? data"}
test cmdAH-4.8 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.9 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208







New version (lines 196-210):
    encoding system iso8859-1
    encoding convertto jis0208 乎
} -cleanup {
    encoding system $system
} -result 8C
test cmdAH-4.7 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.8 {Tcl_EncodingObjCmd} -returnCodes error -body {
    encoding convertfrom foo bar
} -result {unknown encoding "foo"}
test cmdAH-4.9 {Tcl_EncodingObjCmd} -setup {
    set system [encoding system]
} -body {
    encoding system jis0208
Old version (lines 230-243):
    set system [encoding system]
} -body {
    encoding system iso8859-1
    encoding system
} -cleanup {
    encoding system $system
} -result iso8859-1

test cmdAH-5.1 {Tcl_FileObjCmd} -returnCodes error -body {
    file
} -result {wrong # args: should be "file subcommand ?arg ...?"}
test cmdAH-5.2 {Tcl_FileObjCmd} -returnCodes error -body {
    file x
} -result {unknown or ambiguous subcommand "x": must be atime, attributes, channels, copy, delete, dirname, executable, exists, extension, isdirectory, isfile, join, link, lstat, mkdir, mtime, nativename, normalize, owned, pathtype, readable, readlink, rename, rootname, separator, size, split, stat, system, tail, tempdir, tempfile, type, volumes, or writable}







New version (lines 230-358):
    set system [encoding system]
} -body {
    encoding system iso8859-1
    encoding system
} -cleanup {
    encoding system $system
} -result iso8859-1

test cmdAH-4.14.1 {Syntax error, -nocomplain and -failindex, no encoding} -body {
    encoding convertfrom -nocomplain -failindex 2 ABC
} -returnCodes 1 -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.14.2 {Syntax error, -nocomplain and -failindex, no encoding} -body {
    encoding convertto -nocomplain -failindex 2 ABC
} -returnCodes 1 -result {wrong # args: should be "encoding convertto ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.15.1 {Syntax error, -failindex and -nocomplain, no encoding} -body {
    encoding convertfrom -failindex 2 -nocomplain ABC
} -returnCodes 1 -result {unknown encoding "-nocomplain"}
test cmdAH-4.15.2 {Syntax error, -failindex and -nocomplain, no encoding} -body {
    encoding convertto -failindex 2 -nocomplain ABC
} -returnCodes 1 -result {unknown encoding "-nocomplain"}
test cmdAH-4.16.1 {Syntax error, -nocomplain and -failindex, encoding} -body {
    encoding convertfrom -nocomplain -failindex 2 utf-8 ABC
} -returnCodes 1 -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.16.2 {Syntax error, -nocomplain and -failindex, encoding} -body {
    encoding convertto -nocomplain -failindex 2 utf-8 ABC
} -returnCodes 1 -result {wrong # args: should be "encoding convertto ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.17.1 {Syntax error, -failindex and -nocomplain, encoding} -body {
    encoding convertfrom -failindex 2 -nocomplain utf-8 ABC
} -returnCodes 1 -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.17.2 {Syntax error, -failindex and -nocomplain, encoding} -body {
    encoding convertto -failindex 2 -nocomplain utf-8 ABC
} -returnCodes 1 -result {wrong # args: should be "encoding convertto ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.18.1 {Syntax error, -failindex with no var, no encoding} -body {
    encoding convertfrom -failindex ABC
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.18.2 {Syntax error, -failindex with no var, no encoding (byte compiled)} -setup {
    proc encoding_test {} {
        encoding convertfrom -failindex ABC
    }
} -body {
    # Compile and execute
    encoding_test
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"} -cleanup {
    rename encoding_test ""
}
test cmdAH-4.18.3 {Syntax error, -failindex with no var, no encoding} -body {
    encoding convertto -failindex ABC
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertto ?-nocomplain? ?-failindex var? ?encoding? data"}
test cmdAH-4.18.4 {Syntax error, -failindex with no var, no encoding (byte compiled)} -setup {
    proc encoding_test {} {
        encoding convertto -failindex ABC
    }
} -body {
    # Compile and execute
    encoding_test
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertto ?-nocomplain? ?-failindex var? ?encoding? data"} -cleanup {
    rename encoding_test ""
}
test cmdAH-4.19.1 {convertfrom -failindex with correct data} -body {
    encoding convertfrom -failindex test ABC
    set test
} -returnCodes 0 -result -1
test cmdAH-4.19.2 {convertfrom -failindex with correct data (byte compiled)} -setup {
    proc encoding_test {} {
	encoding convertfrom -failindex test ABC
	set test
    }
} -body {
    # Compile and execute
    encoding_test
} -returnCodes 0 -result -1 -cleanup {
    rename encoding_test ""
}
test cmdAH-4.19.3 {convertto -failindex with correct data} -body {
    encoding convertto -failindex test ABC
    set test
} -returnCodes 0 -result -1
test cmdAH-4.19.4 {convertto -failindex with correct data (byte compiled)} -setup {
    proc encoding_test {} {
	encoding convertto -failindex test ABC
	set test
    }
} -body {
    # Compile and execute
    encoding_test
} -returnCodes 0 -result -1 -cleanup {
    rename encoding_test ""
}
test cmdAH-4.20.1 {convertfrom -failindex with incomplete utf8} -body {
    set x [encoding convertfrom -failindex i utf-8 A\xc3]
    binary scan $x H* y
    list $y $i
} -returnCodes 0 -result {41c3 -1}
test cmdAH-4.20.2 {convertfrom -failindex with incomplete utf8 (byte compiled)} -setup {
    proc encoding_test {} {
	set x [encoding convertfrom -failindex i utf-8 A\xc3]
	binary scan $x H* y
	list $y $i
    }
} -body {
    # Compile and execute
    encoding_test
} -returnCodes 0 -result {41c3 -1} -cleanup {
    rename encoding_test ""
}
test cmdAH-4.21.1 {convertto -failindex with wrong character} -body {
    set x [encoding convertto -failindex i iso8859-1 A\u0141]
    binary scan $x H* y
    list $y $i
} -returnCodes 0 -result {41 1}
test cmdAH-4.21.2 {convertto -failindex with wrong character (byte compiled)} -setup {
    proc encoding_test {} {
	set x [encoding convertto -failindex i iso8859-1 A\u0141]
	binary scan $x H* y
	list $y $i
    }
} -body {
    # Compile and execute
    encoding_test
} -returnCodes 0 -result {41 1} -cleanup {
    rename encoding_test ""
}

test cmdAH-5.1 {Tcl_FileObjCmd} -returnCodes error -body {
    file
} -result {wrong # args: should be "file subcommand ?arg ...?"}
test cmdAH-5.2 {Tcl_FileObjCmd} -returnCodes error -body {
    file x
} -result {unknown or ambiguous subcommand "x": must be atime, attributes, channels, copy, delete, dirname, executable, exists, extension, isdirectory, isfile, join, link, lstat, mkdir, mtime, nativename, normalize, owned, pathtype, readable, readlink, rename, rootname, separator, size, split, stat, system, tail, tempdir, tempfile, type, volumes, or writable}

Changes to tests/encoding.test.

Old version (lines 665-682):
    string length [encoding convertfrom -nocomplain "\x20"]
} 1
test encoding-24.21 {Parse with -nocomplain but without providing encoding} {
    string length [encoding convertto -nocomplain "\x20"]
} 1
test encoding-24.22 {Syntax error, two encodings} -body {
    encoding convertfrom iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertfrom ?-nocomplain? ?encoding? data"}
test encoding-24.23 {Syntax error, two encodings} -body {
    encoding convertto iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertto ?-nocomplain? ?encoding? data"}

file delete [file join [temporaryDirectory] iso2022.txt]

#
# Begin jajp encoding round-trip conformity tests
#
proc foreach-jisx0208 {varName command} {







    string length [encoding convertfrom -nocomplain "\x20"]
} 1
test encoding-24.21 {Parse with -nocomplain but without providing encoding} {
    string length [encoding convertto -nocomplain "\x20"]
} 1
test encoding-24.22 {Syntax error, two encodings} -body {
    encoding convertfrom iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test encoding-24.23 {Syntax error, two encodings} -body {
    encoding convertto iso8859-1 utf-8 "ZX\uD800"
} -returnCodes 1 -result {wrong # args: should be "::tcl::encoding::convertto ?-nocomplain? ?-failindex var? ?encoding? data"}

file delete [file join [temporaryDirectory] iso2022.txt]

#
# Begin jajp encoding round-trip conformity tests
#
proc foreach-jisx0208 {varName command} {

Changes to tests/safe.test.

} -result foobar
test safe-11.7 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertfrom
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?encoding? data"}
test safe-11.7.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertfrom} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?encoding? data"
    while executing
"encoding convertfrom"
    invoked from within
"encoding convertfrom"
    invoked from within
"interp eval $i encoding convertfrom"}
test safe-11.8 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertto
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?encoding? data"}
test safe-11.8.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertto} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?encoding? data"
    while executing
"encoding convertto"
    invoked from within
"encoding convertto"
    invoked from within
"interp eval $i encoding convertto"}








} -result foobar
test safe-11.7 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertfrom
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"}
test safe-11.7.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertfrom} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertfrom ?-nocomplain? ?-failindex var? ?encoding? data"
    while executing
"encoding convertfrom"
    invoked from within
"encoding convertfrom"
    invoked from within
"interp eval $i encoding convertfrom"}
test safe-11.8 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    interp eval $i encoding convertto
} -returnCodes error -cleanup {
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?-failindex var? ?encoding? data"}
test safe-11.8.1 {testing safe encoding} -setup {
    set i [safe::interpCreate]
} -body {
    catch {interp eval $i encoding convertto} m o
    dict get $o -errorinfo
} -returnCodes ok -match glob -cleanup {
    unset -nocomplain m o
    safe::interpDelete $i
} -result {wrong # args: should be "encoding convertto ?-nocomplain? ?-failindex var? ?encoding? data"
    while executing
"encoding convertto"
    invoked from within
"encoding convertto"
    invoked from within
"interp eval $i encoding convertto"}