Tcl Source Code

Changes On Branch tip-597
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Changes In Branch tip-597 Excluding Merge-Ins

This is equivalent to a diff from b56d0f7829 to fa3a2fdfd8

2021-04-30
07:20
TIP #597: "string is unicode" and better utf-8/utf-16/cesu-8 encodings check-in: f239465fd9 user: jan.nijtmans tags: core-8-branch
2021-04-29
12:30
unofficial -> snapshot check-in: 7630e5e006 user: jan.nijtmans tags: core-8-branch
10:04
Merge 8.7 check-in: 39ffe9b5cd user: jan.nijtmans tags: build-info
10:01
merge-mark check-in: 12b8ae27f2 user: jan.nijtmans tags: trunk, main
09:59
merge-mark check-in: 1b2fc4bdae user: jan.nijtmans tags: encodings-with-flags
09:58
Merge 8.7 Closed-Leaf check-in: fa3a2fdfd8 user: jan.nijtmans tags: tip-597
09:58
Prevent compiler warning check-in: b56d0f7829 user: jan.nijtmans tags: core-8-branch
09:54
Merge 8.7 check-in: 8c23b1dc16 user: jan.nijtmans tags: tip-597
09:51
Unbreak build with -DTCL_NO_DEPRECATED check-in: 3d21594e7a user: jan.nijtmans tags: core-8-branch

Changes to doc/FileSystem.3.

1741
1742
1743
1744
1745
1746
1747
1748

1749
1750
1751
1752
1753
1754
1755
1741
1742
1743
1744
1745
1746
1747

1748
1749
1750
1751
1752
1753
1754
1755







-
+







appended to (i.e., that has a reference count no greater than 1). No reference
to it should be retained.
.TP
\fIlinkProc\fR
.
If \fItoPtr\fR is NULL, this should return a value with reference count 1 that
has just been allocated and passed to \fBTcl_IncrRefCount\fR. If \fItoPtr\fR
is not NULL, it should be returned on success. 
is not NULL, it should be returned on success.
.TP
\fIlistVolumesProc\fR
.
The result value should be a list (if non-NULL); it will have its reference
count decremented once (with \fBTcl_DecrRefCount\fR) by Tcl once done.
.TP
\fIfileAttrStringsProc\fR

Changes to doc/NRE.3.

241
242
243
244
245
246
247
248

249
250
251
252
253
254
255
241
242
243
244
245
246
247

248
249
250
251
252
253
254
255







-
+







The \fIresultObj\fR argument to \fBTcl_NRExprObj\fR should be an unshared
object.
.PP
Use \fBTcl_NRAddCallback\fR to schedule any required final decrementing of the
reference counts of arguments to any of the other functions on this page, as
with any other post-processing step in the non-recursive execution engine.
.PP
The 
The
.SH "SEE ALSO"
Tcl_CreateCommand(3), Tcl_CreateObjCommand(3), Tcl_EvalObjEx(3), Tcl_GetCommandFromObj(3), Tcl_ExprObj(3)
.SH KEYWORDS
stackless, nonrecursive, execute, command, global, value, result, script
.SH COPYRIGHT
Copyright \(co 2008 Kevin B. Kenny.
Copyright \(co 2018 Nathan Coulter.

Changes to doc/UniCharIsAlpha.3.

1
2
3
4
5
6
7
8
9
10
11

12
13
14
15
16
17
18
1
2
3
4
5
6
7
8
9
10

11
12
13
14
15
16
17
18










-
+







'\"
'\" Copyright (c) 1997 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.TH Tcl_UniCharIsAlpha 3 "8.1" Tcl "Tcl Library Procedures"
.so man.macros
.BS
.SH NAME
Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsControl, Tcl_UniCharIsDigit, Tcl_UniCharIsGraph, Tcl_UniCharIsLower, Tcl_UniCharIsPrint, Tcl_UniCharIsPunct, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar \- routines for classification of Tcl_UniChar characters
Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsControl, Tcl_UniCharIsDigit, Tcl_UniCharIsGraph, Tcl_UniCharIsLower, Tcl_UniCharIsPrint, Tcl_UniCharIsPunct, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsUnicode, Tcl_UniCharIsWordChar \- routines for classification of Tcl_UniChar characters
.SH SYNOPSIS
.nf
\fB#include <tcl.h>\fR
.sp
int
\fBTcl_UniCharIsAlnum\fR(\fIch\fR)
.sp
39
40
41
42
43
44
45



46
47
48
49
50
51
52
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55







+
+
+







.sp
int
\fBTcl_UniCharIsSpace\fR(\fIch\fR)
.sp
int
\fBTcl_UniCharIsUpper\fR(\fIch\fR)
.sp
int
\fBTcl_UniCharIsUnicode\fR(\fIch\fR)
.sp
int
\fBTcl_UniCharIsWordChar\fR(\fIch\fR)
.SH ARGUMENTS
.AS int ch
.AP int ch in
The Unicode character to be examined.
.BE
76
77
78
79
80
81
82



83
84
85
86
87
88
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94







+
+
+






\fBTcl_UniCharIsPrint\fR tests if the character is a Unicode print character.
.PP
\fBTcl_UniCharIsPunct\fR tests if the character is a Unicode punctuation character.
.PP
\fBTcl_UniCharIsSpace\fR tests if the character is a whitespace Unicode character.
.PP
\fBTcl_UniCharIsUpper\fR tests if the character is an uppercase Unicode character.
.PP
\fBTcl_UniCharIsUnicode\fR tests if the character is a Unicode character, not being
a surrogate or noncharacter.
.PP
\fBTcl_UniCharIsWordChar\fR tests if the character is alphanumeric or
a connector punctuation mark.

.SH KEYWORDS
unicode, classification

Changes to doc/string.n.

409
410
411
412
413
414
415
416
417
418
419




420
421
422
423
424
425
426
409
410
411
412
413
414
415




416
417
418
419
420
421
422
423
424
425
426







-
-
-
-
+
+
+
+







Java's serialization mechanism) to enable basic processing with
non-Unicode-aware C functions.  As this representation should only
ever be used by Tcl's implementation, the number of bytes used to
store the representation is of very low value (except to C extension
code, which has direct access for the purpose of memory management,
etc.)
.PP
\fICompatibility note:\fR it is likely that this subcommand will be
withdrawn in a future version of Tcl. It is better to use the
\fBencoding convertto\fR command to convert a string to a known
encoding and then apply \fBstring length\fR to that.
\fICompatibility note:\fR This subcommand is deprecated and will
be removed in Tcl 9.0. It is better to use the \fBencoding convertto\fR
command to convert a string to a known encoding (e.g. "utf-8" or "cesu-8")
and then apply \fBstring length\fR to that.
.PP
.CS
\fBstring length\fR [encoding convertto utf-8 $theString]
.CE
.RE
.TP
\fBstring wordend \fIstring charIndex\fR

Changes to generic/tcl.decls.

2419
2420
2421
2422
2423
2424
2425



2426
2427
2428
2429
2430
2431
2432
2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435







+
+
+







}
declare 655 {
    const char *Tcl_UtfNext(const char *src)
}
declare 656 {
    const char *Tcl_UtfPrev(const char *src, const char *start)
}
declare 657 {
    int Tcl_UniCharIsUnicode(int ch)
}

# ----- BASELINE -- FOR -- 8.7.0 ----- #

##############################################################################

# Define the platform specific public Tcl interface. These functions are only
# available on the designated platform.

Changes to generic/tclCmdMZ.c.

1529
1530
1531
1532
1533
1534
1535
1536
1537


1538
1539
1540
1541
1542
1543
1544
1545


1546
1547
1548
1549
1550
1551
1552
1529
1530
1531
1532
1533
1534
1535


1536
1537
1538
1539
1540
1541
1542
1543


1544
1545
1546
1547
1548
1549
1550
1551
1552







-
-
+
+






-
-
+
+







    Tcl_WideInt w;

    static const char *const isClasses[] = {
	"alnum",	"alpha",	"ascii",	"control",
	"boolean",	"dict",		"digit",	"double",
	"entier",	"false",	"graph",	"integer",
	"list",		"lower",	"print",	"punct",
	"space",	"true",		"upper",	"wideinteger",
	"wordchar",	"xdigit",	NULL
	"space",	"true",		"upper",	"unicode",
	"wideinteger", "wordchar",	"xdigit",	NULL
    };
    enum isClassesEnum {
	STR_IS_ALNUM,	STR_IS_ALPHA,	STR_IS_ASCII,	STR_IS_CONTROL,
	STR_IS_BOOL,	STR_IS_DICT,	STR_IS_DIGIT,	STR_IS_DOUBLE,
	STR_IS_ENTIER,	STR_IS_FALSE,	STR_IS_GRAPH,	STR_IS_INT,
	STR_IS_LIST,	STR_IS_LOWER,	STR_IS_PRINT,	STR_IS_PUNCT,
	STR_IS_SPACE,	STR_IS_TRUE,	STR_IS_UPPER,	STR_IS_WIDE,
	STR_IS_WORD,	STR_IS_XDIGIT
	STR_IS_SPACE,	STR_IS_TRUE,	STR_IS_UPPER,	STR_IS_UNICODE,
	STR_IS_WIDE,	STR_IS_WORD,	STR_IS_XDIGIT
    };
    static const char *const isOptions[] = {
	"-strict", "-failindex", NULL
    };
    enum isOptionsEnum {
	OPT_STRICT, OPT_FAILIDX
    };
1868
1869
1870
1871
1872
1873
1874



1875
1876
1877
1878
1879
1880
1881
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884







+
+
+







	break;
    case STR_IS_SPACE:
	chcomp = Tcl_UniCharIsSpace;
	break;
    case STR_IS_UPPER:
	chcomp = Tcl_UniCharIsUpper;
	break;
    case STR_IS_UNICODE:
	chcomp = Tcl_UniCharIsUnicode;
	break;
    case STR_IS_WORD:
	chcomp = Tcl_UniCharIsWordChar;
	break;
    case STR_IS_XDIGIT:
	chcomp = UniCharIsHexDigit;
	break;
    }
2828
2829
2830
2831
2832
2833
2834

2835
2836
2837
2838
2839
2840
2841
2842
2843
2844
2845
2846
2847
2848
2849
2850
2851
2852

2853
2854
2855
2856
2857
2858
2859
2831
2832
2833
2834
2835
2836
2837
2838
2839
2840
2841
2842
2843
2844
2845
2846
2847
2848
2849
2850
2851
2852
2853
2854
2855
2856
2857
2858
2859
2860
2861
2862
2863
2864







+


















+







 *	A standard Tcl result.
 *
 * Side effects:
 *	See the user documentation.
 *
 *----------------------------------------------------------------------
 */
#if TCL_MAJOR_VERSION < 9 && !defined(TCL_NO_DEPRECATED)
static int
StringBytesCmd(
    TCL_UNUSED(ClientData),
    Tcl_Interp *interp,		/* Current interpreter. */
    int objc,			/* Number of arguments. */
    Tcl_Obj *const objv[])	/* Argument objects. */
{
    int length;

    if (objc != 2) {
	Tcl_WrongNumArgs(interp, 1, objv, "string");
	return TCL_ERROR;
    }

    (void) TclGetStringFromObj(objv[1], &length);
    Tcl_SetObjResult(interp, Tcl_NewWideIntObj(length));
    return TCL_OK;
}
#endif

/*
 *----------------------------------------------------------------------
 *
 * StringLenCmd --
 *
 *	This procedure is invoked to process the "string length" Tcl command.
3302
3303
3304
3305
3306
3307
3308

3309

3310
3311
3312
3313
3314
3315
3316
3307
3308
3309
3310
3311
3312
3313
3314
3315
3316
3317
3318
3319
3320
3321
3322
3323







+

+







 */

Tcl_Command
TclInitStringCmd(
    Tcl_Interp *interp)		/* Current interpreter. */
{
    static const EnsembleImplMap stringImplMap[] = {
#if TCL_MAJOR_VERSION < 9 && !defined(TCL_NO_DEPRECATED)
	{"bytelength",	StringBytesCmd,	TclCompileBasic1ArgCmd, NULL, NULL, 0},
#endif
	{"cat",		StringCatCmd,	TclCompileStringCatCmd, NULL, NULL, 0},
	{"compare",	StringCmpCmd,	TclCompileStringCmpCmd, NULL, NULL, 0},
	{"equal",	StringEqualCmd,	TclCompileStringEqualCmd, NULL, NULL, 0},
	{"first",	StringFirstCmd,	TclCompileStringFirstCmd, NULL, NULL, 0},
	{"index",	StringIndexCmd,	TclCompileStringIndexCmd, NULL, NULL, 0},
	{"insert",	StringInsertCmd, TclCompileStringInsertCmd, NULL, NULL, 0},
	{"is",		StringIsCmd,	TclCompileStringIsCmd, NULL, NULL, 0},

Changes to generic/tclCompCmdsSZ.c.

501
502
503
504
505
506
507
508
509
510
511
512





513
514
515
516
517
518
519
520





521
522
523
524
525
526
527
501
502
503
504
505
506
507





508
509
510
511
512
513
514
515





516
517
518
519
520
521
522
523
524
525
526
527







-
-
-
-
-
+
+
+
+
+



-
-
-
-
-
+
+
+
+
+







				 * compiled. */
    CompileEnv *envPtr)		/* Holds resulting instructions. */
{
    DefineLineInformation;	/* TIP #280 */
    Tcl_Token *tokenPtr = TokenAfter(parsePtr->tokenPtr);
    static const char *const isClasses[] = {
	"alnum",	"alpha",	"ascii",	"control",
	"boolean",	"dict", "digit",	"double",	"entier",
	"false",	"graph",	"integer",	"list",
	"lower",	"print",	"punct",	"space",
	"true",		"upper",	"wideinteger",	"wordchar",
	"xdigit",	NULL
	"boolean",	"dict",		"digit",	"double",
	"entier",	"false",	"graph",	"integer",
	"list",		"lower",	"print",	"punct",
	"space",	"true",		"upper",	"unicode",
	"wideinteger", "wordchar",	"xdigit",	NULL
    };
    enum isClassesEnum {
	STR_IS_ALNUM,	STR_IS_ALPHA,	STR_IS_ASCII,	STR_IS_CONTROL,
	STR_IS_BOOL,	STR_IS_DICT, STR_IS_DIGIT,	STR_IS_DOUBLE,	STR_IS_ENTIER,
	STR_IS_FALSE,	STR_IS_GRAPH,	STR_IS_INT,	STR_IS_LIST,
	STR_IS_LOWER,	STR_IS_PRINT,	STR_IS_PUNCT,	STR_IS_SPACE,
	STR_IS_TRUE,	STR_IS_UPPER,	STR_IS_WIDE,	STR_IS_WORD,
	STR_IS_XDIGIT
	STR_IS_BOOL,	STR_IS_DICT,	STR_IS_DIGIT,	STR_IS_DOUBLE,
	STR_IS_ENTIER,	STR_IS_FALSE,	STR_IS_GRAPH,	STR_IS_INT,
	STR_IS_LIST,	STR_IS_LOWER,	STR_IS_PRINT,	STR_IS_PUNCT,
	STR_IS_SPACE,	STR_IS_TRUE,	STR_IS_UPPER,	STR_IS_UNICODE,
	STR_IS_WIDE,	STR_IS_WORD,	STR_IS_XDIGIT
    };
    int t, range, allowEmpty = 0, end;
    InstStringClassType strClassType;
    Tcl_Obj *isClass;

    if (parsePtr->numWords < 3 || parsePtr->numWords > 6) {
	return TCL_ERROR;
605
606
607
608
609
610
611



612
613
614
615
616
617
618
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621







+
+
+







	goto compileStrClass;
    case STR_IS_SPACE:
	strClassType = STR_CLASS_SPACE;
	goto compileStrClass;
    case STR_IS_UPPER:
	strClassType = STR_CLASS_UPPER;
	goto compileStrClass;
    case STR_IS_UNICODE:
	strClassType = STR_CLASS_UNICODE;
	goto compileStrClass;
    case STR_IS_WORD:
	strClassType = STR_CLASS_WORD;
	goto compileStrClass;
    case STR_IS_XDIGIT:
	strClassType = STR_CLASS_XDIGIT;
    compileStrClass:
	if (allowEmpty) {
1411
1412
1413
1414
1415
1416
1417

1418
1419
1420
1421
1422
1423
1424
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428







+







    {"lower",	Tcl_UniCharIsLower},
    {"print",	Tcl_UniCharIsPrint},
    {"punct",	Tcl_UniCharIsPunct},
    {"space",	Tcl_UniCharIsSpace},
    {"upper",	Tcl_UniCharIsUpper},
    {"word",	Tcl_UniCharIsWordChar},
    {"xdigit",	UniCharIsHexDigit},
    {"unicode",	Tcl_UniCharIsUnicode},
    {"",	NULL}
};

/*
 *----------------------------------------------------------------------
 *
 * TclCompileSubstCmd --

Changes to generic/tclCompile.h.

918
919
920
921
922
923
924
925

926

927
928
929
930
931
932
933
918
919
920
921
922
923
924

925
926
927
928
929
930
931
932
933
934







-
+

+







    STR_CLASS_PRINT,		/* Unicode printing characters, including
				 * spaces. */
    STR_CLASS_PUNCT,		/* Unicode punctuation characters. */
    STR_CLASS_SPACE,		/* Unicode space characters. */
    STR_CLASS_UPPER,		/* Unicode upper-case alphabet characters. */
    STR_CLASS_WORD,		/* Unicode word (alphabetic, digit, connector
				 * punctuation) characters. */
    STR_CLASS_XDIGIT		/* Characters that can be used as digits in
    STR_CLASS_XDIGIT,		/* Characters that can be used as digits in
				 * hexadecimal numbers ([0-9A-Fa-f]). */
    STR_CLASS_UNICODE		/* Unicode characters. */
} InstStringClassType;

typedef struct StringClassDesc {
    char name[8];		/* Name of the class. */
    int (*comparator)(int);	/* Function to test if a single unicode
				 * character is a member of the class. */
} StringClassDesc;

Changes to generic/tclDecls.h.

1933
1934
1935
1936
1937
1938
1939


1940
1941
1942
1943
1944
1945
1946
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948







+
+







				size_t *lengthPtr);
/* 654 */
EXTERN int		Tcl_UtfCharComplete(const char *src, int length);
/* 655 */
EXTERN const char *	Tcl_UtfNext(const char *src);
/* 656 */
EXTERN const char *	Tcl_UtfPrev(const char *src, const char *start);
/* 657 */
EXTERN int		Tcl_UniCharIsUnicode(int ch);

typedef struct {
    const struct TclPlatStubs *tclPlatStubs;
    const struct TclIntStubs *tclIntStubs;
    const struct TclIntPlatStubs *tclIntPlatStubs;
} TclStubHooks;

2625
2626
2627
2628
2629
2630
2631

2632
2633
2634
2635
2636
2637
2638
2627
2628
2629
2630
2631
2632
2633
2634
2635
2636
2637
2638
2639
2640
2641







+







    void (*reserved650)(void);
    char * (*tclGetStringFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 651 */
    Tcl_UniChar * (*tclGetUnicodeFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 652 */
    unsigned char * (*tclGetByteArrayFromObj) (Tcl_Obj *objPtr, size_t *lengthPtr); /* 653 */
    int (*tcl_UtfCharComplete) (const char *src, int length); /* 654 */
    const char * (*tcl_UtfNext) (const char *src); /* 655 */
    const char * (*tcl_UtfPrev) (const char *src, const char *start); /* 656 */
    int (*tcl_UniCharIsUnicode) (int ch); /* 657 */
} TclStubs;

extern const TclStubs *tclStubsPtr;

#ifdef __cplusplus
}
#endif
3967
3968
3969
3970
3971
3972
3973


3974
3975
3976
3977
3978
3979
3980
3970
3971
3972
3973
3974
3975
3976
3977
3978
3979
3980
3981
3982
3983
3984
3985







+
+







	(tclStubsPtr->tclGetByteArrayFromObj) /* 653 */
#define Tcl_UtfCharComplete \
	(tclStubsPtr->tcl_UtfCharComplete) /* 654 */
#define Tcl_UtfNext \
	(tclStubsPtr->tcl_UtfNext) /* 655 */
#define Tcl_UtfPrev \
	(tclStubsPtr->tcl_UtfPrev) /* 656 */
#define Tcl_UniCharIsUnicode \
	(tclStubsPtr->tcl_UniCharIsUnicode) /* 657 */

#endif /* defined(USE_TCL_STUBS) */

/* !END!: Do not edit above this line. */

#undef TclUnusedStubEntry
#if defined(USE_TCL_STUBS)

Changes to generic/tclEncoding.c.

506
507
508
509
510
511
512
513

514
515
516



517

518
519
520
521
522
523
524
506
507
508
509
510
511
512

513



514
515
516
517
518
519
520
521
522
523
524
525







-
+
-
-
-
+
+
+

+







 *
 * Side effects:
 *	Depends on the memory, object, and IO subsystems.
 *
 *---------------------------------------------------------------------------
 */

/* This flags must not conflict with other TCL_ENCODING_* flags in tcl.h */
/* Those flags must not conflict with other TCL_ENCODING_* flags in tcl.h */
#define TCL_ENCODING_MODIFIED	0x20	/* Converting NULL bytes to 0xC0 0x80 */
/* Since TCL_ENCODING_MODIFIED is only used for utf-8 and
 * TCL_ENCODING_LE is only used for  utf-16/ucs-2, re-use the same value */
/* Since TCL_ENCODING_MODIFIED is only used for utf-8/cesu-8 and
 * TCL_ENCODING_LE is only used for  utf-16/ucs-2. re-use the same value */
#define TCL_ENCODING_MODIFIED	0x20	/* Converting NULL bytes to 0xC0 0x80 */
#define TCL_ENCODING_LE		TCL_ENCODING_MODIFIED	/* Little-endian encoding */
#define TCL_ENCODING_UTF	0x200	/* For UTF-8 encoding, allow 4-byte output sequences */

void
TclInitEncodingSubsystem(void)
{
    Tcl_EncodingType type;
    TableEncodingData *dataPtr;
    unsigned size;
552
553
554
555
556
557
558


559


560
561
562
563
564
565
566
553
554
555
556
557
558
559
560
561

562
563
564
565
566
567
568
569
570







+
+
-
+
+







    tclIdentityEncoding = Tcl_CreateEncoding(&type);

    type.encodingName	= "utf-8";
    type.toUtfProc	= UtfToUtfProc;
    type.fromUtfProc	= UtfToUtfProc;
    type.freeProc	= NULL;
    type.nullSize	= 1;
    type.clientData	= INT2PTR(TCL_ENCODING_UTF);
    Tcl_CreateEncoding(&type);
    type.clientData	= NULL;
    type.clientData	= INT2PTR(0);
    type.encodingName	= "cesu-8";
    Tcl_CreateEncoding(&type);

    type.toUtfProc	= Utf16ToUtfProc;
    type.fromUtfProc    = UtfToUcs2Proc;
    type.freeProc	= NULL;
    type.nullSize	= 2;
    type.encodingName   = "ucs-2le";
1137
1138
1139
1140
1141
1142
1143
1144

1145
1146
1147
1148
1149
1150
1151
1141
1142
1143
1144
1145
1146
1147

1148
1149
1150
1151
1152
1153
1154
1155







-
+







	srcLen = 0;
    } else if (srcLen < 0) {
	srcLen = encodingPtr->lengthProc(src);
    }

    flags = TCL_ENCODING_START | TCL_ENCODING_END;
    if (encodingPtr->toUtfProc == UtfToUtfProc) {
	flags |= TCL_ENCODING_MODIFIED;
	flags |= TCL_ENCODING_MODIFIED | TCL_ENCODING_UTF;
    }

    while (1) {
	result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen,
		flags, &state, dst, dstLen, &srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

1254
1255
1256
1257
1258
1259
1260
1261

1262
1263
1264
1265
1266
1267
1268
1258
1259
1260
1261
1262
1263
1264

1265
1266
1267
1268
1269
1270
1271
1272







-
+







	 * the actual \0 at the end of the destination buffer, we need to
	 * append it manually.  First make room for it...
	 */

	dstLen--;
    }
    if (encodingPtr->toUtfProc == UtfToUtfProc) {
	flags |= TCL_ENCODING_MODIFIED;
	flags |= TCL_ENCODING_MODIFIED | TCL_ENCODING_UTF;
    }
    do {
	Tcl_EncodingState savedState = *statePtr;

	result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen,
		flags, statePtr, dst, dstLen, srcReadPtr, dstWrotePtr,
		dstCharsPtr);
1333
1334
1335
1336
1337
1338
1339

1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353

1354
1355
1356
1357
1358
1359
1360







+









-







    flags = TCL_ENCODING_START | TCL_ENCODING_END;
    while (1) {
	result = encodingPtr->fromUtfProc(encodingPtr->clientData, src,
		srcLen, flags, &state, dst, dstLen,
		&srcRead, &dstWrote, &dstChars);
	soFar = dst + dstWrote - Tcl_DStringValue(dstPtr);

	src += srcRead;
	if (result != TCL_CONVERT_NOSPACE) {
	    if (encodingPtr->nullSize == 2) {
		Tcl_DStringSetLength(dstPtr, soFar + 1);
	    }
	    Tcl_DStringSetLength(dstPtr, soFar);
	    return Tcl_DStringValue(dstPtr);
	}

	flags &= ~TCL_ENCODING_START;
	src += srcRead;
	srcLen -= srcRead;
	if (Tcl_DStringLength(dstPtr) == 0) {
	    Tcl_DStringSetLength(dstPtr, dstLen);
	}
	Tcl_DStringSetLength(dstPtr, 2 * Tcl_DStringLength(dstPtr) + 1);
	dst = Tcl_DStringValue(dstPtr) + soFar;
	dstLen = Tcl_DStringLength(dstPtr) - soFar - 1;
2211
2212
2213
2214
2215
2216
2217
2218

2219
2220
2221
2222
2223
2224
2225
2215
2216
2217
2218
2219
2220
2221

2222
2223
2224
2225
2226
2227
2228
2229







-
+







    }
    if (flags & TCL_ENCODING_CHAR_LIMIT) {
	charLimit = *dstCharsPtr;
    }

    dstStart = dst;
    flags |= PTR2INT(clientData);
    dstEnd = dst + dstLen - TCL_UTF_MAX;
    dstEnd = dst + dstLen - ((flags & TCL_ENCODING_UTF) ? TCL_UTF_MAX : 6);

    for (numChars = 0; src < srcEnd && numChars <= charLimit; numChars++) {
	if ((src > srcClose) && (!Tcl_UtfCharComplete(src, srcEnd - src))) {
	    /*
	     * If there is more string to follow, this will ensure that the
	     * last UTF-8 character in the source buffer hasn't been cut off.
	     */
2264
2265
2266
2267
2268
2269
2270

2271
2272
2273
2274
2275
2276
2277










2278

2279
2280
2281
2282
2283
2284
2285
2286









2287
2288
2289
2290
2291
2292
2293
2294









2295
2296
2297
2298
2299
2300
2301
2268
2269
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
2280
2281
2282
2283
2284
2285
2286
2287
2288
2289
2290
2291
2292

2293
2294
2295
2296
2297
2298
2299
2300
2301
2302
2303
2304
2305
2306
2307
2308
2309
2310
2311
2312
2313
2314
2315
2316
2317
2318
2319
2320
2321
2322
2323
2324
2325
2326
2327
2328
2329
2330
2331
2332
2333
2334







+







+
+
+
+
+
+
+
+
+
+
-
+








+
+
+
+
+
+
+
+
+








+
+
+
+
+
+
+
+
+







		char chbuf[2];
		chbuf[0] = UCHAR(*src++); chbuf[1] = 0;
		TclUtfToUCS4(chbuf, &ch);
	    }
	    dst += Tcl_UniCharToUtf(ch, dst);
	} else {
	    int low;
	    const char *saveSrc = src;
	    size_t len = TclUtfToUCS4(src, &ch);
	    if ((len < 2) && (ch != 0) && (flags & TCL_ENCODING_STOPONERROR)
		    && (flags & TCL_ENCODING_MODIFIED)) {
		result = TCL_CONVERT_SYNTAX;
		break;
	    }
	    src += len;
	    if (!(flags & TCL_ENCODING_UTF)) {
		if (ch > 0xFFFF) {
		    /* CESU-8 6-byte sequence for chars > U+FFFF */
		    ch -= 0x10000;
		    *dst++ = 0xED;
		    *dst++ = (char) (((ch >> 16) & 0x0F) | 0xA0);
		    *dst++ = (char) (((ch >> 10) & 0x3F) | 0x80);
		    ch = (ch & 0x0CFF) | 0xDC00;
		}
		goto cesu8;
	    if ((ch | 0x7FF) == 0xDFFF) {
	    } else if ((ch | 0x7FF) == 0xDFFF) {
		/*
		 * A surrogate character is detected, handle especially.
		 */

		low = ch;
		len = (src <= srcEnd-3) ? TclUtfToUCS4(src, &low) : 0;

		if (((low & ~0x3FF) != 0xDC00) || (ch & 0x400)) {
		    if (flags & TCL_ENCODING_STOPONERROR) {
			result = TCL_CONVERT_UNKNOWN;
			src = saveSrc;
			break;
		    }
		    if (!(flags & TCL_ENCODING_MODIFIED)) {
			ch = 0xFFFD;
		    }
		cesu8:
		    *dst++ = (char) (((ch >> 12) | 0xE0) & 0xEF);
		    *dst++ = (char) (((ch >> 6) | 0x80) & 0xBF);
		    *dst++ = (char) ((ch | 0x80) & 0xBF);
		    continue;
		}
		src += len;
		dst += Tcl_UniCharToUtf(ch, dst);
		ch = low;
	    } else if (!Tcl_UniCharIsUnicode(ch)) {
		if (flags & TCL_ENCODING_STOPONERROR) {
		    result = TCL_CONVERT_UNKNOWN;
		    src = saveSrc;
		    break;
		}
		if (!(flags & TCL_ENCODING_MODIFIED)) {
		    ch = 0xFFFD;
		}
	    }
	    dst += Tcl_UniCharToUtf(ch, dst);
	}
    }

    *srcReadPtr = src - srcStart;
    *dstWrotePtr = dst - dstStart;
2446
2447
2448
2449
2450
2451
2452
2453

2454
2455
2456
2457
2458
2459
2460
2479
2480
2481
2482
2483
2484
2485

2486
2487
2488
2489
2490
2491
2492
2493







-
+







				 * the conversion. */
    int *dstCharsPtr)		/* Filled with the number of characters that
				 * correspond to the bytes stored in the
				 * output buffer. */
{
    const char *srcStart, *srcEnd, *srcClose, *dstStart, *dstEnd;
    int result, numChars;
    int ch;
    int ch, len;

    srcStart = src;
    srcEnd = src + srcLen;
    srcClose = srcEnd;
    if ((flags & TCL_ENCODING_END) == 0) {
	srcClose -= TCL_UTF_MAX;
    }
2474
2475
2476
2477
2478
2479
2480
2481









2482
2483
2484
2485
2486
2487
2488
2507
2508
2509
2510
2511
2512
2513

2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
2524
2525
2526
2527
2528
2529







-
+
+
+
+
+
+
+
+
+







	    result = TCL_CONVERT_MULTIBYTE;
	    break;
	}
	if (dst > dstEnd) {
	    result = TCL_CONVERT_NOSPACE;
	    break;
	}
	src += TclUtfToUCS4(src, &ch);
	len = TclUtfToUCS4(src, &ch);
	if (!Tcl_UniCharIsUnicode(ch)) {
	    if (flags & TCL_ENCODING_STOPONERROR) {
		result = TCL_CONVERT_UNKNOWN;
		break;
	    }
	    ch = 0xFFFD;
	}
	src += len;
	if (flags & TCL_ENCODING_LE) {
	    if (ch <= 0xFFFF) {
		*dst++ = (ch & 0xFF);
		*dst++ = (ch >> 8);
	    } else {
		*dst++ = (((ch - 0x10000) >> 10) & 0xFF);
		*dst++ = (((ch - 0x10000) >> 18) & 0x3) | 0xD8;

Changes to generic/tclStubInit.c.

1937
1938
1939
1940
1941
1942
1943

1944
1945
1946
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947







+



    0, /* 650 */
    TclGetStringFromObj, /* 651 */
    TclGetUnicodeFromObj, /* 652 */
    TclGetByteArrayFromObj, /* 653 */
    Tcl_UtfCharComplete, /* 654 */
    Tcl_UtfNext, /* 655 */
    Tcl_UtfPrev, /* 656 */
    Tcl_UniCharIsUnicode, /* 657 */
};

/* !END!: Do not edit above this line. */

Changes to generic/tclUtf.c.

2175
2176
2177
2178
2179
2180
2181






























2182
2183
2184
2185
2186
2187
2188
2175
2176
2177
2178
2179
2180
2181
2182
2183
2184
2185
2186
2187
2188
2189
2190
2191
2192
2193
2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204
2205
2206
2207
2208
2209
2210
2211
2212
2213
2214
2215
2216
2217
2218







+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+







{
    if (UNICODE_OUT_OF_RANGE(ch)) {
	return 0;
    }
    return (GetCategory(ch) == UPPERCASE_LETTER);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsUnicode --
 *
 *	Test if a character is a Unicode character.
 *
 * Results:
 *	Returns non-zero if character belongs to the Unicode set.
 *
 *	Excluded are:
 *	  1) All characters > U+10FFFF
 *	  2) Surrogates U+D800 - U+DFFF
 *	  3) Last 2 characters of each plane, so U+??FFFE  and U+??FFFF
 *	  4) The characters in the range U+FDD0 - U+FDEF
 *
 * Side effects:
 *	None.
 *
 *----------------------------------------------------------------------
 */

int
Tcl_UniCharIsUnicode(
    int ch)			/* Unicode character to test. */
{
    return ((unsigned int)ch <= 0x10FFFF) && ((ch & 0xFFF800) != 0xD800)
	    && ((ch & 0xFFFE) != 0xFFFE) && ((unsigned int)(ch - 0xFDD0) >= 32);
}

/*
 *----------------------------------------------------------------------
 *
 * Tcl_UniCharIsWordChar --
 *
 *	Test if a character is alphanumeric or a connector punctuation mark.
 *

Changes to library/init.tcl.

210
211
212
213
214
215
216
217

218
219

220
221
222
223
224
225
226
210
211
212
213
214
215
216

217
218

219
220
221
222
223
224
225
226







-
+

-
+







		# Compute stack trace contribution from the [uplevel].
		# Note the dependence on how Tcl_AddErrorInfo, etc.
		# construct the stack trace.
		#
		set errInfo [dict get $opts -errorinfo]
		set errCode [dict get $opts -errorcode]
		set cinfo $args
		if {[string bytelength $cinfo] > 150} {
		if {[string length [encoding convertto utf-8 $cinfo]] > 150} {
		    set cinfo [string range $cinfo 0 150]
		    while {[string bytelength $cinfo] > 150} {
		    while {[string length [encoding convertto utf-8 $cinfo]] > 150} {
			set cinfo [string range $cinfo 0 end-1]
		    }
		    append cinfo ...
		}
		set tail "\n    (\"uplevel\" body line 1)\n    invoked\
			from within\n\"uplevel 1 \$args\""
		set expect "$msg\n    while executing\n\"$cinfo\"$tail"

Changes to tests/encoding.test.

339
340
341
342
343
344
345
346

347
348
349
350
351
352

353
354
355
356
357
358

359
360
361
362
363
364

365
366
367
368
369
370

371
372
373
374
375
376

377
378
379
380
381
382

383
384
385
386
387
388

389
390
391
392
393
394

395
396
397
398
399
400

401
402
403
404
405
406
407
408
409
410
411




















412
413
414
415
416
417
418
339
340
341
342
343
344
345

346
347
348
349
350
351

352
353
354
355
356
357

358
359
360
361
362
363

364
365
366
367
368
369

370
371
372
373
374
375

376
377
378
379
380
381

382
383
384
385
386
387

388
389
390
391
392
393

394
395
396
397
398
399

400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438







-
+





-
+





-
+





-
+





-
+





-
+





-
+





-
+





-
+





-
+











+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+







    list [string length $x] $y
} "4 😂"
test encoding-15.6 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uDE02\uD83D
    set y [encoding convertto utf-8 \uDE02\uD83D\uDE02\uD83D]
    binary scan $y H* z
    list [string length $y] $z
} {10 edb882f09f9882eda0bd}
} {10 efbfbdf09f9882efbfbd}
test encoding-15.7 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83D\uD83D
    set y [encoding convertto utf-8 \uDE02\uD83D\uD83D]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 9 edb882eda0bdeda0bd}
} {3 9 efbfbdefbfbdefbfbd}
test encoding-15.8 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83Dé
    set y [encoding convertto utf-8 \uDE02\uD83Dé]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 8 edb882eda0bdc3a9}
} {3 8 efbfbdefbfbdc3a9}
test encoding-15.9 {UtfToUtfProc emoji character output} {
    set x \uDE02\uD83DX
    set y [encoding convertto utf-8 \uDE02\uD83DX]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {3 7 edb882eda0bd58}
} {3 7 efbfbdefbfbd58}
test encoding-15.10 {UtfToUtfProc high surrogate character output} {
    set x \uDE02é
    set y [encoding convertto utf-8 \uDE02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 edb882c3a9}
} {2 5 efbfbdc3a9}
test encoding-15.11 {UtfToUtfProc low surrogate character output} {
    set x \uDA02é
    set y [encoding convertto utf-8 \uDA02é]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 5 eda882c3a9}
} {2 5 efbfbdc3a9}
test encoding-15.12 {UtfToUtfProc high surrogate character output} {
    set x \uDE02Y
    set y [encoding convertto utf-8 \uDE02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 edb88259}
} {2 4 efbfbd59}
test encoding-15.13 {UtfToUtfProc low surrogate character output} {
    set x \uDA02Y
    set y [encoding convertto utf-8 \uDA02Y]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {2 4 eda88259}
} {2 4 efbfbd59}
test encoding-15.14 {UtfToUtfProc high surrogate character output} {
    set x \uDE02
    set y [encoding convertto utf-8 \uDE02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 edb882}
} {1 3 efbfbd}
test encoding-15.15 {UtfToUtfProc low surrogate character output} {
    set x \uDA02
    set y [encoding convertto utf-8 \uDA02]
    binary scan $y H* z
    list [string length $x] [string length $y] $z
} {1 3 eda882}
} {1 3 efbfbd}
test encoding-15.16 {UtfToUtfProc: Invalid 4-byte UTF-8, see [ed29806ba]} {
    set x \xF0\xA0\xA1\xC2
    set y [encoding convertfrom utf-8 \xF0\xA0\xA1\xC2]
    list [string length $x] $y
} "4 \xF0\xA0\xA1\xC2"
test encoding-15.17 {UtfToUtfProc emoji character output} {
    set x 😂
    set y [encoding convertto utf-8 😂]
    binary scan $y H* z
    list [string length $y] $z
} {4 f09f9882}
test encoding-15.18 {UtfToUtfProc CESU-8 6-byte sequence} {
    set y [encoding convertto cesu-8 \U10000]
    binary scan $y H* z
    list [string length $y] $z
} {6 eda080edb080}
test encoding-15.19 {UtfToUtfProc CESU-8 upper surrogate} {
    set y [encoding convertto cesu-8 \uD800]
    binary scan $y H* z
    list [string length $y] $z
} {3 eda080}
test encoding-15.20 {UtfToUtfProc CESU-8 lower surrogate} {
    set y [encoding convertto cesu-8 \uDC00]
    binary scan $y H* z
    list [string length $y] $z
} {3 edb080}
test encoding-15.21 {UtfToUtfProc CESU-8 noncharacter} {
    set y [encoding convertto cesu-8 \uFFFF]
    binary scan $y H* z
    list [string length $y] $z
} {3 efbfbf}

test encoding-16.1 {Utf16ToUtfProc} -body {
    set val [encoding convertfrom utf-16 NN]
    list $val [format %x [scan $val %c]]
} -result "乎 4e4e"
test encoding-16.2 {Utf16ToUtfProc} -body {
    set val [encoding convertfrom utf-16 "\xD8\xD8\xDC\xDC"]
430
431
432
433
434
435
436
437
438
439



440
441
442
443
444
445





446
447
448
449
450
451
452
450
451
452
453
454
455
456



457
458
459
460





461
462
463
464
465
466
467
468
469
470
471
472







-
-
-
+
+
+

-
-
-
-
-
+
+
+
+
+







    set val [encoding convertfrom ucs-2 "\xD8\xD8\xDC\xDC"]
    list $val [format %x [scan $val %c]]
} -result "\U460DC 460dc"

test encoding-17.1 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\U460DC"
} -result "\xD8\xD8\xDC\xDC"
test encoding-17.2 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\uDCDC"
} -result "\xDC\xDC"
test encoding-17.2 {UtfToUcs2Proc} -body {
    encoding convertfrom utf-16 [encoding convertto ucs-2 "\U460DC"]
} -result "\uFFFD"
test encoding-17.3 {UtfToUtf16Proc} -body {
    encoding convertto utf-16 "\uD8D8"
} -result "\xD8\xD8"
test encoding-17.4 {UtfToUcs2Proc} -body {
    encoding convertfrom utf-16 [encoding convertto ucs-2 "\U460DC"]
} -result "\uFFFD"
    encoding convertto utf-16be "\uDCDC"
} -result "\xFF\xFD"
test encoding-17.4 {UtfToUtf16Proc} -body {
    encoding convertto utf-16le "\uD8D8"
} -result "\xFD\xFF"

test encoding-18.1 {TableToUtfProc} {
} {}

test encoding-19.1 {TableFromUtfProc} {
} {}

738
739
740
741
742
743
744
745

746
747
748
749
750
751
752
753
754
755
756
757
758
758
759
760
761
762
763
764

765
766
767
768
769
770
771
772
773
774
775
776
777
778







-
+













		encoding convertto $name $string

		# discard the cached internal representation of Tcl_Encoding
		# Unfortunately, without this, encoding 2-1 fails.
		llength $name
	}
	return $count
} -result [expr {[info exists ::tcl_precision] ? 86 : 85}]
} -result [expr {[info exists ::tcl_precision] ? 87 : 86}]

runtests

}

# cleanup
namespace delete ::tcl::test::encoding
::tcltest::cleanupTests
return

# Local Variables:
# mode: tcl
# End:

Changes to tests/info.test.

18
19
20
21
22
23
24
25

26
27
28
29
30
31
32
18
19
20
21
22
23
24

25
26
27
28
29
30
31
32







-
+







if {{::tcltest} ni [namespace children]} {
    package require tcltest 2.5
    namespace import -force ::tcltest::*
}
::tcltest::loadTestedCommands
catch [list package require -exact tcl::test [info patchlevel]]
testConstraint zlib [llength [info commands zlib]]

testConstraint nodep [info exists tcl_precision]
# Set up namespaces needed to test operation of "info args", "info body",
# "info default", and "info procs" with imported procedures.

catch {namespace delete test_ns_info1 test_ns_info2}

namespace eval test_ns_info1 {
    namespace export *
97
98
99
100
101
102
103
104

105
106
107
108
109
110
111
97
98
99
100
101
102
103

104
105
106
107
108
109
110
111







-
+







    }
    foo a
    eval [info body foo]
} -returnCodes error -result {can't read "args": no such variable}
# Fix for problem tested for in info-2.5 caused problems when
# procedure body had no string rep (i.e. was not yet bytecode)
# causing an empty string to be returned [Bug #545644]
test info-2.6 {info body option, returning list bodies} {
test info-2.6 {info body option, returning list bodies} nodep {
    proc foo args [list subst bar]
    list [string bytelength [info body foo]] \
	    [foo; string bytelength [info body foo]]
} {9 9}

proc testinfocmdcount {} {
    set x [info cmdcount]

Changes to tests/io.test.

2412
2413
2414
2415
2416
2417
2418
2419

2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440

2441
2442
2443
2444
2445
2446
2447
2412
2413
2414
2415
2416
2417
2418

2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439

2440
2441
2442
2443
2444
2445
2446
2447







-
+




















-
+







    after 0 [list coroutine c1 apply [list {} {
	variable done
	set chan [chan create r {apply {{cmd chan args} {
	    switch $cmd {
		blocking - finalize {
		}
		watch {
		    chan postevent $chan read 
		    chan postevent $chan read
		}
		initialize {
		    list initialize finalize watch read write configure blocking
		}
		default {
		    error [list {unexpected command} $cmd]
		}
	    }
	}}}]
	chan configure $chan -blocking 0
	while 1 {
	    chan event $chan readable [list [info coroutine]]
	    yield
	    close $chan
	    set done 1
	    return
	}
    } [namespace current]]]
    vwait [namespace current]::done
return success
} success 
} success



test io-29.1 {Tcl_WriteChars, channel not writable} {
    list [catch {puts stdin hello} msg] $msg
} {1 {channel "stdin" wasn't opened for writing}}
test io-29.2 {Tcl_WriteChars, empty string} {

Changes to tests/regexp.test.

15
16
17
18
19
20
21

22
23
24
25
26
27
28
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29







+







    package require tcltest 2.5
    namespace import -force ::tcltest::*
}

unset -nocomplain foo

testConstraint exec [llength [info commands exec]]
testConstraint nodep [info exists tcl_precision]

# Used for constraining memory leak tests
testConstraint memory [llength [info commands memory]]
if {[testConstraint memory]} {
    proc memtest script {
	set end [lindex [split [memory info] \n] 3 3]
	for {set i 0} {$i < 5} {incr i} {
761
762
763
764
765
766
767
768

769
770
771
772
773
774
775
762
763
764
765
766
767
768

769
770
771
772
773
774
775
776







-
+








test regexp-19.2 {regsub null replacement} {
    regsub -all {@} {@hel@lo@} "\0a\0" result
    set expected "\0a\0hel\0a\0lo\0a\0"
    string equal $result $expected
} 1

test regexp-20.1 {regsub shared object shimmering} -body {
test regexp-20.1 {regsub shared object shimmering} -constraints nodep -body {
    # Bug #461322
    set a abcdefghijklmnopqurstuvwxyz
    set b $a
    set c abcdefghijklmnopqurstuvwxyz0123456789
    regsub $a $c $b d
    list $d [string length $d] [string bytelength $d]
} -result [list abcdefghijklmnopqurstuvwxyz0123456789 37 37]

Changes to tests/regexpComp.test.

11
12
13
14
15
16
17


18
19
20
21
22
23
24
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26







+
+







# See the file "license.terms" for information on usage and redistribution
# of this file, and for a DISCLAIMER OF ALL WARRANTIES.

if {"::tcltest" ni [namespace children]} {
    package require tcltest 2.5
    namespace import -force ::tcltest::*
}

testConstraint nodep [info exists tcl_precision]

# Procedure to evaluate a script within a proc, to test compilation
# functionality

proc evalInProc { script } {
    proc testProc {} $script
    set status [catch {
787
788
789
790
791
792
793
794

795
796
797
798
799
800
801
789
790
791
792
793
794
795

796
797
798
799
800
801
802
803







-
+







test regexpComp-19.1 {regsub null replacement} {
    evalInProc {
	regsub -all {@} {@hel@lo@} "\0a\0" result
	list $result [string length $result]
    }
} "\0a\0hel\0a\0lo\0a\0 14"

test regexpComp-20.1 {regsub shared object shimmering} {
test regexpComp-20.1 {regsub shared object shimmering} nodep {
    evalInProc {
	# Bug #461322
	set a abcdefghijklmnopqurstuvwxyz
	set b $a
	set c abcdefghijklmnopqurstuvwxyz0123456789
	regsub $a $c $b d
	list $d [string length $d] [string bytelength $d]

Changes to tests/string.test.

29
30
31
32
33
34
35

36
37
38
39
40
41
42
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43







+







# Some tests require the testobj command

testConstraint testobj [expr {[info commands testobj] ne {}}]
testConstraint testindexobj [expr {[info commands testindexobj] ne {}}]
testConstraint testevalex [expr {[info commands testevalex] ne {}}]
testConstraint utf16 [expr {[string length \U010000] == 2}]
testConstraint testbytestring   [llength [info commands testbytestring]]
testConstraint nodep [info exists tcl_precision]

# Used for constraining memory leak tests
testConstraint memory [llength [info commands memory]]
if {[testConstraint memory]} {
    proc getbytes {} {
        set lines [split [memory info] \n]
        return [lindex $lines 3 3]
68
69
70
71
72
73
74
75

76
77

78
79
80
81
82
83
84
69
70
71
72
73
74
75

76
77

78
79
80
81
82
83
84
85







-
+

-
+







    set constraints testevalex
} else {
    interp alias {} run {} try
    set constraints {}
}


test string-1.1.$noComp {error conditions} {
test string-1.1.$noComp {error conditions} -body {
    list [catch {run {string gorp a b}} msg] $msg
} {1 {unknown or ambiguous subcommand "gorp": must be bytelength, cat, compare, equal, first, index, insert, is, last, length, map, match, range, repeat, replace, reverse, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart}}
} -match glob -result {1 {unknown or ambiguous subcommand "gorp": must be *cat, compare, equal, first, index, insert, is, last, length, map, match, range, repeat, replace, reverse, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart}}
test string-1.2.$noComp {error conditions} {
    list [catch {run {string}} msg] $msg
} {1 {wrong # args: should be "string subcommand ?arg ...?"}}
test stringComp-1.3.$noComp {error condition - undefined method during compile} {
    # We don't want this to complain about 'never' because it may never
    # be called, or string may get redefined.  This must compile OK.
    proc foo {str i} {
521
522
523
524
525
526
527
528

529
530
531

532
533
534
535
536
537
538
522
523
524
525
526
527
528

529
530
531

532
533
534
535
536
537
538
539







-
+


-
+







    list [catch {run {string is alpha -failin str}} msg] $msg
} {1 {wrong # args: should be "string is alpha ?-strict? ?-failindex var? str"}}
test string-6.4.$noComp {string is, too many args} {
    list [catch {run {string is alpha -failin var -strict str more}} msg] $msg
} {1 {wrong # args: should be "string is class ?-strict? ?-failindex var? str"}}
test string-6.5.$noComp {string is, class check} {
    list [catch {run {string is bogus str}} msg] $msg
} {1 {bad class "bogus": must be alnum, alpha, ascii, control, boolean, dict, digit, double, entier, false, graph, integer, list, lower, print, punct, space, true, upper, wideinteger, wordchar, or xdigit}}
} {1 {bad class "bogus": must be alnum, alpha, ascii, control, boolean, dict, digit, double, entier, false, graph, integer, list, lower, print, punct, space, true, upper, unicode, wideinteger, wordchar, or xdigit}}
test string-6.6.$noComp {string is, ambiguous class} {
    list [catch {run {string is al str}} msg] $msg
} {1 {ambiguous class "al": must be alnum, alpha, ascii, control, boolean, dict, digit, double, entier, false, graph, integer, list, lower, print, punct, space, true, upper, wideinteger, wordchar, or xdigit}}
} {1 {ambiguous class "al": must be alnum, alpha, ascii, control, boolean, dict, digit, double, entier, false, graph, integer, list, lower, print, punct, space, true, upper, unicode, wideinteger, wordchar, or xdigit}}
test string-6.7.$noComp {string is alpha, all ok} {
    run {string is alpha -strict -failindex var abc}
} 1
test string-6.8.$noComp {string is, error in var} {
    list [run {string is alpha -failindex var abc5def}] $var
} {0 3}
test string-6.9.$noComp {string is, var shouldn't get set} {
957
958
959
960
961
962
963






















964
965
966
967
968
969
970
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993







+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+







} {0 87}
test string-6.130.1.$noComp {string is entier, false on bad octal} {
    list [run {string is entier -fail var 0o1234561123412345612345656234561234561234561234561234561234561234561234561234561234536963}] $var
} {0 87}
test string-6.131.$noComp {string is entier, false on bad hex} {
    list [run {string is entier -fail var 0X12345611234123456123456562345612345612345612345612345612345612345612345612345612345345XYZ}] $var
} {0 88}
test string-6.132.$noComp {string is unicode} {
    run {string is unicode \U10FFFD\uD7FF\uE000\uFDCF\uFDF0}
} 1
test string-6.133.$noComp {string is unicode, upper surrogate} {
    run {string is unicode \uD800}
} 0
test string-6.134.$noComp {string is unicode, lower surrogate} {
    run {string is unicode \uDFFF}
} 0
test string-6.135.$noComp {string is unicode, noncharacter} {
    run {string is unicode \uFFFE}
} 0
test string-6.136.$noComp {string is unicode, noncharacter} {
    run {string is unicode \uFFFF}
} 0
test string-6.137.$noComp {string is unicode, noncharacter} {
    run {string is unicode \uFDD0}
} 0
test string-6.138.$noComp {string is unicode, noncharacter} {
    run {string is unicode \uFDEF}
} 0


test string-7.1.$noComp {string last, not enough args} {
    list [catch {run {string last a}} msg] $msg
} {1 {wrong # args: should be "string last needleString haystackString ?lastIndex?"}}
test string-7.2.$noComp {string last, bad args} {
    list [catch {run {string last a b c}} msg] $msg
} {1 {bad index "c": must be integer?[+-]integer? or end?[+-]integer?}}
1009
1010
1011
1012
1013
1014
1015
1016

1017
1018
1019

1020
1021
1022

1023
1024
1025

1026
1027
1028
1029
1030
1031
1032
1032
1033
1034
1035
1036
1037
1038

1039
1040
1041

1042
1043
1044

1045
1046
1047

1048
1049
1050
1051
1052
1053
1054
1055







-
+


-
+


-
+


-
+







test string-7.15.$noComp {string last, start index} {
    run {string last Ãœa ÃœadÃœad 0}
} -1
test string-7.16.$noComp {string last, start index} {
    run {string last Ãœa ÃœadÃœad end-1}
} 3

test string-8.1.$noComp {string bytelength} {
test string-8.1.$noComp {string bytelength} nodep {
    list [catch {run {string bytelength}} msg] $msg
} {1 {wrong # args: should be "string bytelength string"}}
test string-8.2.$noComp {string bytelength} {
test string-8.2.$noComp {string bytelength} nodep {
    list [catch {run {string bytelength a b}} msg] $msg
} {1 {wrong # args: should be "string bytelength string"}}
test string-8.3.$noComp {string bytelength} {
test string-8.3.$noComp {string bytelength} nodep {
    run {string bytelength "\xC7"}
} 2
test string-8.4.$noComp {string bytelength} {
test string-8.4.$noComp {string bytelength} nodep {
    run {string b ""}
} 0

test string-9.1.$noComp {string length} {
    list [catch {run {string length}} msg] $msg
} {1 {wrong # args: should be "string length string"}}
test string-9.2.$noComp {string length} {
1813
1814
1815
1816
1817
1818
1819
1820

1821
1822

1823
1824
1825
1826
1827
1828
1829
1836
1837
1838
1839
1840
1841
1842

1843
1844

1845
1846
1847
1848
1849
1850
1851
1852







-
+

-
+







test string-19.3.$noComp {string trimleft, unicode default} {
    run {string trimleft \uFEFF\x85\xA0\x00\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u2028\u2029\u202F\u205F\u3000\u1361ABC}
} \u1361ABC

test string-20.1.$noComp {string trimright errors} {
    list [catch {run {string trimright}} msg] $msg
} {1 {wrong # args: should be "string trimright string ?chars?"}}
test string-20.2.$noComp {string trimright errors} {
test string-20.2.$noComp {string trimright errors} -body {
    list [catch {run {string trimg a}} msg] $msg
} {1 {unknown or ambiguous subcommand "trimg": must be bytelength, cat, compare, equal, first, index, insert, is, last, length, map, match, range, repeat, replace, reverse, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart}}
} -match glob -result {1 {unknown or ambiguous subcommand "trimg": must be *cat, compare, equal, first, index, insert, is, last, length, map, match, range, repeat, replace, reverse, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart}}
test string-20.3.$noComp {string trimright} {
    run {string trimright "    XYZ      "}
} {    XYZ}
test string-20.4.$noComp {string trimright} {
    run {string trimright "   "}
} {}
test string-20.5.$noComp {string trimright} {
1935
1936
1937
1938
1939
1940
1941
1942

1943
1944
1945
1946
1947
1948
1949
1958
1959
1960
1961
1962
1963
1964

1965
1966
1967
1968
1969
1970
1971
1972







-
+







} "\uD83D\uDE02Hello world!\uD83D\uDE02"
test string-21.25.$noComp {string trimright, unicode} {
    run {string trimright "\uD83D\uDE02Hello world!\uD83D\uDE02" \uD93D\uDE02}
} "\uD83D\uDE02Hello world!\uD83D\uDE02"

test string-22.1.$noComp {string wordstart} -body {
    list [catch {run {string word a}} msg] $msg
} -result {1 {unknown or ambiguous subcommand "word": must be bytelength, cat, compare, equal, first, index, insert, is, last, length, map, match, range, repeat, replace, reverse, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart}}
} -match glob -result {1 {unknown or ambiguous subcommand "word": must be *cat, compare, equal, first, index, insert, is, last, length, map, match, range, repeat, replace, reverse, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart}}
test string-22.2.$noComp {string wordstart} -body {
    list [catch {run {string wordstart a}} msg] $msg
} -result {1 {wrong # args: should be "string wordstart string index"}}
test string-22.3.$noComp {string wordstart} -body {
    list [catch {run {string wordstart a b c}} msg] $msg
} -result {1 {wrong # args: should be "string wordstart string index"}}
test string-22.4.$noComp {string wordstart} -body {