Tcl Source Code

Check-in [8976a447aa]
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:merge core-8-6-branch
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | core-8-branch
Files: files | file ages | folders
SHA3-256: 8976a447aaea7082692b10ef6b6b6e7d08d75a9961f1ef9c08f41b69fc42a42e
User & Date: jan.nijtmans 2017-11-29 11:05:15
Context
2017-11-29
12:01
Fix for [6bca38d59b], TclOO segmentation fault cleaning up objects that that have mixed themselves i... check-in: 313a7c1b8a user: pooryorick tags: core-8-branch
11:49
Merge core-8-branch. Also, use a different value for TCL_STUB_MAGIC when TCL_UTF_MAX>4. check-in: 1916b6a72e user: jan.nijtmans tags: tip-389
11:05
merge core-8-6-branch check-in: 8976a447aa user: jan.nijtmans tags: core-8-branch
11:04
Update some functions in tclUtf.c to handle surrogate pairs when TCL_UTF_MAX == 4. Also update docum... check-in: 83c0c569d6 user: jan.nijtmans tags: core-8-6-branch
2017-11-28
23:33
Fix for issue [6cf568a21b]: Tcl_Eval() causes new segfault (TclOO object creation by qualified name)... check-in: 93c437ef37 user: pooryorick tags: core-8-branch
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to doc/ToUpper.3.

29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
\fBTcl_UtfToLower\fR(\fIstr\fR)
.sp
int
\fBTcl_UtfToTitle\fR(\fIstr\fR)
.SH ARGUMENTS
.AS char *str in/out
.AP int ch in
The Tcl_UniChar to be converted.
.AP char *str in/out
Pointer to UTF-8 string to be converted in place.
.BE

.SH DESCRIPTION
.PP
The first three routines convert the case of individual Unicode characters:






|







29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
\fBTcl_UtfToLower\fR(\fIstr\fR)
.sp
int
\fBTcl_UtfToTitle\fR(\fIstr\fR)
.SH ARGUMENTS
.AS char *str in/out
.AP int ch in
The Unicode character to be converted.
.AP char *str in/out
Pointer to UTF-8 string to be converted in place.
.BE

.SH DESCRIPTION
.PP
The first three routines convert the case of individual Unicode characters:

Changes to doc/UniCharIsAlpha.3.

49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
.AS int ch
.AP int ch in
The Tcl_UniChar to be examined.
.BE

.SH DESCRIPTION
.PP
All of the routines described examine Tcl_UniChars and return a
boolean value. A non-zero return value means that the character does
belong to the character class associated with the called routine. The
rest of this document just describes the character classes associated
with the various routines.
.PP
Note: A Tcl_UniChar is a Unicode character represented as an unsigned,
fixed-size quantity.

.SH "CHARACTER CLASSES"
.PP
\fBTcl_UniCharIsAlnum\fR tests if the character is an alphanumeric Unicode character.
.PP
\fBTcl_UniCharIsAlpha\fR tests if the character is an alphabetic Unicode character.
.PP






|




<
<
<







49
50
51
52
53
54
55
56
57
58
59
60



61
62
63
64
65
66
67
.AS int ch
.AP int ch in
The Tcl_UniChar to be examined.
.BE

.SH DESCRIPTION
.PP
All of the routines described examine Unicode characters and return a
boolean value. A non-zero return value means that the character does
belong to the character class associated with the called routine. The
rest of this document just describes the character classes associated
with the various routines.




.SH "CHARACTER CLASSES"
.PP
\fBTcl_UniCharIsAlnum\fR tests if the character is an alphanumeric Unicode character.
.PP
\fBTcl_UniCharIsAlpha\fR tests if the character is an alphabetic Unicode character.
.PP

Changes to doc/Utf.3.

73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
\fBTcl_UtfBackslash\fR(\fIsrc, readPtr, dst\fR)
.SH ARGUMENTS
.AS "const Tcl_UniChar" *uniPattern in/out
.AP char *buf out
Buffer in which the UTF-8 representation of the Tcl_UniChar is stored.  At most
\fBTCL_UTF_MAX\fR bytes are stored in the buffer.
.AP int ch in
The Tcl_UniChar to be converted or examined.
.AP Tcl_UniChar *chPtr out
Filled with the Tcl_UniChar represented by the head of the UTF-8 string.
.AP "const char" *src in
Pointer to a UTF-8 string.
.AP "const char" *cs in
Pointer to a UTF-8 string.
.AP "const char" *ct in






|







73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
\fBTcl_UtfBackslash\fR(\fIsrc, readPtr, dst\fR)
.SH ARGUMENTS
.AS "const Tcl_UniChar" *uniPattern in/out
.AP char *buf out
Buffer in which the UTF-8 representation of the Tcl_UniChar is stored.  At most
\fBTCL_UTF_MAX\fR bytes are stored in the buffer.
.AP int ch in
The Unicode character to be converted or examined.
.AP Tcl_UniChar *chPtr out
Filled with the Tcl_UniChar represented by the head of the UTF-8 string.
.AP "const char" *src in
Pointer to a UTF-8 string.
.AP "const char" *cs in
Pointer to a UTF-8 string.
.AP "const char" *ct in

Changes to generic/tclUtf.c.

394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
...
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548







549
550
551
552
553
554
555
556
...
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
...
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589







590
591
592
593
594
595
596
597
....
1048
1049
1050
1051
1052
1053
1054









1055
1056
1057
1058
1059
1060
1061
....
1080
1081
1082
1083
1084
1085
1086

1087
1088
1089
1090
1091
1092
1093
1094









1095
1096
1097
1098
1099
1100
1101
....
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
....
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131

1132
1133









1134
1135
1136
1137
1138
1139
1140
    const char *src,		/* UTF-8 string to convert to Unicode. */
    int length,			/* Length of UTF-8 string in bytes, or -1 for
				 * strlen(). */
    Tcl_DString *dsPtr)		/* Unicode representation of string is
				 * appended to this previously initialized
				 * DString. */
{
    Tcl_UniChar ch, *w, *wString;
    const char *p, *end;
    int oldLength;

    if (length < 0) {
	length = strlen(src);
    }

................................................................................
}
 
/*
 *---------------------------------------------------------------------------
 *
 * Tcl_UtfFindFirst --
 *
 *	Returns a pointer to the first occurance of the given Tcl_UniChar in
 *	the NULL-terminated UTF-8 string. The NULL terminator is considered
 *	part of the UTF-8 string. Equivalent to Plan 9 utfrune().
 *
 * Results:
 *	As above. If the Tcl_UniChar does not exist in the given string, the
 *	return value is NULL.
 *
 * Side effects:
 *	None.
 *
 *---------------------------------------------------------------------------
 */

const char *
Tcl_UtfFindFirst(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Tcl_UniChar to search for. */
{
    int len;
    Tcl_UniChar find = 0;

    while (1) {
	len = TclUtfToUniChar(src, &find);







	if (find == ch) {
	    return src;
	}
	if (*src == '\0') {
	    return NULL;
	}
	src += len;
    }
................................................................................
}
 
/*
 *---------------------------------------------------------------------------
 *
 * Tcl_UtfFindLast --
 *
 *	Returns a pointer to the last occurance of the given Tcl_UniChar in
 *	the NULL-terminated UTF-8 string. The NULL terminator is considered
 *	part of the UTF-8 string. Equivalent to Plan 9 utfrrune().
 *
 * Results:
 *	As above. If the Tcl_UniChar does not exist in the given string, the
 *	return value is NULL.
 *
 * Side effects:
................................................................................
 *
 *---------------------------------------------------------------------------
 */

const char *
Tcl_UtfFindLast(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Tcl_UniChar to search for. */
{
    int len;
    Tcl_UniChar find = 0;
    const char *last;

    last = NULL;
    while (1) {
	len = TclUtfToUniChar(src, &find);







	if (find == ch) {
	    last = src;
	}
	if (*src == '\0') {
	    break;
	}
	src += len;
    }
................................................................................
	 * n must be interpreted as chars, not bytes. This should be called
	 * only when both strings are of at least n chars long (no need for \0
	 * check)
	 */

	cs += TclUtfToUniChar(cs, &ch1);
	ct += TclUtfToUniChar(ct, &ch2);









	if (ch1 != ch2) {
	    return (ch1 - ch2);
	}
    }
    return 0;
}
 
................................................................................
int
Tcl_UtfNcasecmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct,		/* UTF string cs is compared to. */
    unsigned long numChars)	/* Number of UTF chars to compare. */
{
    Tcl_UniChar ch1 = 0, ch2 = 0;

    while (numChars-- > 0) {
	/*
	 * n must be interpreted as chars, not bytes.
	 * This should be called only when both strings are of
	 * at least n chars long (no need for \0 check)
	 */
	cs += TclUtfToUniChar(cs, &ch1);
	ct += TclUtfToUniChar(ct, &ch2);









	if (ch1 != ch2) {
	    ch1 = Tcl_UniCharToLower(ch1);
	    ch2 = Tcl_UniCharToLower(ch2);
	    if (ch1 != ch2) {
		return (ch1 - ch2);
	    }
	}
................................................................................
    }
    return 0;
}
 
/*
 *----------------------------------------------------------------------
 *
 * Tcl_UtfNcasecmp --
 *
 *	Compare UTF chars of string cs to string ct case insensitively.
 *	Replacement for strcasecmp in Tcl core, in places where UTF-8 should
 *	be handled.
 *
 * Results:
 *	Return <0 if cs < ct, 0 if cs == ct, or >0 if cs > ct.
................................................................................
 */

int
TclUtfCasecmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct)		/* UTF string cs is compared to. */
{
    while (*cs && *ct) {
	Tcl_UniChar ch1, ch2;


	cs += TclUtfToUniChar(cs, &ch1);
	ct += TclUtfToUniChar(ct, &ch2);









	if (ch1 != ch2) {
	    ch1 = Tcl_UniCharToLower(ch1);
	    ch2 = Tcl_UniCharToLower(ch2);
	    if (ch1 != ch2) {
		return ch1 - ch2;
	    }
	}






|







 







|
|



|
|










|

|




>
>
>
>
>
>
>
|







 







|
|







 







|

|






>
>
>
>
>
>
>
|







 







>
>
>
>
>
>
>
>
>







 







>








>
>
>
>
>
>
>
>
>







 







|







 







<
|

>


>
>
>
>
>
>
>
>
>







394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
...
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
...
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
...
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
....
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
....
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
....
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
....
1155
1156
1157
1158
1159
1160
1161

1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
    const char *src,		/* UTF-8 string to convert to Unicode. */
    int length,			/* Length of UTF-8 string in bytes, or -1 for
				 * strlen(). */
    Tcl_DString *dsPtr)		/* Unicode representation of string is
				 * appended to this previously initialized
				 * DString. */
{
    Tcl_UniChar ch = 0, *w, *wString;
    const char *p, *end;
    int oldLength;

    if (length < 0) {
	length = strlen(src);
    }

................................................................................
}
 
/*
 *---------------------------------------------------------------------------
 *
 * Tcl_UtfFindFirst --
 *
 *	Returns a pointer to the first occurance of the given Unicode character
 *	in the NULL-terminated UTF-8 string. The NULL terminator is considered
 *	part of the UTF-8 string. Equivalent to Plan 9 utfrune().
 *
 * Results:
 *	As above. If the Unicode character does not exist in the given string,
 *	the return value is NULL.
 *
 * Side effects:
 *	None.
 *
 *---------------------------------------------------------------------------
 */

const char *
Tcl_UtfFindFirst(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Unicode character to search for. */
{
    int len, fullchar;
    Tcl_UniChar find = 0;

    while (1) {
	len = TclUtfToUniChar(src, &find);
	fullchar = find;
#if TCL_UTF_MAX == 4
	if (!len) {
	    len += TclUtfToUniChar(src, &find);
	    fullchar = (((fullchar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	}
#endif
	if (find == fullchar) {
	    return src;
	}
	if (*src == '\0') {
	    return NULL;
	}
	src += len;
    }
................................................................................
}
 
/*
 *---------------------------------------------------------------------------
 *
 * Tcl_UtfFindLast --
 *
 *	Returns a pointer to the last occurance of the given Unicode character
 *	in the NULL-terminated UTF-8 string. The NULL terminator is considered
 *	part of the UTF-8 string. Equivalent to Plan 9 utfrrune().
 *
 * Results:
 *	As above. If the Tcl_UniChar does not exist in the given string, the
 *	return value is NULL.
 *
 * Side effects:
................................................................................
 *
 *---------------------------------------------------------------------------
 */

const char *
Tcl_UtfFindLast(
    const char *src,		/* The UTF-8 string to be searched. */
    int ch)			/* The Unicode character to search for. */
{
    int len, fullchar;
    Tcl_UniChar find = 0;
    const char *last;

    last = NULL;
    while (1) {
	len = TclUtfToUniChar(src, &find);
	fullchar = find;
#if TCL_UTF_MAX == 4
	if (!len) {
	    len += TclUtfToUniChar(src, &find);
	    fullchar = (((fullchar & 0x3ff) << 10) | (ch & 0x3ff)) + 0x10000;
	}
#endif
	if (find == fullchar) {
	    last = src;
	}
	if (*src == '\0') {
	    break;
	}
	src += len;
    }
................................................................................
	 * n must be interpreted as chars, not bytes. This should be called
	 * only when both strings are of at least n chars long (no need for \0
	 * check)
	 */

	cs += TclUtfToUniChar(cs, &ch1);
	ct += TclUtfToUniChar(ct, &ch2);
#if TCL_UTF_MAX == 4
    /* map high surrogate characters to values > 0xffff */
    if ((ch1 & 0xFC00) == 0xD800) {
	ch1 += 0x4000;
    }
    if ((ch2 & 0xFC00) == 0xD800) {
	ch2 += 0x4000;
    }
#endif
	if (ch1 != ch2) {
	    return (ch1 - ch2);
	}
    }
    return 0;
}
 
................................................................................
int
Tcl_UtfNcasecmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct,		/* UTF string cs is compared to. */
    unsigned long numChars)	/* Number of UTF chars to compare. */
{
    Tcl_UniChar ch1 = 0, ch2 = 0;

    while (numChars-- > 0) {
	/*
	 * n must be interpreted as chars, not bytes.
	 * This should be called only when both strings are of
	 * at least n chars long (no need for \0 check)
	 */
	cs += TclUtfToUniChar(cs, &ch1);
	ct += TclUtfToUniChar(ct, &ch2);
#if TCL_UTF_MAX == 4
    /* map high surrogate characters to values > 0xffff */
    if ((ch1 & 0xFC00) == 0xD800) {
	ch1 += 0x4000;
    }
    if ((ch2 & 0xFC00) == 0xD800) {
	ch2 += 0x4000;
    }
#endif
	if (ch1 != ch2) {
	    ch1 = Tcl_UniCharToLower(ch1);
	    ch2 = Tcl_UniCharToLower(ch2);
	    if (ch1 != ch2) {
		return (ch1 - ch2);
	    }
	}
................................................................................
    }
    return 0;
}
 
/*
 *----------------------------------------------------------------------
 *
 * TclUtfCasecmp --
 *
 *	Compare UTF chars of string cs to string ct case insensitively.
 *	Replacement for strcasecmp in Tcl core, in places where UTF-8 should
 *	be handled.
 *
 * Results:
 *	Return <0 if cs < ct, 0 if cs == ct, or >0 if cs > ct.
................................................................................
 */

int
TclUtfCasecmp(
    const char *cs,		/* UTF string to compare to ct. */
    const char *ct)		/* UTF string cs is compared to. */
{

    Tcl_UniChar ch1 = 0, ch2 = 0;

    while (*cs && *ct) {
	cs += TclUtfToUniChar(cs, &ch1);
	ct += TclUtfToUniChar(ct, &ch2);
#if TCL_UTF_MAX == 4
    /* map high surrogate characters to values > 0xffff */
    if ((ch1 & 0xFC00) == 0xD800) {
	ch1 += 0x4000;
    }
    if ((ch2 & 0xFC00) == 0xD800) {
	ch2 += 0x4000;
    }
#endif
	if (ch1 != ch2) {
	    ch1 = Tcl_UniCharToLower(ch1);
	    ch2 = Tcl_UniCharToLower(ch2);
	    if (ch1 != ch2) {
		return ch1 - ch2;
	    }
	}