Tcl Source Code

Check-in [19890efeab]
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Merge 8.7
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | bug-bd94500678
Files: files | file ages | folders
SHA3-256: 19890efeab49199c87672914d36bb91cb509cb7b411159671385efd666b9e4e9
User & Date: jan.nijtmans 2019-02-19 20:21:32
Context
2019-02-25
17:06
merge 8.7 check-in: 6bda8e2cf5 user: dgp tags: bug-bd94500678
2019-02-19
20:21
Merge 8.7 check-in: 19890efeab user: jan.nijtmans tags: bug-bd94500678
20:16
Merge 8.6 check-in: 1b17625b60 user: jan.nijtmans tags: core-8-branch
19:38
Minor optimizations check-in: 0e2621fc4b user: jan.nijtmans tags: bug-bd94500678
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to doc/Utf.3.

128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
sequence consists of a lead byte followed by some number of trail bytes.
.PP
\fBTCL_UTF_MAX\fR is the maximum number of bytes that it takes to
represent one Unicode character in the UTF-8 representation.
.PP
\fBTcl_UniCharToUtf\fR stores the character \fIch\fR as a UTF-8 string
in starting at \fIbuf\fR.  The return value is the number of bytes stored
in \fIbuf\fR. If ch is an upper surrogate (range U+D800 - U+DBFF), then
the return value will be 0 and nothing will be stored. If you still
want to produce UTF-8 output for it (even though knowing it's an illegal
code-point on its own), just call \fBTcl_UniCharToUtf\fR again using ch = -1.
.PP
\fBTcl_UtfToUniChar\fR reads one UTF-8 character starting at \fIsrc\fR
and stores it as a Tcl_UniChar in \fI*chPtr\fR.  The return value is the
number of bytes read from \fIsrc\fR.  The caller must ensure that the






|







128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
sequence consists of a lead byte followed by some number of trail bytes.
.PP
\fBTCL_UTF_MAX\fR is the maximum number of bytes that it takes to
represent one Unicode character in the UTF-8 representation.
.PP
\fBTcl_UniCharToUtf\fR stores the character \fIch\fR as a UTF-8 string
in starting at \fIbuf\fR.  The return value is the number of bytes stored
in \fIbuf\fR. If ch is a high surrogate (range U+D800 - U+DBFF), then
the return value will be 0 and nothing will be stored. If you still
want to produce UTF-8 output for it (even though knowing it's an illegal
code-point on its own), just call \fBTcl_UniCharToUtf\fR again using ch = -1.
.PP
\fBTcl_UtfToUniChar\fR reads one UTF-8 character starting at \fIsrc\fR
and stores it as a Tcl_UniChar in \fI*chPtr\fR.  The return value is the
number of bytes read from \fIsrc\fR.  The caller must ensure that the

Changes to tests/utf.test.

40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
...
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
} 1
test utf-1.6 {Tcl_UniCharToUtf: negative Tcl_UniChar} testbytestring {
    expr {[format %c -1] eq [testbytestring "\xef\xbf\xbd"]}
} 1
test utf-1.7 {Tcl_UniCharToUtf: 4 byte sequences} -constraints testbytestring -body {
    expr {"\U014e4e" eq [testbytestring "\xf0\x94\xb9\x8e"]}
} -result 1
test utf-1.8 {Tcl_UniCharToUtf: 3 byte sequence, upper surrogate} testbytestring {
    expr {"\ud842" eq [testbytestring "\xed\xa1\x82"]}
} 1
test utf-1.9 {Tcl_UniCharToUtf: 3 byte sequence, lower surrogate} testbytestring {
    expr {"\udc42" eq [testbytestring "\xed\xb1\x82"]}
} 1
test utf-1.10 {Tcl_UniCharToUtf: 3 byte sequence, upper surrogate} testbytestring {
    expr {[format %c 0xd842] eq [testbytestring "\xed\xa1\x82"]}
} 1
test utf-1.11 {Tcl_UniCharToUtf: 3 byte sequence, lower surrogate} testbytestring {
    expr {[format %c 0xdc42] eq [testbytestring "\xed\xb1\x82"]}
} 1

test utf-2.1 {Tcl_UtfToUniChar: low ascii} {
    string length "abc"
} {3}
test utf-2.2 {Tcl_UtfToUniChar: naked trail bytes} testbytestring {
................................................................................
} "\u4e4e"
test utf-8.3 {Tcl_UniCharAtIndex: index > 0} {
    string index abcd 2
} {c}
test utf-8.4 {Tcl_UniCharAtIndex: index > 0} {
    string index \u4e4e\u25a\xff\u543 2
} "\uff"
test utf-8.5 {Tcl_UniCharAtIndex: upper surrogate} {
    string index \ud842 0
} "\ud842"
test utf-8.5 {Tcl_UniCharAtIndex: lower surrogate} {
    string index \udc42 0
} "\udc42"

test utf-9.1 {Tcl_UtfAtIndex: index = 0} {
    string range abcd 0 2
} {abc}
test utf-9.2 {Tcl_UtfAtIndex: index > 0} {






|


|


|


|







 







|


|







40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
...
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
} 1
test utf-1.6 {Tcl_UniCharToUtf: negative Tcl_UniChar} testbytestring {
    expr {[format %c -1] eq [testbytestring "\xef\xbf\xbd"]}
} 1
test utf-1.7 {Tcl_UniCharToUtf: 4 byte sequences} -constraints testbytestring -body {
    expr {"\U014e4e" eq [testbytestring "\xf0\x94\xb9\x8e"]}
} -result 1
test utf-1.8 {Tcl_UniCharToUtf: 3 byte sequence, high surrogate} testbytestring {
    expr {"\ud842" eq [testbytestring "\xed\xa1\x82"]}
} 1
test utf-1.9 {Tcl_UniCharToUtf: 3 byte sequence, low surrogate} testbytestring {
    expr {"\udc42" eq [testbytestring "\xed\xb1\x82"]}
} 1
test utf-1.10 {Tcl_UniCharToUtf: 3 byte sequence, high surrogate} testbytestring {
    expr {[format %c 0xd842] eq [testbytestring "\xed\xa1\x82"]}
} 1
test utf-1.11 {Tcl_UniCharToUtf: 3 byte sequence, low surrogate} testbytestring {
    expr {[format %c 0xdc42] eq [testbytestring "\xed\xb1\x82"]}
} 1

test utf-2.1 {Tcl_UtfToUniChar: low ascii} {
    string length "abc"
} {3}
test utf-2.2 {Tcl_UtfToUniChar: naked trail bytes} testbytestring {
................................................................................
} "\u4e4e"
test utf-8.3 {Tcl_UniCharAtIndex: index > 0} {
    string index abcd 2
} {c}
test utf-8.4 {Tcl_UniCharAtIndex: index > 0} {
    string index \u4e4e\u25a\xff\u543 2
} "\uff"
test utf-8.5 {Tcl_UniCharAtIndex: high surrogate} {
    string index \ud842 0
} "\ud842"
test utf-8.5 {Tcl_UniCharAtIndex: low surrogate} {
    string index \udc42 0
} "\udc42"

test utf-9.1 {Tcl_UtfAtIndex: index = 0} {
    string range abcd 0 2
} {abc}
test utf-9.2 {Tcl_UtfAtIndex: index > 0} {