Tcl Source Code

Check-in [f5243d7263]
Login
EuroTcl/OpenACS 11 - 12 JULY 2024, VIENNA

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Import selections of [4d6aa33b2f] (branch: encoding-for-review) and alternate wording.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | encoding-for-review-alt
Files: files | file ages | folders
SHA3-256: f5243d7263ce854a034111a355f6da0fa05b71d72b9c26fb47c811510330783a
User & Date: oehhar 2024-06-14 15:09:01
Context
2024-06-17
06:26
Changed "binary data" to "binary string". Thanks to Nathan for the rationale Leaf check-in: 96781584b9 user: oehhar tags: encoding-for-review-alt
2024-06-14
15:09
Import selections of [4d6aa33b2f] (branch: encoding-for-review) and alternate wording. check-in: f5243d7263 user: oehhar tags: encoding-for-review-alt
2024-06-13
12:00
Fix [1d26e580cf]: safe interp can't source files with BOM check-in: 162129dfbf user: jan.nijtmans tags: trunk, main
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to doc/encoding.n.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

20


21
22


23
24
25








26
27
28
29
30
31
32
'\"
'\" Copyright (c) 1998 Scriptics Corporation.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
.so man.macros
.BS
.SH NAME
encoding \- Manipulate encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE
.SH INTRODUCTION
.PP
Strings in Tcl are logically a sequence of Unicode characters.
These strings are represented in memory as a sequence of bytes that
may be in one of several encodings: modified UTF\-8 (which uses 1 to 4

bytes per character), or a custom encoding start as 8 bit binary data.


.PP
Different operating system interfaces or applications may generate


strings in other encodings such as Shift\-JIS.  The \fBencoding\fR
command helps to bridge the gap between Unicode and these other
formats.








.SH DESCRIPTION
.PP
Performs one of several encoding related operations, depending on
\fIoption\fR.  The legal \fIoption\fRs are:
.\" METHOD: convertfrom
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR










|





|
|
|
>
|
>
>

<
>
>
|
<
|
>
>
>
>
>
>
>
>







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

25
26
27

28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
'\"
'\" Copyright (c) 1998 Scriptics Corporation.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
.so man.macros
.BS
.SH NAME
encoding \- Work with encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE
.SH INTRODUCTION
.PP
Strings in Tcl are a sequence of Unicode codepoints.
If strings are imported or exported from Tcl, they should be transfered to an
encoding like cp1252, iso8859-1, Shift\-JIS, utf-8, utf-16, etc.
.PP
Strings in a certain encoding are represented within Tcl as binary data
and may not be handled as Tcl strings any more.
Binary data is represented as a Tcl value composed of character values of 0 to255.
.PP

As an illustrative example, the Tcl string consisting of one character "\N'196'"
(Unicode codepoint 0xC4) may be transfered to binary data containing the string
in utf-8 encoding like this:

.CS
% set e [encoding convertto utf-8 \N'196']
% binary scan $e cucu b1 b2
%set b1
195
% set b2
132
.CE
The resulting utf-8 data is stored as binary data consisting of the two bytes 195 and 132.
.SH DESCRIPTION
.PP
Performs one of several encoding related operations, depending on
\fIoption\fR.  The legal \fIoption\fRs are:
.\" METHOD: convertfrom
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68

69
70
71
72
73
74
75
.PP
.VS "TCL8.7 TIP607, TIP656"
The \fB-profile\fR option determines the command behavior in the presence
of conversion errors. See the \fBPROFILES\fR section below for details. Any premature
termination of processing due to errors is reported through an exception if
the \fB-failindex\fR option is not specified.
.PP
If the \fB-failindex\fR is specified, instead of an exception being raised
on premature termination, the result of the conversion up to the point of the
error is returned as the result of the command. In addition, the index
of the source byte triggering the error is stored in \fBvar\fR. If no
errors are encountered, the entire result of the conversion is returned and
the value \fB-1\fR is stored in \fBvar\fR.
.VE "TCL8.7 TIP607, TIP656"
.\" METHOD: convertto
.TP
\fBencoding convertto\fR ?\fIencoding\fR? \fIdata\fR
.TP
\fBencoding convertto\fR ?\fB-profile \fIprofile\fR? ?\fB-failindex var\fR? \fIencoding data\fR
.
Convert \fIstring\fR to the specified \fIencoding\fR. The result is a Tcl binary
string that contains the sequence of bytes representing the converted string in
the specified encoding. If \fIencoding\fR is not specified, the current system
encoding is used.
.PP
.VS "TCL8.7 TIP607, TIP656"
The \fB-profile\fR and \fB-failindex\fR options have the same effect as
described for the \fBencoding convertfrom\fR command.

.VE "TCL8.7 TIP607, TIP656"
.\" METHOD: dirs
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with. This command sets the search







|


















<
|
>







52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77

78
79
80
81
82
83
84
85
86
.PP
.VS "TCL8.7 TIP607, TIP656"
The \fB-profile\fR option determines the command behavior in the presence
of conversion errors. See the \fBPROFILES\fR section below for details. Any premature
termination of processing due to errors is reported through an exception if
the \fB-failindex\fR option is not specified.
.PP
If \fB-failindex\fR is specified, instead of an exception being raised
on premature termination, the result of the conversion up to the point of the
error is returned as the result of the command. In addition, the index
of the source byte triggering the error is stored in \fBvar\fR. If no
errors are encountered, the entire result of the conversion is returned and
the value \fB-1\fR is stored in \fBvar\fR.
.VE "TCL8.7 TIP607, TIP656"
.\" METHOD: convertto
.TP
\fBencoding convertto\fR ?\fIencoding\fR? \fIdata\fR
.TP
\fBencoding convertto\fR ?\fB-profile \fIprofile\fR? ?\fB-failindex var\fR? \fIencoding data\fR
.
Convert \fIstring\fR to the specified \fIencoding\fR. The result is a Tcl binary
string that contains the sequence of bytes representing the converted string in
the specified encoding. If \fIencoding\fR is not specified, the current system
encoding is used.
.PP
.VS "TCL8.7 TIP607, TIP656"

See \fBencoding convertfrom\fR for the meaning of \fB\-profile\fR and
\fB\-failindex\fR.
.VE "TCL8.7 TIP607, TIP656"
.\" METHOD: dirs
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with. This command sets the search
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
.VS "TCL8.7 TIP656"
Returns a list of the names of encoding profiles. See \fBPROFILES\fR below.
.VE "TCL8.7 TIP656"
.\" METHOD: system
.TP
\fBencoding system\fR ?\fIencoding\fR?
.
Set the system encoding to \fIencoding\fR. If \fIencoding\fR is
omitted then the command returns the current system encoding.  The
system encoding is used whenever Tcl passes strings to system calls.
.\" Do not put .VS on whole section as that messes up the bullet list alignment
.SH PROFILES
.PP
.VS "TCL8.7 TIP656"
Operations involving encoding transforms may encounter several types of
errors such as invalid sequences in the source data, characters that







|
|







108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
.VS "TCL8.7 TIP656"
Returns a list of the names of encoding profiles. See \fBPROFILES\fR below.
.VE "TCL8.7 TIP656"
.\" METHOD: system
.TP
\fBencoding system\fR ?\fIencoding\fR?
.
Set the system encoding to \fIencoding\fR. If \fIencoding\fR is not given,
returns the current system encoding.  The
system encoding is used whenever Tcl passes strings to system calls.
.\" Do not put .VS on whole section as that messes up the bullet list alignment
.SH PROFILES
.PP
.VS "TCL8.7 TIP656"
Operations involving encoding transforms may encounter several types of
errors such as invalid sequences in the source data, characters that
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
Example 1: convert a byte sequence in Japanese euc-jp encoding to a TCL string:
.PP
.CS
% codepoints [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
U+00306F
.CE
.PP
The result is the unicode codepoint
.QW "\eu306F" ,
which is the Hiragana letter HA.
.VS "TCL8.7 TIP607, TIP656"
.PP
Example 2: Error handling based on profiles:
.PP
The letter \fBA\fR is Unicode character U+0041 and the byte "\ex80" is invalid







|







199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
Example 1: convert a byte sequence in Japanese euc-jp encoding to a TCL string:
.PP
.CS
% codepoints [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
U+00306F
.CE
.PP
The result is the Unicode codepoint
.QW "\eu306F" ,
which is the Hiragana letter HA.
.VS "TCL8.7 TIP607, TIP656"
.PP
Example 2: Error handling based on profiles:
.PP
The letter \fBA\fR is Unicode character U+0041 and the byte "\ex80" is invalid