Tcl Source Code

Check-in [96781584b9]
EuroTcl/OpenACS 11 - 12 JULY 2024, VIENNA


Overview
Comment: Changed "binary data" to "binary string". Thanks to Nathan for the rationale
SHA3-256: 96781584b92cafa72efe37398af8ea0cf5fb440048b55a932f900693fda6bfc7
User & Date: oehhar 2024-06-17 06:26:30
Context
2024-06-17
06:26
Changed "binary data" to "binary string". Thanks to Nathan for the rationale Leaf check-in: 96781584b9 user: oehhar tags: encoding-for-review-alt
2024-06-14
15:09
Import selections of [4d6aa33b2f] (branch: encoding-for-review) and alternate wording. check-in: f5243d7263 user: oehhar tags: encoding-for-review-alt
Changes

Changes to doc/encoding.n.

.BE
.SH INTRODUCTION
.PP
Strings in Tcl are sequences of Unicode codepoints.
When strings are imported into or exported from Tcl, they should be converted
from or to an encoding such as cp1252, iso8859-1, Shift\-JIS, utf-8 or utf-16.
.PP
Strings in a certain encoding are represented within Tcl as binary data
and may no longer be handled as Tcl strings.
Binary data is represented as a Tcl value composed of characters with values from 0 to 255.
.PP
As an illustrative example, the Tcl string consisting of the single character
"\N'196'" (Unicode codepoint 0xC4) may be converted to binary data containing
the string in utf-8 encoding like this:
.CS
% set e [encoding convertto utf-8 \N'196']
% binary scan $e cucu b1 b2
2
% set b1
195
% set b2
132
.CE
The resulting utf-8 data is stored as binary data consisting of the two bytes 195 and 132.
.SH DESCRIPTION
.PP
Performs one of several encoding-related operations, depending on
\fIoption\fR.  The legal \fIoption\fRs are:
.\" METHOD: convertfrom
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR

.BE
.SH INTRODUCTION
.PP
Strings in Tcl are sequences of Unicode codepoints.
When strings are imported into or exported from Tcl, they should be converted
from or to an encoding such as cp1252, iso8859-1, Shift\-JIS, utf-8 or utf-16.
.PP
Tcl strings that are converted to an encoding are represented within Tcl as
binary strings, in which each byte is represented by a codepoint with a value
in the range 0 to 255.
Binary strings are typically handled using the \fBbinary\fR command.
Applied to a binary string, \fBstring length\fR returns the byte count, which
may differ from the character count of the original string.
.PP
As an illustrative example, the Tcl string consisting of the single character
"\N'196'" (Unicode codepoint 0xC4) may be converted to utf-8 encoding like this:
.CS
% set e [encoding convertto utf-8 \N'196']
% string length $e
2
% binary scan $e cucu b1 b2
2
% set b1
195
% set b2
132
.CE
The resulting utf-8 data is stored as a binary string consisting of the two bytes 195 and 132.
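.PP
The difference between character count and byte count can be checked directly.
The following is a minimal sketch, assuming Tcl 8.6 or later; the variable
names \fIs\fR, \fIe\fR, \fIbytes\fR and \fIback\fR are illustrative only.
It encodes the one-character string with \fBencoding convertto\fR, reads every
byte with the \fBcu*\fR format of \fBbinary scan\fR, and decodes the binary
string again with \fBencoding convertfrom\fR:

```tcl
# Minimal sketch, assuming Tcl 8.6 or later.
# The variable names s, e, bytes and back are illustrative only.

# One character: U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS).
set s \u00c4
puts [string length $s]            ;# character count: 1

# Encode to utf-8; the result is a binary string of byte values 0..255.
set e [encoding convertto utf-8 $s]
puts [string length $e]            ;# byte count: 2

# Scan all bytes as unsigned integers into a list.
binary scan $e cu* bytes
puts $bytes                        ;# 195 132

# Decode the binary string; the round trip restores the original string.
set back [encoding convertfrom utf-8 $e]
puts [expr {$back eq $s}]          ;# 1
```

Comparing the two \fBstring length\fR results shows why a byte count has to be
taken from the encoded binary string rather than from the original string.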
.SH DESCRIPTION
.PP
Performs one of several encoding-related operations, depending on
\fIoption\fR.  The legal \fIoption\fRs are:
.\" METHOD: convertfrom
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR