Overview
Comment: | Import selections of [4d6aa33b2f] (branch: encoding-for-review) and alternate wording. |
---|---|
SHA3-256: | f5243d7263ce854a034111a355f6da0f |
User & Date: | oehhar 2024-06-14 15:09:01 |
Context
2024-06-17
06:26 | Changed "binary data" to "binary string". Thanks to Nathan for the rationale | Leaf check-in: 96781584b9 | user: oehhar | tags: encoding-for-review-alt |
2024-06-14
15:09 | Import selections of [4d6aa33b2f] (branch: encoding-for-review) and alternate wording. | check-in: f5243d7263 | user: oehhar | tags: encoding-for-review-alt |
2024-06-13
12:00 | Fix [1d26e580cf]: safe interp can't source files with BOM | check-in: 162129dfbf | user: jan.nijtmans | tags: trunk, main |
Changes
Changes to doc/encoding.n.
'\"
'\" Copyright (c) 1998 Scriptics Corporation.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
.so man.macros
.BS
.SH NAME
encoding \- Work with encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE
.SH INTRODUCTION
.PP
Strings in Tcl are a sequence of Unicode codepoints. If strings are
imported or exported from Tcl, they should be transferred to an encoding
like cp1252, iso8859-1, Shift\-JIS, utf-8, utf-16, etc.
.PP
Strings in a certain encoding are represented within Tcl as binary data and
may not be handled as Tcl strings any more.
Binary data is represented as a Tcl value composed of character values of
0 to 255.
.PP
As an illustrative example, the Tcl string consisting of one character
"\N'196'" (Unicode codepoint 0xC4) may be transferred to binary data
containing the string in utf-8 encoding like this:
.CS
% set e [encoding convertto utf-8 \N'196']
% binary scan $e cucu b1 b2
% set b1
195
% set b2
132
.CE
The resulting utf-8 data is stored as binary data consisting of the two
bytes 195 and 132.
.SH DESCRIPTION
.PP
Performs one of several encoding related operations, depending on
\fIoption\fR. The legal \fIoption\fRs are:
.\" METHOD: convertfrom
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR
︙
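The round trip described in the INTRODUCTION text can be sketched as follows. This is an illustration only and not part of the check-in; the variable names are made up, and U+00C4 is written as an escape so that the encoding of the script file itself does not matter.

```tcl
# Illustrative sketch, not part of doc/encoding.n.
# U+00C4 is written as an escape so the script's own file encoding is irrelevant.
set text "\u00C4"

# Export: Tcl string -> sequence of bytes in utf-8.
set bytes [encoding convertto utf-8 $text]

# Inspect the bytes as a list of unsigned 8-bit integers.
binary scan $bytes cu* byteValues
puts $byteValues

# Import: utf-8 bytes -> Tcl string again.
set roundTrip [encoding convertfrom utf-8 $bytes]
puts [string equal $text $roundTrip]
```

The one-character string becomes two bytes (195 and 132, as stated in the man page text), and converting those bytes back yields the original string.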
.PP
.VS "TCL8.7 TIP607, TIP656"
The \fB-profile\fR option determines the command behavior in the presence
of conversion errors. See the \fBPROFILES\fR section below for details. Any premature
termination of processing due to errors is reported through an exception if
the \fB-failindex\fR option is not specified.
.PP
If \fB-failindex\fR is specified, instead of an exception being raised on
premature termination, the result of the conversion up to the point of the
error is returned as the result of the command. In addition, the index of
the source byte triggering the error is stored in \fBvar\fR. If no errors
are encountered, the entire result of the conversion is returned and the
value \fB-1\fR is stored in \fBvar\fR.
.VE "TCL8.7 TIP607, TIP656"
.\" METHOD: convertto
.TP
\fBencoding convertto\fR ?\fIencoding\fR? \fIdata\fR
.TP
\fBencoding convertto\fR ?\fB-profile \fIprofile\fR? ?\fB-failindex var\fR? \fIencoding data\fR
.
Convert \fIstring\fR to the specified \fIencoding\fR. The result is a Tcl
binary string that contains the sequence of bytes representing the converted
string in the specified encoding. If \fIencoding\fR is not specified, the
current system encoding is used.
.PP
.VS "TCL8.7 TIP607, TIP656"
See \fBencoding convertfrom\fR for the meaning of \fB\-profile\fR and
\fB\-failindex\fR.
.VE "TCL8.7 TIP607, TIP656"
.\" METHOD: dirs
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with.
This command sets the search
︙
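The \fB-failindex\fR behavior described for \fBencoding convertfrom\fR can be sketched like this. The sketch is an illustration and not part of the check-in; it assumes a Tcl build with TIP 607 and TIP 656 (8.7 or 9.0), names the strict profile explicitly so the result does not depend on the default profile, and uses made-up variable names.

```tcl
# Illustrative sketch; assumes Tcl 8.7/9.0 with TIP 607 and TIP 656.
# In "A\x80B" the lone byte 0x80 is not a valid utf-8 sequence.
set bytes "A\x80B"

# With -failindex no exception is raised: the converted prefix is returned
# and the index of the offending source byte is stored in idx.
set prefix [encoding convertfrom -profile strict -failindex idx utf-8 $bytes]
puts [list $prefix $idx]

# For clean input the whole result is returned and idx is set to -1.
set all [encoding convertfrom -profile strict -failindex idx utf-8 "AB"]
puts [list $all $idx]

# Without -failindex the same invalid input raises an error instead.
if {[catch {encoding convertfrom -profile strict utf-8 $bytes} msg]} {
    puts "conversion failed: $msg"
}
```

Going by the description in the diff, the first call should return the prefix up to the invalid byte together with index 1, which lets a caller keep the converted part and resume or report at a known offset.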
.VS "TCL8.7 TIP656"
Returns a list of the names of encoding profiles. See \fBPROFILES\fR below.
.VE "TCL8.7 TIP656"
.\" METHOD: system
.TP
\fBencoding system\fR ?\fIencoding\fR?
.
Set the system encoding to \fIencoding\fR. If \fIencoding\fR is not given,
returns the current system encoding.
The system encoding is used whenever Tcl passes strings to system calls.
.\" Do not put .VS on whole section as that messes up the bullet list alignment
.SH PROFILES
.PP
.VS "TCL8.7 TIP656"
Operations involving encoding transforms may encounter several types of
errors such as invalid sequences in the source data, characters that
︙
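A minimal sketch of querying and temporarily changing the system encoding with \fBencoding system\fR, as described above. It is an illustration only, not part of the check-in, and the variable name is made up; the original value is restored at the end because the system encoding affects every string Tcl passes to system calls.

```tcl
# Illustrative sketch, not part of doc/encoding.n.
# Query the current system encoding (no argument given).
set saved [encoding system]
puts "system encoding: $saved"

# The reported name is one of the encodings Tcl knows about.
puts [expr {$saved in [encoding names]}]

# Set a different system encoding, then restore the original one.
encoding system utf-8
puts "temporarily: [encoding system]"
encoding system $saved
```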
Example 1: convert a byte sequence in Japanese euc-jp encoding to a TCL string:
.PP
.CS
% codepoints [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
U+00306F
.CE
.PP
The result is the Unicode codepoint
.QW "\eu306F" ,
which is the Hiragana letter HA.
.VS "TCL8.7 TIP607, TIP656"
.PP
Example 2: Error handling based on profiles:
.PP
The letter \fBA\fR is Unicode character U+0041 and the byte "\ex80" is invalid
︙
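Example 1 calls a \fBcodepoints\fR helper whose definition falls outside the lines shown in this diff. The sketch below is a hypothetical stand-in for it, written only to make the example reproducible; the man page's own definition may differ.

```tcl
# Hypothetical stand-in for the codepoints helper used in Example 1.
# Prints each character of a string as U+ followed by six hex digits.
proc codepoints {s} {
    set out {}
    foreach ch [split $s {}] {
        # scan with %c yields the numeric codepoint of the character.
        lappend out [format U+%06X [scan $ch %c]]
    }
    return [join $out]
}

# The two euc-jp bytes from Example 1 decode to a single character.
puts [codepoints [encoding convertfrom euc-jp "\xA4\xCF"]]
```

With this helper the call reproduces the `U+00306F` output quoted in Example 1.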