Check-in [ff310c67f2]

Overview
Comment:WIP
SHA3-256: ff310c67f2cab7d1f4533ef1035105a3a6aa987f640c867fcdb857f7281679a1
User & Date: dgp 2020-03-06 12:55:11.860
Context
2020-03-06
16:09
New TIP 569; update index. check-in: 61c2552306 user: dgp tags: trunk
12:55
WIP check-in: ff310c67f2 user: dgp tags: trunk
02:39
WIP check-in: c3f8a1f42a user: dgp tags: trunk
Changes
Changes to tip/568.md.
Again, it must be understood that all of these routines have value only
insofar as they produce a *foo* value that the caller can use in place
of operating directly on the *Tcl_Obj*.  Experience has proven that
the existing specification of the routine **Tcl_GetByteArrayFromObj**
fails that test.

# History and Rationale

Starting with release 8.1, Tcl string values have been composed of
characters from an international character set. Each string is conceived
as a sequence of characters from UCS-2.  Each routine that accepts a string
as a *char* array, and the string representation of each *Tcl_Obj*, expects
a UCS-2 sequence stored in a Modified UTF-8 encoding.  This change
created the need for a new mechanism to accept, transmit, store, and
produce arbitrary binary values, preferably while minimizing the need to
convert to other representations.

The _bytearray_ **Tcl_ObjType** was created to address this need. The
routine **Tcl_NewByteArrayObj** stores an arbitrary byte sequence in
a *Tcl_Obj*.  The routine **Tcl_GetByteArrayFromObj** can then retrieve
that same sequence.  

When the string representation of the value is
needed, each byte (with a value from 0 to 255) in the sequence is treated
as the corresponding UCS-2 codepoint (U+0000 - U+00FF), and that
UCS-2 sequence is encoded in Modified UTF-8 in the usual way.  This
strategy permits all byte sequences to be encoded in a subset of
Tcl string values.
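
The encoding direction can be sketched in a few lines of C.  This is
an illustration only, not Tcl's implementation: the name
*BytesToModifiedUtf8* is hypothetical and not part of the Tcl API, and
the caller is assumed to supply an output buffer of at least twice the
input length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the Tcl API.  Encodes n bytes from
 * src as Modified UTF-8 into dst, treating each byte as the codepoint
 * U+0000 - U+00FF.  dst must have room for 2*n bytes.  Returns the
 * number of bytes written.
 */
size_t
BytesToModifiedUtf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        unsigned char b = src[i];

        if (b != 0 && b < 0x80) {
            /* U+0001 - U+007F: a single byte, unchanged. */
            dst[out++] = b;
        } else {
            /*
             * U+0080 - U+00FF need two bytes.  Modified UTF-8 also
             * encodes U+0000 as the two-byte form C0 80, which the
             * same formula yields when b == 0.
             */
            dst[out++] = (unsigned char) (0xC0 | (b >> 6));
            dst[out++] = (unsigned char) (0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Every output of this sketch contains only codepoints up to U+00FF,
which is the subset of string values the paragraph above refers to.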

When a byte sequence is not present, but one is
needed, it may be generated from the string representation. When
the string is one from the subset produced by encoding byte sequences,
the decoding is clear. But what about other string values, those that
contain at least one codepoint greater than U+00FF?  It was decided
that any larger codepoint in the string value would have its high
bits stripped away, and be decoded based on the low 8 bits it contained.
Given this decision, every string produces a byte sequence, and
**Tcl_GetByteArrayFromObj** could be defined using the pattern that
never produces errors.  This decision is the source of all the trouble.
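
A minimal C sketch of the decoding rule shows why it loses
information.  The name *ModifiedUtf8ToBytes* is hypothetical and not
part of the Tcl API.  The sketch assumes UCS-2 codepoints encoded in
one to three bytes, and its handling of malformed input is an
assumption made only to keep the example total.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the Tcl API.  Decodes len bytes of
 * Modified UTF-8 from src and stores the low 8 bits of each codepoint
 * in dst.  dst must have room for len bytes.  Returns the number of
 * bytes produced.
 */
size_t
ModifiedUtf8ToBytes(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        unsigned int cp;

        if (src[i] < 0x80) {
            cp = src[i];
            i += 1;
        } else if ((src[i] & 0xE0) == 0xC0 && i + 1 < len) {
            cp = ((src[i] & 0x1Fu) << 6) | (src[i + 1] & 0x3Fu);
            i += 2;
        } else if ((src[i] & 0xF0) == 0xE0 && i + 2 < len) {
            cp = ((src[i] & 0x0Fu) << 12)
                    | ((src[i + 1] & 0x3Fu) << 6)
                    | (src[i + 2] & 0x3Fu);
            i += 3;
        } else {
            /* Assumption of this sketch: pass a malformed byte through. */
            cp = src[i];
            i += 1;
        }
        /* High bits are stripped; no error is ever reported. */
        dst[out++] = (unsigned char) (cp & 0xFF);
    }
    return out;
}
```

Under this rule the one-character strings "A" (U+0041) and "Ł"
(U+0141) both decode to the single byte 0x41, so the caller cannot
tell whether the bytes it received faithfully represent the value.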







# Specification

# Compatibility

# Scope