Overview
Comment: | WIP on TIP 568 |
---|---|
SHA3-256: | 6aac90669d074379adf6a69d216f5464 |
User & Date: | dgp 2020-03-09 23:37:04.427 |
Context
2020-03-09
23:51 | edit check-in: a93193cf6f user: dgp tags: trunk | |
23:37 | WIP on TIP 568 check-in: 6aac90669d user: dgp tags: trunk | |
2020-03-06
16:25 | CFV on TIP 569 check-in: 807759b05d user: dgp tags: trunk | |
Changes
Changes to tip/568.md.
︙ | ︙ | |||
The _int_ return allows for returning **TCL_ERROR** when no valid *foo* can be extracted. The *Tcl_Interp* is provided to receive standardized error messages and codes on failure. The final output argument is a pointer to space where the extracted *foo* value may be written. The value written is now the possession of the caller, generated by the routine, possibly by making a copy of something stored in the *Tcl_Obj*. This pattern is followed in the cases where **Foo** is **Boolean**, **Bignum**, **Double**, **Encoding**, **Index**, **Int**, **Long**, or **WideInt**. (See also private routines where **Foo** is **Channel**, **Number**, **CompletionCode**, **WideBits**, or **Namespace**.)

The second pattern,

> _foo_ **Tcl_GetFooFromObj**(*Tcl_Interp* _*_, *Tcl_Obj* _*_, ...);

is used when the *foo* representation is a token that can take on the value **NULL**. In this case, a return of **NULL** by the routine signals the circumstance that no valid *foo* can be extracted. As in the first pattern, a *Tcl_Interp* is provided to receive standardized error messages and codes on failure. Documentation must be consulted to determine any constraints on the use of the returned token value by the caller. It is likely to rely on information stored within the internal structures of Tcl, which may need management with reference counting, memory preservation, and/or maintaining a claim on the original *Tcl_Obj*. This pattern is followed in the cases where **Foo** is **RegExp** or **Command**. (See also private routine where **Foo** is **Lambda**.)

The final pattern,

> _foo_ **Tcl_GetFooFromObj**(*Tcl_Obj* _*_, _int *_);

is used when the returned *foo* is a pointer value pointing into an array stored within Tcl's own structures. It is implicit in the
︙ | ︙ | |||
so far as they produce a *foo* value that the caller can use in the place of operating directly on the *Tcl_Obj*. Experience has proven that the existing specification of the routine **Tcl_GetByteArrayFromObj** fails that test.

# History and Rationale

Starting with release 8.1, Tcl string values were composed of characters from an international character set. Each string was conceived as a sequence of characters from UCS-2. Each *char* array accepted as a string by a routine, and the string representation of each *Tcl_Obj*, was expected to store a UCS-2 sequence in a Modified UTF-8 encoding. (We are in the process of extending the Tcl character set from UCS-2 to all of Unicode, but that will not change the important matters here. For reliable binary transfer, we can no longer simply write arbitrary bytes to a string representation, and Tcl strings in general contain characters outside the byte range.)

This change created the need for a new mechanism to accept, transmit, store, and produce arbitrary binary values, preferably while minimizing the need to convert to other representations. The _bytearray_ **Tcl_ObjType** was created to address this need. The routine **Tcl_NewByteArrayObj** stores an arbitrary byte sequence in a *Tcl_Obj*. The routine **Tcl_GetByteArrayFromObj** can then retrieve that same sequence.
When the string representation of the value is needed, each byte (with value from 0-255) in the sequence is treated as the corresponding UCS-2 codepoint (U+0000 - U+00FF), and that UCS-2 sequence is encoded in Modified UTF-8 in the usual way. This strategy permits all byte sequences to be encoded in a subset of Tcl string values.

When **Tcl_GetByteArrayFromObj** is called on a value where no byte sequence has been stored, one is generated from the string representation. When the string is one from the subset produced by encoding byte sequences, the decoding is clear, but what about other string values, those that contain at least one codepoint greater than U+00FF? It was decided that any larger codepoint in the string value would have its high bits stripped away, and be decoded based on the low 8 bits it contained. Given this decision, all strings produce a byte sequence, and **Tcl_GetByteArrayFromObj** would always return a result. It did not need to provide for raising errors.

This decision is the source of all the trouble. When a caller of **Tcl_GetByteArrayFromObj** receives access to a byte sequence, it does not know whether this is a sequence originally stored, or one generated by transforming and possibly truncating characters from a general Tcl string value. This means the contents of the byte sequence do not reliably reveal much about the value. We could not say, for example, what the 3rd character in the value is. At best we could say what is left when all high bits are stripped off that character. It is not a common need to treat all string values according to equivalence classes set by examining only the low bytes of every character.

If we supplement the call to **Tcl_GetByteArrayFromObj** with a call to **Tcl_HasStringRep**, we might learn that the value does not have a string representation stored within it. In that case, we have what we have come to call a _pure_ bytearray value, and we could use the byte sequence as the definitive value.
Note, however, that this purity test is fragile. If anything causes the string representation to be generated, we lose this supplementary test, and we are back to being unable to use the byte sequence at all. Testing for pure bytearrays can help, but it cannot solve the total problem.

Bug examples.

# Specification

# Compatibility

# Scope
︙ | ︙ |