Overview
Comment: | WIP |
---|---|
SHA3-256: | dc02be5f4d6103e4dc3c84991b3ed643 |
User & Date: | dgp 2020-02-12 17:10:57.100 |
Context
2020-02-12
17:13 | WIP check-in: 3a7d78ac06 user: dgp tags: dgp-review | |
17:10 | WIP check-in: dc02be5f4d user: dgp tags: dgp-review | |
16:47 | edits check-in: a927119ed1 user: dgp tags: dgp-review | |
Changes
Changes to doc/dev/value-history.md.
︙ | ︙ | |||
Lines 649-662 before the change become lines 649-686 after it (24 lines added).

Unchanged context:

far smaller and more constrained than the general set of all byte sequences. When we claim that Tcl 8.1 strings are kept in the UTF-8 encoding, we imply that Tcl 8.1 strings are constrained to a much smaller set of byte sequences than were permitted for Tcl 8.0 strings. This raises questions about both compatibility and what a decoder should do with a non-conformant byte sequence.

Added:

It is an expected feature of the Unicode character set that more characters are to be added over time. The number of assigned codepoints grows as new versions of the Unicode standard are codified and published. Because support for standards always lags their publication, software written in conformance to one version of Unicode is likely to encounter data produced in conformance to a later version.

In light of this, the best practice is to accommodate and preserve unassigned codepoints as much as possible. Software written to support Unicode 1.1 can then accept Unicode 2 data streams, pass them through, and output them again undamaged. In this way middleware written to an obsolete Unicode standard can still support providers and clients that seek to use characters assigned only in a later standard.

Unicode 1.1 left open the possibility that any codepoint in UCS-2 might one day be assigned. Tcl 8.1 imposes no conditions on the encoding of any **Tcl_UniChar** value at all.

The standards specifying text encodings published in the mid-1990s were quite clear and explicit about the right way to do things. They were often less demanding and specific about how to respond in the presence of errors. The spirit of Postel's Robustness Principle,

> *Be liberal in what you accept, and conservative in what you send.*

held considerable influence at the time. Many implementations chose to accommodate input errors, especially when that was the natural result of laziness.

Unchanged context:

decoding and strictness

UTF-16 and surrogate pairs
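The pass-through practice described in the added text can be illustrated with a small sketch. It is written in Python for brevity and is not taken from Tcl or from this check-in; the function names and the `assigned` table are hypothetical. It contrasts middleware that copies every 16-bit UCS-2 value untouched with middleware that rejects anything missing from its own, possibly outdated, table of assigned codepoints.

```python
import struct


def decode_ucs2_be(data):
    """Split big-endian UCS-2 bytes into 16-bit values, accepting every value."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must contain an even number of bytes")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def encode_ucs2_be(units):
    """Pack 16-bit values back into big-endian UCS-2 bytes."""
    return struct.pack(">%dH" % len(units), *units)


def pass_through(data):
    """Lenient middleware: never consults a table of assigned codepoints."""
    return encode_ucs2_be(decode_ucs2_be(data))


def strict_filter(data, assigned):
    """Strict middleware (hypothetical): replaces any codepoint missing from
    its own table of assigned codepoints with U+FFFD."""
    units = decode_ucs2_be(data)
    return encode_ucs2_be([u if u in assigned else 0xFFFD for u in units])


# U+20AC (EURO SIGN) was assigned after Unicode 1.1, so a table built from the
# older standard would not contain it. The table below is a toy stand-in.
old_table = {0x0041, 0x0042}
stream = bytes.fromhex("0041 20AC 0042")

lenient_output = pass_through(stream)
strict_output = strict_filter(stream, old_table)
```

Here `lenient_output` is byte-for-byte identical to `stream`, while `strict_output` has lost the newer character. Only the lenient variant lets providers and clients that follow a later standard communicate through older middleware.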
︙ | ︙ |