Tcl Source Code

Check-in [dc02be5f4d]

Overview
Comment:WIP
SHA3-256: dc02be5f4d6103e4dc3c84991b3ed643a5cf5fdebc53596b6539aacadcc2d8e2
User & Date: dgp 2020-02-12 17:10:57
Context
2020-02-12
17:13
WIP check-in: 3a7d78ac06 user: dgp tags: dgp-review
17:10
WIP check-in: dc02be5f4d user: dgp tags: dgp-review
16:47
edits check-in: a927119ed1 user: dgp tags: dgp-review
Changes

Changes to doc/dev/value-history.md.

far smaller and more constrained than the general set of all byte sequences.
When we claim that Tcl 8.1 strings are kept in the UTF-8 encoding, we
imply that Tcl 8.1 strings are constrained to a much smaller set of byte
sequences than were permitted for Tcl 8.0 strings.  This raises questions
about both compatibility and what a decoder should do with a non-conformant
byte sequence.

It is an expected feature of the Unicode character set that more characters
are to be added over time.  The number of assigned codepoints grows as new
versions of the Unicode standard are codified and published.  Because
support for standards always lags their publication, software written in
conformance to one version of Unicode is likely to encounter data produced
in conformance to a later version.  In light of this, the best practice is
to accommodate and preserve unassigned codepoints as much as possible.
Software written to support Unicode 1.1 can then accept Unicode 2 data
streams, pass them through, and output them again undamaged. In this way,
middleware written to an obsolete Unicode standard can still support
providers and clients that seek to use characters assigned only in a later
standard.
Unicode 1.1 left open the possibility that any codepoint in UCS-2 might
one day be assigned. Tcl 8.1 imposes no conditions on the encoding of any
**Tcl_UniChar** value at all.
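
As a minimal sketch of what "no conditions" means in practice, the
following C function applies the UTF-8 bit layout to every 16-bit value
without consulting any table of assigned codepoints. The names
`UniChar16` and `EncodeUniChar` are hypothetical stand-ins chosen for
illustration; this is not Tcl's own encoder, and it ignores any special
cases a real implementation may apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit Tcl_UniChar; not Tcl's own typedef. */
typedef uint16_t UniChar16;

/*
 * Encode any 16-bit value as 1 to 3 bytes using the UTF-8 bit layout.
 * No check is made that the value is an assigned codepoint.
 * Returns the number of bytes written to buf, which must hold 3 bytes.
 */
static size_t
EncodeUniChar(UniChar16 ch, unsigned char *buf)
{
    assert(buf != NULL);
    if (ch < 0x80) {
        /* 0xxxxxxx */
        buf[0] = (unsigned char) ch;
        return 1;
    }
    if (ch < 0x800) {
        /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (ch >> 6));
        buf[1] = (unsigned char) (0x80 | (ch & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char) (0xE0 | (ch >> 12));
    buf[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
    buf[2] = (unsigned char) (0x80 | (ch & 0x3F));
    return 3;
}
```

Under this sketch a value such as 0xD800 or 0xFFFF is encoded by the same
rule as any other, so data carrying codepoints unknown to the software
passes through unchanged.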

The standards specifying text encodings published in the mid-1990s were quite
clear and explicit about the right way to do things. They were often less
demanding and specific about how to respond in the presence of errors. The
spirit of Postel's Robustness Principle,

>	*Be liberal in what you accept, and conservative in what you send.*,

held considerable influence at the time. Many implementations chose to
accommodate input errors, especially when that was the natural result
of laziness.
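
One way such accommodation can look is sketched below in C. The decoder
accepts well-formed one-, two-, and three-byte sequences, and when a byte
does not begin such a sequence it simply passes the byte value through as
the decoded value instead of reporting an error. The name `LenientDecode`
and the fallback policy are assumptions made for illustration; they are not
taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical lenient decoder. Reads one value from src (len > 0 bytes
 * available), stores it in *chPtr, and returns the number of bytes consumed.
 * It never fails: any byte that does not start a complete multi-byte
 * sequence is accepted as a one-byte value.
 */
static size_t
LenientDecode(const unsigned char *src, size_t len, uint16_t *chPtr)
{
    assert(src != NULL && chPtr != NULL && len > 0);
    unsigned char b0 = src[0];

    /* Two-byte form: 110xxxxx 10xxxxxx. Overlong forms are not rejected. */
    if (b0 >= 0xC0 && b0 < 0xE0 && len >= 2 && (src[1] & 0xC0) == 0x80) {
        *chPtr = (uint16_t) (((b0 & 0x1F) << 6) | (src[1] & 0x3F));
        return 2;
    }
    /* Three-byte form: 1110xxxx 10xxxxxx 10xxxxxx. */
    if (b0 >= 0xE0 && b0 < 0xF0 && len >= 3
            && (src[1] & 0xC0) == 0x80 && (src[2] & 0xC0) == 0x80) {
        *chPtr = (uint16_t) (((b0 & 0x0F) << 12)
                | ((src[1] & 0x3F) << 6) | (src[2] & 0x3F));
        return 3;
    }
    /* ASCII, a stray trail byte, or a truncated sequence: pass it through. */
    *chPtr = b0;
    return 1;
}
```

The liberal choice costs nothing extra to write: the final fallback is the
path the code takes when no validation branch matches, which is how
leniency can arise from the least effort.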




decoding and strictness

UTF-16 and surrogate pairs