Tcl Source Code

Check-in [3a7d78ac06]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:WIP
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | dgp-review
Files: files | file ages | folders
SHA3-256: 3a7d78ac06657da58622d198ef29f809bef0382ba72eaf335bf7eb7e13fd2440
User & Date: dgp 2020-02-12 17:13:29.221
Context
2020-02-12
17:54
WIP check-in: 28ec05e046 user: dgp tags: dgp-review
17:13
WIP check-in: 3a7d78ac06 user: dgp tags: dgp-review
17:10
WIP check-in: dc02be5f4d user: dgp tags: dgp-review
Changes
Unified Diff Ignore Whitespace Patch
Changes to doc/dev/value-history.md.
666
667
668
669
670
671
672
673
674
675
676
677
678
679



680
681
682
683
684
685
686
Unicode 1.1 left open the possibility that any codepoint in UCS-2 might
one day be assigned. Tcl 8.1 imposes no conditions on the encoding of any
**Tcl_UniChar** value at all.

The standards specifying text encodings publish in the mid-1990s were quite
clear and explicit about the right way to do things. They were often less
demanding and specific about how to respond in the presence of errors. The
spirit of Postel's Robustness Principle,

>	*Be liberal in what you accept, and conservative in what you send.*,

held considerable influence at the time. Many implementations chose to
accommodate input errors, especially when that was the natural results
of laziness.







decoding and strictness

UTF-16 and surrogate pairs







|

|


|
|
>
>
>







666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
Unicode 1.1 left open the possibility that any codepoint in UCS-2 might
one day be assigned. Tcl 8.1 imposes no conditions on the encoding of any
**Tcl_UniChar** value at all.

The standards specifying text encodings publish in the mid-1990s were quite
clear and explicit about the right way to do things. They were often less
demanding and specific about how to respond in the presence of errors. The
spirit of Postel's Robustness Principle

>	*Be liberal in what you accept, and conservative in what you send.*

held considerable influence at the time. Many implementations chose to
accommodate input errors, especially when that was the natural result
of laziness. For example, the specification of FSS-UTF and all specifications
for UTF-8 are completely clear and explicit that the proper encoding
of the codepoint **U+0000** is the byte value **0x00**, (our old friend
the **NUL** byte!).




decoding and strictness

UTF-16 and surrogate pairs