Tcl Source Code

Check-in [7baffdc778]
Overview
Comment: WIP
SHA3-256: 7baffdc7784444738309789f9a9e131fe0e361272abfe2cb9e20281cd393bde9
User & Date: dgp 2020-02-12 18:16:34.257
Context
2020-02-17
20:04
WIP check-in: 9e44e890fa user: dgp tags: dgp-review
2020-02-12
18:16
WIP check-in: 7baffdc778 user: dgp tags: dgp-review
17:54
WIP check-in: 28ec05e046 user: dgp tags: dgp-review
Changes
Changes to doc/dev/value-history.md.
exactly that. The text of Appendix A.2 of The Unicode Standard 2.0
(which contains one definition of UTF-8 from 1996) explicitly states,

>	When converting from UTF-8 to Unicode values, however,
>	implementations do not need to check that the shortest encoding
>	is being used, which simplifies the conversion algorithm.

Whether or not the Tcl developers were explicitly following this advice,
the routine in Tcl 8.1 for decoding Unicode codepoints from the
encoded byte sequences,

>	**int** **Tcl_UtfToUniChar** ( **const char** *_str_, **Tcl_UniChar** *_chPtr_),

implements the same forgiving support for overlong byte sequences.
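
To make "forgiving support for overlong byte sequences" concrete, here is a minimal sketch of a decoder that, like the Unicode 2.0 advice quoted above, never checks for the shortest encoding. It is not the Tcl 8.1 source: the name `lenient_utf8_decode`, the `unsigned int` output type, and the pass-through handling of malformed bytes are all assumptions made for illustration, and it covers only sequences of up to three bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT Tcl's implementation: decode one code point
 * from a NUL-terminated byte string without rejecting overlong forms.
 * Returns the number of bytes consumed (always at least 1).
 */
static int lenient_utf8_decode(const char *src, unsigned int *out)
{
    const unsigned char *s = (const unsigned char *) src;

    assert(src != NULL && out != NULL);

    if (s[0] < 0x80) {
        /* 0xxxxxxx: a single byte stands for itself. */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        /* 110xxxxx 10xxxxxx: accepted even when the value is below 0x80. */
        *out = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80) {
        /* 1110xxxx 10xxxxxx 10xxxxxx: accepted even when below 0x800. */
        *out = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6)
                | (s[2] & 0x3Fu);
        return 3;
    }
    /* Assumed fallback: treat any other byte as its own value. */
    *out = s[0];
    return 1;
}
```

With this sketch, the one-byte `"A"`, the two-byte `"\xC1\x81"`, and the three-byte `"\xE0\x81\x81"` all decode to U+0041. A decoder that enforced the shortest form would reject the second and third.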




UTF-16 and surrogate pairs

Compat with 8.0