Check-in [2fc06661eb]

Overview
Comment: Split TIP #542 into 2 separate TIP's.
SHA3-256: 2fc06661eb9d9f392c7299a5f45ad246cd4a96c2cf72c1a40c7cc2e529d924c4
User & Date: jan.nijtmans 2019-05-31 13:52:39
Context
2019-05-31
13:53 TIP's 537 and 544 are final now. check-in: 558b3ef217 user: jan.nijtmans tags: trunk
13:52 Split TIP #542 into 2 separate TIP's. check-in: 2fc06661eb user: jan.nijtmans tags: trunk
2019-05-27
15:22 Withdrawn 292 at request of author check-in: 84194b42a8 user: dkf tags: trunk
Changes

Changes to tip/542.md.

# TIP 542: New encodings: UTF-16, UCS-2, Support for switchable (Androwish-compatible) Full Unicode support. 
	Author:         Jan Nijtmans <[email protected]>
	Author:         Jan Nijtmans <[email protected]>
	State:          Draft
	Type:           Project
	Vote:           Pending
	Created:        10-May-2019
	Post-History:   
................................................................................
	Keywords:       Tcl
	Tcl-Version:    8.7
	Tcl-Branch:     utf-max
-----

# Abstract

This TIP proposes adding more encodings and making it possible to switch Tcl
between Full Unicode mode (TCL\_UTF\_MAX>3, almost compatible with Androwish) and the
current partial Unicode mode (as far as TIP #389 goes, using TCL\_UTF\_MAX=3).

# Rationale

Tcl currently can be compiled in 3 different modes: using TCL\_UTF\_MAX=3, TCL\_UTF\_MAX=4
or TCL\_UTF\_MAX=6. The first 2 are actually equal now in Tcl 8.7 (since TIP #389). Using
TCL\_UTF\_MAX=6 is actually overkill, since no utf-8 character consists of more than 4 bytes.

................................................................................
The default compilation mode for Tcl will continue to be TCL\_UTF\_MAX=3, which is 100%
upwards compatible with Tcl 8.6.

# Specification

This document proposes:

 * Add new encodings "utf-16", "utf-16le", "utf-16be", "ucs-2", "ucs-2le", "ucs-2be".

 * Allow Tcl to be compiled with either -DTCL\_UTF\_MAX=3 (default), or with -DTCL\_UTF\_MAX=4.
   In the latter mode, the Tcl_UniChar type becomes a 32-bit type, but the stub entries for
   the 16-bit Tcl\_UniChar type are present as well. So, most extensions compiled with -DTCL\_UTF\_MAX=3
   will continue to work in either Tcl mode (for caveats, see below).

 * Allow Tcl extensions to be compiled with either -DTCL\_UTF\_MAX=3 (default), or with -DTCL\_UTF\_MAX=4,
   when Tcl is compiled with -DTCL\_UTF\_MAX=4.

 * Deprecate the "unicode" encoding. "utf-16" is supposed to be used instead. The "unicode" encoding
   will NOT be removed in Tcl 9.0, since it's too common.

 * Enhance the Tcl\_UniCharToUtfDString() function such that the uniLength parameter is allowed to
   have the value -1.
 
 * Deprecate the following functions:

     Tcl\_AppendUnicodeToObj()

................................................................................
     but still want to use the 16-bit conversions independent of the TCL\_UTF\_MAX setting or Tcl\_UniChar type.

# Compatibility

As long as Tcl is compiled with -DTCL\_UTF\_MAX=3, this is fully upwards compatible.

When Tcl is compiled with -DTCL\_UTF\_MAX=4, this is, at the Tcl level, compatible with the Androwish version
of Tcl, with one exception: in Androwish the "unicode" encoding is 32-bit, while in Tcl it continues
to be 16-bit, an alias for "utf-16". At the C-API level, it's upwards compatible with Tcl 8.6 in
TCL\_UTF\_MAX=6 mode, except for the functions marked above as deprecated. Those functions will be gone.

# Caveats

 * Extensions compiled with -DTCL\_UTF\_MAX=4 cannot use any of the deprecated functions mentioned in this TIP.
   Using any of them results in a link error.

 * If Tcl is compiled with -DTCL\_UTF\_MAX=4, the deprecated functions will be gone. Any

# TIP 542: Support for switchable (Androwish-compatible) Full Unicode support. 
	Author:         Jan Nijtmans <[email protected]>
	Author:         Jan Nijtmans <[email protected]>
	State:          Draft
	Type:           Project
	Vote:           Pending
	Created:        10-May-2019
	Post-History:   
................................................................................
	Keywords:       Tcl
	Tcl-Version:    8.7
	Tcl-Branch:     utf-max
-----

# Abstract

This TIP proposes making it possible to switch Tcl between Full Unicode mode
(TCL\_UTF\_MAX>3, almost compatible with Androwish) and the current partial
Unicode mode (as far as TIP #389 goes, using TCL\_UTF\_MAX=3).

# Rationale

Tcl currently can be compiled in 3 different modes: using TCL\_UTF\_MAX=3, TCL\_UTF\_MAX=4
or TCL\_UTF\_MAX=6. The first 2 are actually equal now in Tcl 8.7 (since TIP #389). Using
TCL\_UTF\_MAX=6 is actually overkill, since no utf-8 character consists of more than 4 bytes.
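
The four-byte upper bound mentioned above can be checked with a small standalone sketch of standard UTF-8 encoding. This is an illustration only, not Tcl's own encoder: the function name `demo_utf8_encode` is hypothetical, and the sketch rejects surrogate code points as standard UTF-8 does, which may differ from what Tcl does internally.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Tcl): encode one Unicode code point as
 * standard UTF-8 into buf, which must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 if cp is a surrogate or lies
 * beyond U+10FFFF.
 */
size_t demo_utf8_encode(unsigned long cp, unsigned char *buf)
{
    assert(buf != NULL);
    if (cp < 0x80) {                       /* 7 bits: one byte */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 11 bits: two bytes */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 16 bits: three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      /* surrogates are not scalar values */
        }
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 21 bits: four bytes, the maximum */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* outside the Unicode range */
}
```

Calling `demo_utf8_encode(0x10FFFF, buf)` with the highest valid code point returns 4, so a buffer of more than 4 bytes per character is never needed for valid input.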

................................................................................
The default compilation mode for Tcl will continue to be TCL\_UTF\_MAX=3, which is 100%
upwards compatible with Tcl 8.6.

# Specification

This document proposes:



 * Allow Tcl to be compiled with either -DTCL\_UTF\_MAX=3 (default), or with -DTCL\_UTF\_MAX=4.
   In the latter mode, the Tcl_UniChar type becomes a 32-bit type, but the stub entries for
   the 16-bit Tcl\_UniChar type are present as well. So, most extensions compiled with -DTCL\_UTF\_MAX=3
   will continue to work in either Tcl mode (for caveats, see below).

 * Allow Tcl extensions to be compiled with either -DTCL\_UTF\_MAX=3 (default), or with -DTCL\_UTF\_MAX=4,
   when Tcl is compiled with -DTCL\_UTF\_MAX=4.




 * Enhance the Tcl\_UniCharToUtfDString() function such that the uniLength parameter is allowed to
   have the value -1.
 
 * Deprecate the following functions:

     Tcl\_AppendUnicodeToObj()

................................................................................
     but still want to use the 16-bit conversions independent of the TCL\_UTF\_MAX setting or Tcl\_UniChar type.
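
The effect of the compile-time switch on the character type can be pictured with the following sketch. `DEMO_UTF_MAX` and `DemoUniChar` are hypothetical stand-ins for TCL\_UTF\_MAX and Tcl\_UniChar, chosen so the fragment compiles without the Tcl headers; the actual definitions in tcl.h may differ, and the sketch assumes a platform where `unsigned short` is 16 bits and `int` is at least 32 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TCL_UTF_MAX; override with -DDEMO_UTF_MAX=4. */
#ifndef DEMO_UTF_MAX
#   define DEMO_UTF_MAX 3
#endif

/* Hypothetical stand-in for Tcl_UniChar: its width follows the switch. */
#if DEMO_UTF_MAX > 3
typedef int DemoUniChar;             /* wide enough for any code point */
#else
typedef unsigned short DemoUniChar;  /* 16-bit: needs surrogate pairs */
#endif

/*
 * Number of DemoUniChar units needed to hold one code point.
 * With a 16-bit type, code points above U+FFFF take two units.
 */
size_t demo_unichar_units(unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    if (sizeof(DemoUniChar) >= 4) {
        return 1;
    }
    return (cp > 0xFFFF) ? 2 : 1;
}
```

Compiled with the default setting, `demo_unichar_units(0x1F600)` yields 2; compiled with `-DDEMO_UTF_MAX=4`, it yields 1. This difference in unit counts is the reason an extension and the core have to agree on which set of entry points they use.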

# Compatibility

As long as Tcl is compiled with -DTCL\_UTF\_MAX=3, this is fully upwards compatible.

When Tcl is compiled with -DTCL\_UTF\_MAX=4, this is, at the Tcl level, compatible with the Androwish version
of Tcl. At the C-API level, it's upwards compatible with Tcl 8.6 in TCL\_UTF\_MAX=6 mode, except for the
functions marked above as deprecated. Those functions will be gone.

# Caveats

 * Extensions compiled with -DTCL\_UTF\_MAX=4 cannot use any of the deprecated functions mentioned in this TIP.
   Using any of them results in a link error.

 * If Tcl is compiled with -DTCL\_UTF\_MAX=4, the deprecated functions will be gone. Any

Added tip/547.md.

# TIP 547: New encodings: UTF-16, UCS-2.
	Author:         Jan Nijtmans <[email protected]>
	Author:         Jan Nijtmans <[email protected]>
	State:          Draft
	Type:           Project
	Vote:           Pending
	Created:        31-May-2019
	Post-History:   
	Discussions-To: Tcl Core list
	Keywords:       Tcl
	Tcl-Version:    8.7
	Tcl-Branch:     tip-547
-----

# Abstract

This TIP proposes to add more encodings for handling utf-16 and ucs-2.

# Rationale

Currently, Tcl has only one multi-byte UTF encoding, named "unicode". Depending on how
Tcl is compiled, this can be 16-bit or 32-bit. If 16-bit, it is currently
not clear whether surrogates are handled or not. Also, this encoding always
uses the platform-endian mode. There is no way to force little- or big-endianness.
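
The difference between platform-endian output and a forced byte order can be sketched in a few lines of standalone C. All function names here are hypothetical and do not correspond to Tcl internals; the fragment only shows how one 16-bit code unit ends up as two bytes in each order.

```c
#include <assert.h>
#include <stddef.h>

/* Write one 16-bit code unit as two bytes, least significant byte first. */
void demo_put16le(unsigned int unit, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)((unit >> 8) & 0xFF);
}

/* Write one 16-bit code unit as two bytes, most significant byte first. */
void demo_put16be(unsigned int unit, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)((unit >> 8) & 0xFF);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int demo_host_is_little_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Platform-endian output: the result depends on the machine running it. */
void demo_put16native(unsigned int unit, unsigned char *out)
{
    if (demo_host_is_little_endian()) {
        demo_put16le(unit, out);
    } else {
        demo_put16be(unit, out);
    }
}
```

For U+20AC, `demo_put16le` always produces the bytes AC 20 and `demo_put16be` always produces 20 AC, whereas `demo_put16native` produces one or the other depending on the host.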

Therefore, this TIP proposes to clear up the ambiguity: make clear that the new
encodings are always 16-bit, and provide different encodings for little- and
big-endian. The "utf-16" variant handles surrogates while the "ucs-2" variant does not.
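
The distinction between the two variants can be illustrated with a standalone sketch. The UTF-16 part follows the standard surrogate-pair rule; the UCS-2 part substitutes U+FFFD for code points outside the Basic Multilingual Plane, which is an assumption made for this illustration and not something this TIP specifies. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * UTF-16: encode one code point into 16-bit units.
 * Returns 1 for BMP characters, 2 for a surrogate pair, and 0 if cp is
 * itself a surrogate or lies beyond U+10FFFF.
 */
size_t demo_utf16_encode(unsigned long cp, unsigned short *out)
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned short)cp;
        return 1;
    }
    cp -= 0x10000;                                     /* 20 bits remain */
    out[0] = (unsigned short)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (unsigned short)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/*
 * UCS-2: always exactly one 16-bit unit, with no surrogate handling.
 * Assumption for this sketch: code points above U+FFFF become U+FFFD.
 */
size_t demo_ucs2_encode(unsigned long cp, unsigned short *out)
{
    assert(out != NULL);
    out[0] = (cp < 0x10000) ? (unsigned short)cp : (unsigned short)0xFFFD;
    return 1;
}
```

With these helpers, U+1F600 becomes the pair D83D DE00 in UTF-16 but a single replacement unit in UCS-2, while a BMP character such as U+20AC is encoded identically by both.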

# Specification

This document proposes:

 * Add new encodings "utf-16", "utf-16le", "utf-16be", "ucs-2", "ucs-2le", "ucs-2be".

 * Deprecate the "unicode" encoding. "utf-16" is supposed to be used instead. The "unicode" encoding
   will NOT be removed in Tcl 9.0, since it's too common.

# Use case

Tk defines its own "ucs-2be" encoding when compiled on little-endian machines. With this TIP,
Tk no longer needs to provide this encoding.

# Compatibility

This is fully upwards compatible, except when Tcl is compiled with -DTCL\_UTF\_MAX=6 (which is not actually supported).

# Reference Implementation

A reference implementation is available in the **tip-547** branch.
<https://core.tcl.tk/tcl/timeline?r=tip-547>

# Copyright

This document has been placed in the public domain.