Tcl Source Code

Check-in [1f393d7d01]
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Document that the *Backslash parsing functions output maximum 4 bytes, irrespectable of the TCL_UTF_MAX setting: It could be 4 for the "\Uxxxxxx" construct, but never more. Move <stddef.h> and <locale.h> to tclInt.h, so the can be removed from various other places.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | core-8-branch
Files: files | file ages | folders
SHA3-256: 1f393d7d018350c118f58021618faf99b0312e93b33803a1214fd8d036fc1215
User & Date: jan.nijtmans 2019-08-02 13:35:21
Context
2019-08-02
14:34
Merge 8.6 check-in: e52b4963c3 user: jan.nijtmans tags: core-8-branch
14:26
Merge 8.7 check-in: b0fbdeb265 user: jan.nijtmans tags: trunk
14:24
Merge core-8-branch. Use tclWinDde.c and tclWinReg.c to show how the *WChar* functions could be used... check-in: 3e89b867a2 user: jan.nijtmans tags: tip-548
13:35
Document that the *Backslash parsing functions output maximum 4 bytes, irrespectable of the TCL_UTF_... check-in: 1f393d7d01 user: jan.nijtmans tags: core-8-branch
12:47
Merge 8.6 check-in: b908a74678 user: jan.nijtmans tags: core-8-branch
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to doc/Utf.3.

71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
...
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
...
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
...
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
.sp
int
\fBTcl_UtfBackslash\fR(\fIsrc, readPtr, dst\fR)
.SH ARGUMENTS
.AS "const Tcl_UniChar" *uniPattern in/out
.AP char *buf out
Buffer in which the UTF-8 representation of the Tcl_UniChar is stored.  At most
\fBTCL_UTF_MAX\fR bytes are stored in the buffer.
.AP int ch in
The Unicode character to be converted or examined.
.AP Tcl_UniChar *chPtr out
Filled with the Tcl_UniChar represented by the head of the UTF-8 string.
.AP "const char" *src in
Pointer to a UTF-8 string.
.AP "const char" *cs in
................................................................................
.AP int index in
The index of a character (not byte) in the UTF-8 string.
.AP int *readPtr out
If non-NULL, filled with the number of bytes in the backslash sequence,
including the backslash character.
.AP char *dst out
Buffer in which the bytes represented by the backslash sequence are stored.
At most \fBTCL_UTF_MAX\fR bytes are stored in the buffer.
.AP int nocase in
Specifies whether the match should be done case-sensitive (0) or
case-insensitive (1).
.BE

.SH DESCRIPTION
.PP
................................................................................
source buffer is long enough such that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen.  If the input is
a byte in the range 0x80 - 0x9F, \fBTcl_UtfToUniChar\fR assumes the
cp1252 encoding, stores the corresponding Tcl_UniChar in \fI*chPtr\fR
and returns 1. If the input is otherwise
not in proper UTF-8 format, \fBTcl_UtfToUniChar\fR will store the first
byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x0000 and
0x00ff and return 1.
.PP
\fBTcl_UniCharToUtfDString\fR converts the given Unicode string
to UTF-8, storing the result in a previously initialized \fBTcl_DString\fR.
You must specify \fIuniLength\fR, the length of the given Unicode string.
The return value is a pointer to the UTF-8 representation of the
Unicode string.  Storage for the return value is appended to the
end of the \fBTcl_DString\fR.
................................................................................
contain at least \fIindex\fR characters.  This is equivalent to calling
\fBTcl_UtfNext\fR \fIindex\fR times.  If a negative \fIindex\fR is given,
the return pointer points to the first character in the source string.
.PP
\fBTcl_UtfBackslash\fR is a utility procedure used by several of the Tcl
commands.  It parses a backslash sequence and stores the properly formed
UTF-8 character represented by the backslash sequence in the output
buffer \fIdst\fR.  At most \fBTCL_UTF_MAX\fR bytes are stored in the buffer.
\fBTcl_UtfBackslash\fR modifies \fI*readPtr\fR to contain the number
of bytes in the backslash sequence, including the backslash character.
The return value is the number of bytes stored in the output buffer.
.PP
See the \fBTcl\fR manual entry for information on the valid backslash
sequences.  All of the sequences described in the Tcl manual entry are
supported by \fBTcl_UtfBackslash\fR.

.SH KEYWORDS
utf, unicode, backslash






|







 







|







 







|
|







 







|










71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
...
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
...
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
...
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
.sp
int
\fBTcl_UtfBackslash\fR(\fIsrc, readPtr, dst\fR)
.SH ARGUMENTS
.AS "const Tcl_UniChar" *uniPattern in/out
.AP char *buf out
Buffer in which the UTF-8 representation of the Tcl_UniChar is stored.  At most
4 bytes are stored in the buffer.
.AP int ch in
The Unicode character to be converted or examined.
.AP Tcl_UniChar *chPtr out
Filled with the Tcl_UniChar represented by the head of the UTF-8 string.
.AP "const char" *src in
Pointer to a UTF-8 string.
.AP "const char" *cs in
................................................................................
.AP int index in
The index of a character (not byte) in the UTF-8 string.
.AP int *readPtr out
If non-NULL, filled with the number of bytes in the backslash sequence,
including the backslash character.
.AP char *dst out
Buffer in which the bytes represented by the backslash sequence are stored.
At most 4 bytes are stored in the buffer.
.AP int nocase in
Specifies whether the match should be done case-sensitive (0) or
case-insensitive (1).
.BE

.SH DESCRIPTION
.PP
................................................................................
source buffer is long enough such that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen.  If the input is
a byte in the range 0x80 - 0x9F, \fBTcl_UtfToUniChar\fR assumes the
cp1252 encoding, stores the corresponding Tcl_UniChar in \fI*chPtr\fR
and returns 1. If the input is otherwise
not in proper UTF-8 format, \fBTcl_UtfToUniChar\fR will store the first
byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x00A0 and
0x00FF and return 1.
.PP
\fBTcl_UniCharToUtfDString\fR converts the given Unicode string
to UTF-8, storing the result in a previously initialized \fBTcl_DString\fR.
You must specify \fIuniLength\fR, the length of the given Unicode string.
The return value is a pointer to the UTF-8 representation of the
Unicode string.  Storage for the return value is appended to the
end of the \fBTcl_DString\fR.
................................................................................
contain at least \fIindex\fR characters.  This is equivalent to calling
\fBTcl_UtfNext\fR \fIindex\fR times.  If a negative \fIindex\fR is given,
the return pointer points to the first character in the source string.
.PP
\fBTcl_UtfBackslash\fR is a utility procedure used by several of the Tcl
commands.  It parses a backslash sequence and stores the properly formed
UTF-8 character represented by the backslash sequence in the output
buffer \fIdst\fR.  At most 4 bytes are stored in the buffer.
\fBTcl_UtfBackslash\fR modifies \fI*readPtr\fR to contain the number
of bytes in the backslash sequence, including the backslash character.
The return value is the number of bytes stored in the output buffer.
.PP
See the \fBTcl\fR manual entry for information on the valid backslash
sequences.  All of the sequences described in the Tcl manual entry are
supported by \fBTcl_UtfBackslash\fR.

.SH KEYWORDS
utf, unicode, backslash

Changes to generic/tclCmdAH.c.

11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 */

#include "tclInt.h"
#ifdef _WIN32
#   include "tclWinInt.h"
#endif
#include <locale.h>

/*
 * The state structure used by [foreach]. Note that the actual structure has
 * all its working arrays appended afterwards so they can be allocated and
 * freed in a single step.
 */







<







11
12
13
14
15
16
17

18
19
20
21
22
23
24
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 */

#include "tclInt.h"
#ifdef _WIN32
#   include "tclWinInt.h"
#endif


/*
 * The state structure used by [foreach]. Note that the actual structure has
 * all its working arrays appended afterwards so they can be allocated and
 * freed in a single step.
 */

Changes to generic/tclInt.h.

74
75
76
77
78
79
80
81
82
83
84
85
86


87
88
89
90
91
92
93
#   include <stdlib.h>
#endif
#ifdef NO_STRING_H
#include "../compat/string.h"
#else
#include <string.h>
#endif
#if defined(STDC_HEADERS) || defined(__STDC__) || defined(__C99__FUNC__) \
     || defined(__cplusplus) || defined(_MSC_VER) || defined(__ICC)
#include <stddef.h>
#else
typedef int ptrdiff_t;
#endif



/*
 * Ensure WORDS_BIGENDIAN is defined correctly:
 * Needs to happen here in addition to configure to work with fat compiles on
 * Darwin (where configure runs only once for multiple architectures).
 */







|
|
<
<


>
>







74
75
76
77
78
79
80
81
82


83
84
85
86
87
88
89
90
91
92
93
#   include <stdlib.h>
#endif
#ifdef NO_STRING_H
#include "../compat/string.h"
#else
#include <string.h>
#endif
#if !defined(STDC_HEADERS) && !defined(__STDC__) && !defined(__C99__FUNC__) \
     && !defined(__cplusplus) && !defined(_MSC_VER) && !defined(__ICC)


typedef int ptrdiff_t;
#endif
#include <stddef.h>
#include <locale.h>

/*
 * Ensure WORDS_BIGENDIAN is defined correctly:
 * Needs to happen here in addition to configure to work with fat compiles on
 * Darwin (where configure runs only once for multiple architectures).
 */

Changes to generic/tclParse.c.

780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
    const char *src,		/* Points to the backslash character of a a
				 * backslash sequence. */
    int numBytes,		/* Max number of bytes to scan. */
    int *readPtr,		/* NULL, or points to storage where the number
				 * of bytes scanned should be written. */
    char *dst)			/* NULL, or points to buffer where the UTF-8
				 * encoding of the backslash sequence is to be
				 * written. At most TCL_UTF_MAX bytes will be
				 * written there. */
{
    register const char *p = src+1;
    Tcl_UniChar unichar = 0;
    int result;
    int count;
    char buf[4] = "";







|
<







780
781
782
783
784
785
786
787

788
789
790
791
792
793
794
    const char *src,		/* Points to the backslash character of a a
				 * backslash sequence. */
    int numBytes,		/* Max number of bytes to scan. */
    int *readPtr,		/* NULL, or points to storage where the number
				 * of bytes scanned should be written. */
    char *dst)			/* NULL, or points to buffer where the UTF-8
				 * encoding of the backslash sequence is to be
				 * written. At most 4 bytes will be written there. */

{
    register const char *p = src+1;
    Tcl_UniChar unichar = 0;
    int result;
    int count;
    char buf[4] = "";

Changes to generic/tclTest.c.

24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include <math.h>

/*
 * Required for Testregexp*Cmd
 */
#include "tclRegexp.h"

/*
 * Required for TestlocaleCmd
 */
#include <locale.h>

/*
 * Required for the TestChannelCmd and TestChannelEventCmd
 */
#include "tclIO.h"

/*
 * Declare external functions used in Windows tests.






<
<
<
<
<







24
25
26
27
28
29
30





31
32
33
34
35
36
37
#include <math.h>

/*
 * Required for Testregexp*Cmd
 */
#include "tclRegexp.h"






/*
 * Required for the TestChannelCmd and TestChannelEventCmd
 */
#include "tclIO.h"

/*
 * Declare external functions used in Windows tests.

Changes to generic/tclUtf.c.

1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
 *
 * Tcl_UtfBackslash --
 *
 *	Figure out how to handle a backslash sequence.
 *
 * Results:
 *	Stores the bytes represented by the backslash sequence in dst and
 *	returns the number of bytes written to dst. At most TCL_UTF_MAX bytes
 *	are written to dst; dst must have been large enough to accept those
 *	bytes. If readPtr isn't NULL then it is filled in with a count of the
 *	number of bytes in the backslash sequence.
 *
 * Side effects:
 *	The maximum number of bytes it takes to represent a Unicode character
 *	in UTF-8 is guaranteed to be less than the number of bytes used to






|







1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
 *
 * Tcl_UtfBackslash --
 *
 *	Figure out how to handle a backslash sequence.
 *
 * Results:
 *	Stores the bytes represented by the backslash sequence in dst and
 *	returns the number of bytes written to dst. At most 4 bytes
 *	are written to dst; dst must have been large enough to accept those
 *	bytes. If readPtr isn't NULL then it is filled in with a count of the
 *	number of bytes in the backslash sequence.
 *
 * Side effects:
 *	The maximum number of bytes it takes to represent a Unicode character
 *	in UTF-8 is guaranteed to be less than the number of bytes used to

Changes to unix/tclUnixInit.c.

5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
 *
 * Copyright (c) 1995-1997 Sun Microsystems, Inc.
 * Copyright (c) 1999 by Scriptics Corporation.
 * All rights reserved.
 */

#include "tclInt.h"
#include <stddef.h>
#include <locale.h>
#ifdef HAVE_LANGINFO
#   include <langinfo.h>
#   ifdef __APPLE__
#	if defined(HAVE_WEAK_IMPORT) && MAC_OS_X_VERSION_MIN_REQUIRED < 1030
	    /* Support for weakly importing nl_langinfo on Darwin. */
#	    define WEAK_IMPORT_NL_LANGINFO
	    extern char *nl_langinfo(nl_item) WEAK_IMPORT_ATTRIBUTE;






<
<







5
6
7
8
9
10
11


12
13
14
15
16
17
18
 *
 * Copyright (c) 1995-1997 Sun Microsystems, Inc.
 * Copyright (c) 1999 by Scriptics Corporation.
 * All rights reserved.
 */

#include "tclInt.h"


#ifdef HAVE_LANGINFO
#   include <langinfo.h>
#   ifdef __APPLE__
#	if defined(HAVE_WEAK_IMPORT) && MAC_OS_X_VERSION_MIN_REQUIRED < 1030
	    /* Support for weakly importing nl_langinfo on Darwin. */
#	    define WEAK_IMPORT_NL_LANGINFO
	    extern char *nl_langinfo(nl_item) WEAK_IMPORT_ATTRIBUTE;

Changes to unix/tclUnixTime.c.

7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
 * Copyright (c) 1995 Sun Microsystems, Inc.
 *
 * See the file "license.terms" for information on usage and redistribution of
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 */

#include "tclInt.h"
#include <locale.h>
#if defined(TCL_WIDE_CLICKS) && defined(MAC_OSX_TCL)
#include <mach/mach_time.h>
#endif

/*
 * TclpGetDate is coded to return a pointer to a 'struct tm'. For thread
 * safety, this structure must be in thread-specific data. The 'tmKey'






<







7
8
9
10
11
12
13

14
15
16
17
18
19
20
 * Copyright (c) 1995 Sun Microsystems, Inc.
 *
 * See the file "license.terms" for information on usage and redistribution of
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 */

#include "tclInt.h"

#if defined(TCL_WIDE_CLICKS) && defined(MAC_OSX_TCL)
#include <mach/mach_time.h>
#endif

/*
 * TclpGetDate is coded to return a pointer to a 'struct tm'. For thread
 * safety, this structure must be in thread-specific data. The 'tmKey'