Tcl Source Code

Check-in [174db4e4cd]

Overview
Comment: More/better/cleaner handling of the bytearray special casing for string ops.
SHA1: 174db4e4cde9fd85a57a9b07edbfc55af92b74f5
User & Date: dkf 2009-02-05 11:57:25
Context
2009-02-05
14:21  Fix [Bug 2568434] check-in: f783e3b9ff user: dkf tags: trunk
11:57  More/better/cleaner handling of the bytearray special casing for string ops. check-in: 174db4e4cd user: dkf tags: trunk
01:21  Improve efficiency of Tcl_AppendObjToObj's bytearray handling. check-in: 94be40db3a user: dkf tags: trunk
Changes
Changes to ChangeLog.
2009-02-05  Donal K. Fellows  <[email protected]>

	* generic/tclStringObj.c (Tcl_AppendObjToObj): Special-case the
	appending of one bytearray to another, which can be extremely rapid.
	Part of scheme to address [Bug 1665628] by making the basic string
	operations more efficient on byte arrays.



2009-02-04  Don Porter  <[email protected]>

	* generic/tclStringObj.c: Added overflow protections to the
	AppendUtfToUtfRep routine to either avoid invalid arguments and
	crashes, or to replace them with controlled panics.  [Bug 2561794]

	* generic/tclCmdMZ.c:	Prevent crashes due to int overflow of the
	length of the result of [string repeat].  [Bug 2561746]

2009-02-03  Jan Nijtmans  <[email protected]>

	* macosx/tclMacOSXFCmd.c - eliminate some unnessary type casts
	* unix/tclLoadDyld.c     - some internal const decorations
	* unix/tclUnixCompat.c   - spacing
	* unix/tclUnixFCmd.c
	* unix/tclUnixFile.c
	* win/tclWinDde.c
	* win/tclWinFCmd.c
	* win/tclWinInit.c
	* win/tclWinLoad.c
	* win/tclWinPipe.c






2009-02-05  Donal K. Fellows  <[email protected]>

	* generic/tclStringObj.c (Tcl_AppendObjToObj): Special-case the
	appending of one bytearray to another, which can be extremely rapid.
	Part of scheme to address [Bug 1665628] by making the basic string
	operations more efficient on byte arrays.
	(Tcl_GetCharLength, Tcl_GetUniChar, Tcl_GetRange): More special casing
	work for bytearrays.

2009-02-04  Don Porter  <[email protected]>

	* generic/tclStringObj.c: Added overflow protections to the
	AppendUtfToUtfRep routine to either avoid invalid arguments and
	crashes, or to replace them with controlled panics.  [Bug 2561794]

	* generic/tclCmdMZ.c:	Prevent crashes due to int overflow of the
	length of the result of [string repeat].  [Bug 2561746]

2009-02-03  Jan Nijtmans  <[email protected]>

	* macosx/tclMacOSXFCmd.c: Eliminate some unnessary type casts
	* unix/tclLoadDyld.c:	  some internal const decorations
	* unix/tclUnixCompat.c:	  spacing
	* unix/tclUnixFCmd.c
	* unix/tclUnixFile.c
	* win/tclWinDde.c
	* win/tclWinFCmd.c
	* win/tclWinInit.c
	* win/tclWinLoad.c
	* win/tclWinPipe.c
Changes to generic/tclStringObj.c.
 *
 * Copyright (c) 1995-1997 Sun Microsystems, Inc.
 * Copyright (c) 1999 by Scriptics Corporation.
 *
 * See the file "license.terms" for information on usage and redistribution of
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 *
 * RCS: @(#) $Id: tclStringObj.c,v 1.85 2009/02/05 01:21:59 dkf Exp $ */

#include "tclInt.h"
#include "tommath.h"

/*
 * Prototypes for functions defined later in this file:
 */







 *
 * Copyright (c) 1995-1997 Sun Microsystems, Inc.
 * Copyright (c) 1999 by Scriptics Corporation.
 *
 * See the file "license.terms" for information on usage and redistribution of
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 *
 * RCS: @(#) $Id: tclStringObj.c,v 1.86 2009/02/05 11:57:26 dkf Exp $ */

#include "tclInt.h"
#include "tommath.h"

/*
 * Prototypes for functions defined later in this file:
 */
	    : attemptckrealloc((char *) ptr, (unsigned) STRING_SIZE( \
		(numBytes) ? (numBytes) : sizeof(Tcl_UniChar)) ))
#define GET_STRING(objPtr) \
	((String *) (objPtr)->internalRep.otherValuePtr)
#define SET_STRING(objPtr, stringPtr) \
	((objPtr)->internalRep.otherValuePtr = (void *) (stringPtr))














/*
 * TCL STRING GROWTH ALGORITHM
 *
 * When growing strings (during an append, for example), the following growth
 * algorithm is used:
 *
 *   Attempt to allocate 2 * (originalLength + appendLength)







	    : attemptckrealloc((char *) ptr, (unsigned) STRING_SIZE( \
		(numBytes) ? (numBytes) : sizeof(Tcl_UniChar)) ))
#define GET_STRING(objPtr) \
	((String *) (objPtr)->internalRep.otherValuePtr)
#define SET_STRING(objPtr, stringPtr) \
	((objPtr)->internalRep.otherValuePtr = (void *) (stringPtr))

/*
 * Macro that encapsulates the logic that determines when it is safe to
 * interpret a string as a byte array directly. In summary, the object must be
 * a byte array and must not have a string representation (as the operations
 * that it is used in are defined on strings, not byte arrays). Theoretically
 * it is possible to also be efficient in the case where the object's bytes
 * field is filled by generation from the byte array (c.f. list canonicality)
 * but we don't do that at the moment since this is purely about efficiency.
 */

#define IS_PURE_BYTE_ARRAY(objPtr) \
	(((objPtr)->typePtr==&tclByteArrayType) && ((objPtr)->bytes==NULL))

/*
 * TCL STRING GROWTH ALGORITHM
 *
 * When growing strings (during an append, for example), the following growth
 * algorithm is used:
 *
 *   Attempt to allocate 2 * (originalLength + appendLength)
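The IS_PURE_BYTE_ARRAY macro added in this hunk reads two fields of the object: the internal representation type and the cached string representation. A minimal standalone sketch of the same predicate follows, assuming a hypothetical DemoObj struct in place of Tcl_Obj. DemoType, DemoObj, demoByteArrayType and DemoIsPureByteArray are illustration names only, not part of the Tcl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Tcl_ObjType: identity is all that matters. */
typedef struct DemoType {
    const char *name;
} DemoType;

/* Hypothetical stand-in for Tcl_Obj: only the two fields the test reads. */
typedef struct DemoObj {
    char *bytes;             /* Cached string rep, or NULL if none exists. */
    const DemoType *typePtr; /* Type of the current internal rep. */
} DemoObj;

static const DemoType demoByteArrayType = { "bytearray" };

/*
 * Same shape as IS_PURE_BYTE_ARRAY, written as a function: the object must
 * be a byte array AND must not carry a string representation.
 */
static int
DemoIsPureByteArray(const DemoObj *objPtr)
{
    assert(objPtr != NULL);
    return objPtr->typePtr == &demoByteArrayType && objPtr->bytes == NULL;
}
```

An object that is a byte array but already has a string rep fails the test, so it takes the ordinary string path, as the macro's comment describes.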
int
Tcl_GetCharLength(
    Tcl_Obj *objPtr)		/* The String object to get the num chars
				 * of. */
{
    String *stringPtr;


















    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    /*
     * If numChars is unknown, then calculate the number of characaters while
     * populating the Unicode string.
     */







int
Tcl_GetCharLength(
    Tcl_Obj *objPtr)		/* The String object to get the num chars
				 * of. */
{
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the get-length operation.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr)) {
	int length;

	(void) Tcl_GetByteArrayFromObj(objPtr, &length);
	return length;
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    /*
     * If numChars is unknown, then calculate the number of characaters while
     * populating the Unicode string.
     */
    Tcl_Obj *objPtr,		/* The object to get the Unicode charater
				 * from. */
    int index)			/* Get the index'th Unicode character. */
{
    Tcl_UniChar unichar;
    String *stringPtr;

















    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    if (stringPtr->numChars == -1) {
	/*
	 * We haven't yet calculated the length, so we don't have the Unicode
	 * str. We need to know the number of chars before we can do indexing.







    Tcl_Obj *objPtr,		/* The object to get the Unicode charater
				 * from. */
    int index)			/* Get the index'th Unicode character. */
{
    Tcl_UniChar unichar;
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the indexing operation.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr)) {
	unsigned char *bytes = Tcl_GetByteArrayFromObj(objPtr, NULL);

	return bytes[index];
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    if (stringPtr->numChars == -1) {
	/*
	 * We haven't yet calculated the length, so we don't have the Unicode
	 * str. We need to know the number of chars before we can do indexing.
    Tcl_Obj *objPtr,		/* The Tcl object to find the range of. */
    int first,			/* First index of the range. */
    int last)			/* Last index of the range. */
{
    Tcl_Obj *newObjPtr;		/* The Tcl object to find the range of. */
    String *stringPtr;

















    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    if (stringPtr->numChars == -1) {
	/*
	 * We haven't yet calculated the length, so we don't have the Unicode
	 * str. We need to know the number of chars before we can do indexing.







    Tcl_Obj *objPtr,		/* The Tcl object to find the range of. */
    int first,			/* First index of the range. */
    int last)			/* Last index of the range. */
{
    Tcl_Obj *newObjPtr;		/* The Tcl object to find the range of. */
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the substring operation.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr)) {
	unsigned char *bytes = Tcl_GetByteArrayFromObj(objPtr, NULL);

	return Tcl_NewByteArrayObj(bytes+first, last-first+1);
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    if (stringPtr->numChars == -1) {
	/*
	 * We haven't yet calculated the length, so we don't have the Unicode
	 * str. We need to know the number of chars before we can do indexing.
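The byte array branch of Tcl_GetRange copies `last-first+1` bytes starting at `bytes+first`, treating both indices as inclusive and relying on the caller to pass a valid range. A minimal sketch of the same inclusive-range copy over a plain buffer follows. DemoByteRange is a hypothetical helper, and unlike the code in the diff it validates the indices itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy bytes[first..last] (both inclusive, as in Tcl_GetRange) into a newly
 * allocated buffer and store its size in *outLength. Returns NULL for an
 * invalid range or on allocation failure; the caller frees the result.
 */
static unsigned char *
DemoByteRange(const unsigned char *bytes, int length, int first, int last,
        int *outLength)
{
    int count;
    unsigned char *copy;

    assert(bytes != NULL && outLength != NULL);
    if (first < 0 || last >= length || first > last) {
        return NULL;
    }
    count = last - first + 1;
    copy = malloc((size_t) count);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, bytes + first, (size_t) count);
    *outLength = count;
    return copy;
}
```

Whether the validation belongs in the helper or in its callers is a design choice. The sketch checks locally only so that it is safe to call in isolation.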

	/*
	 * All of the characters in the Utf string are 1 byte chars, so we
	 * don't store the unicode char. Create a new string object containing
	 * the specified range of chars.
	 */

	newObjPtr = Tcl_NewStringObj(&str[first], last-first+1);

	/*
	 * Since we know the new string only has 1-byte chars, we can set it's
	 * numChars field.
	 */

	SetStringFromAny(NULL, newObjPtr);








	/*
	 * All of the characters in the Utf string are 1 byte chars, so we
	 * don't store the unicode char. Create a new string object containing
	 * the specified range of chars.
	 */

	newObjPtr = Tcl_NewStringObj(str+first, last-first+1);

	/*
	 * Since we know the new string only has 1-byte chars, we can set it's
	 * numChars field.
	 */

	SetStringFromAny(NULL, newObjPtr);
				 * representation of object, not including
				 * terminating null byte. */
{
    String *stringPtr;

    if (length < 0) {
	/*
	 * Setting to a negative length is nonsense.  This is probably the
	 * result of overflowing the signed integer range.
	 */

	Tcl_Panic(	"Tcl_SetObjLength: negative length requested: "
			"%d (integer overflow?)", length);
    }
    if (Tcl_IsShared(objPtr)) {
	Tcl_Panic("%s called with shared object", "Tcl_SetObjLength");
    }
    SetStringFromAny(NULL, objPtr);

    stringPtr = GET_STRING(objPtr);







				 * representation of object, not including
				 * terminating null byte. */
{
    String *stringPtr;

    if (length < 0) {
	/*
	 * Setting to a negative length is nonsense. This is probably the
	 * result of overflowing the signed integer range.
	 */

	Tcl_Panic("Tcl_SetObjLength: negative length requested: "
		"%d (integer overflow?)", length);
    }
    if (Tcl_IsShared(objPtr)) {
	Tcl_Panic("%s called with shared object", "Tcl_SetObjLength");
    }
    SetStringFromAny(NULL, objPtr);

    stringPtr = GET_STRING(objPtr);
{
    String *stringPtr;
    int length, numChars, allOneByteChars;
    char *bytes;

    /*
     * Handle append of one bytearray object to another as a special case.
     * Note that we only do this when the object being written doesn't have a
     * string rep; if it did, then appending the byte arrays together could
     * well lose information; this is a special-case optimization only.
     */

    if (objPtr->typePtr == &tclByteArrayType && objPtr->bytes == NULL
	    && appendObjPtr->typePtr == &tclByteArrayType) {
	unsigned char *bytesDst, *bytesSrc;
	int lengthSrc, lengthTotal;

	/*
	 * Note that we do not assume that objPtr and appendObjPtr must be
	 * distinct!

	 */

	(void) Tcl_GetByteArrayFromObj(objPtr, &length);
	(void) Tcl_GetByteArrayFromObj(appendObjPtr, &lengthSrc);
	lengthTotal = length + lengthSrc;
	if (((length > lengthSrc) ? length : lengthSrc) > lengthTotal) {
	    Tcl_Panic("overflow when calculating byte array size");
	}
	bytesDst = Tcl_SetByteArrayLength(objPtr, lengthTotal);
	bytesSrc = Tcl_GetByteArrayFromObj(appendObjPtr, NULL);
	memcpy(bytesDst + length, bytesSrc, lengthSrc);
	return;
    }





    SetStringFromAny(NULL, objPtr);

    /*
     * If objPtr has a valid Unicode rep, then get a Unicode string from
     * appendObjPtr and append it.
     */








{
    String *stringPtr;
    int length, numChars, allOneByteChars;
    char *bytes;

    /*
     * Handle append of one bytearray object to another as a special case.
     * Note that we only do this when the objects don't have string reps; if
     * it did, then appending the byte arrays together could well lose
     * information; this is a special-case optimization only.
     */


    if (IS_PURE_BYTE_ARRAY(objPtr) && IS_PURE_BYTE_ARRAY(appendObjPtr)) {
	unsigned char *bytesDst, *bytesSrc;
	int lengthSrc, lengthTotal;

	/*
	 * We do not assume that objPtr and appendObjPtr must be distinct!
	 * This makes this code a bit more complex than it otherwise would be,
	 * but in turn makes it much safer.
	 */

	(void) Tcl_GetByteArrayFromObj(objPtr, &length);
	(void) Tcl_GetByteArrayFromObj(appendObjPtr, &lengthSrc);
	lengthTotal = length + lengthSrc;
	if (((length > lengthSrc) ? length : lengthSrc) > lengthTotal) {
	    Tcl_Panic("overflow when calculating byte array size");
	}
	bytesDst = Tcl_SetByteArrayLength(objPtr, lengthTotal);
	bytesSrc = Tcl_GetByteArrayFromObj(appendObjPtr, NULL);
	memcpy(bytesDst + length, bytesSrc, lengthSrc);
	return;
    }

    /*
     * Must append as strings.
     */

    SetStringFromAny(NULL, objPtr);

    /*
     * If objPtr has a valid Unicode rep, then get a Unicode string from
     * appendObjPtr and append it.
     */

		isNegative = (l < (long)0);
	    }

	    segment = Tcl_NewObj();
	    allocSegment = 1;
	    Tcl_IncrRefCount(segment);

	    if ((isNegative || gotPlus || gotSpace) && (useBig || (ch == 'd'))) {

		Tcl_AppendToObj(segment, (isNegative ? "-" : gotPlus ? "+" : " "), 1);
	    }

	    if (gotHash) {
		switch (ch) {
		case 'o':
		    Tcl_AppendToObj(segment, "0", 1);
		    precision--;







		isNegative = (l < (long)0);
	    }

	    segment = Tcl_NewObj();
	    allocSegment = 1;
	    Tcl_IncrRefCount(segment);

	    if ((isNegative || gotPlus || gotSpace) && (useBig || ch=='d')) {
		Tcl_AppendToObj(segment,
			(isNegative ? "-" : gotPlus ? "+" : " "), 1);
	    }

	    if (gotHash) {
		switch (ch) {
		case 'o':
		    Tcl_AppendToObj(segment, "0", 1);
		    precision--;