Overview
| Comment: | More/better/cleaner handling of the bytearray special casing for string ops. |
|---|---|
| SHA1: | 174db4e4cde9fd85a57a9b07edbfc55a |
| User & Date: | dkf 2009-02-05 11:57:25 |
Context
2009-02-05

| Time | Comment | Check-in | User | Tags |
|---|---|---|---|---|
| 14:21 | Fix [Bug 2568434] | f783e3b9ff | dkf | trunk |
| 11:57 | More/better/cleaner handling of the bytearray special casing for string ops. | 174db4e4cd | dkf | trunk |
| 01:21 | Improve efficiency of Tcl_AppendObjToObj's bytearray handling. | 94be40db3a | dkf | trunk |
Changes
Changes to ChangeLog.
Old lines 1-18, new lines 1-30:

```
2009-02-05  Donal K. Fellows  <[email protected]>

        * generic/tclStringObj.c (Tcl_AppendObjToObj): Special-case the
        appending of one bytearray to another, which can be extremely rapid.
        Part of scheme to address [Bug 1665628] by making the basic string
        operations more efficient on byte arrays.
        (Tcl_GetCharLength, Tcl_GetUniChar, Tcl_GetRange): More special
        casing work for bytearrays.

2009-02-04  Don Porter  <[email protected]>

        * generic/tclStringObj.c: Added overflow protections to the
        AppendUtfToUtfRep routine to either avoid invalid arguments and
        crashes, or to replace them with controlled panics.  [Bug 2561794]

        * generic/tclCmdMZ.c: Prevent crashes due to int overflow of the
        length of the result of [string repeat].  [Bug 2561746]

2009-02-03  Jan Nijtmans  <[email protected]>

        * macosx/tclMacOSXFCmd.c: Eliminate some unnessary type casts
        * unix/tclLoadDyld.c: some internal const decorations
        * unix/tclUnixCompat.c: spacing
        * unix/tclUnixFCmd.c
        * unix/tclUnixFile.c
        * win/tclWinDde.c
        * win/tclWinFCmd.c
        * win/tclWinInit.c
        * win/tclWinLoad.c
        * win/tclWinPipe.c
```
Changes to generic/tclStringObj.c.
Old lines 29-35, new lines 29-43:

```c
 *
 * Copyright (c) 1995-1997 Sun Microsystems, Inc.
 * Copyright (c) 1999 by Scriptics Corporation.
 *
 * See the file "license.terms" for information on usage and redistribution of
 * this file, and for a DISCLAIMER OF ALL WARRANTIES.
 *
 * RCS: @(#) $Id: tclStringObj.c,v 1.86 2009/02/05 11:57:26 dkf Exp $
 */

#include "tclInt.h"
#include "tommath.h"

/*
 * Prototypes for functions defined later in this file:
 */
```
Old lines 130-143, new lines 130-156:

```c
	: attemptckrealloc((char *) ptr, (unsigned) STRING_SIZE( \
		(numBytes) ? (numBytes) : sizeof(Tcl_UniChar)) ))
#define GET_STRING(objPtr) \
		((String *) (objPtr)->internalRep.otherValuePtr)
#define SET_STRING(objPtr, stringPtr) \
		((objPtr)->internalRep.otherValuePtr = (void *) (stringPtr))

/*
 * Macro that encapsulates the logic that determines when it is safe to
 * interpret a string as a byte array directly. In summary, the object must be
 * a byte array and must not have a string representation (as the operations
 * that it is used in are defined on strings, not byte arrays). Theoretically
 * it is possible to also be efficient in the case where the object's bytes
 * field is filled by generation from the byte array (c.f. list canonicality)
 * but we don't do that at the moment since this is purely about efficiency.
 */

#define IS_PURE_BYTE_ARRAY(objPtr) \
	(((objPtr)->typePtr==&tclByteArrayType) && ((objPtr)->bytes==NULL))

/*
 * TCL STRING GROWTH ALGORITHM
 *
 * When growing strings (during an append, for example), the following growth
 * algorithm is used:
 *
 *   Attempt to allocate 2 * (originalLength + appendLength)
```
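The condition tested by `IS_PURE_BYTE_ARRAY` can be modelled outside Tcl. The sketch below is illustrative only: `Obj`, `ObjType`, `byteArrayType` and `is_pure_byte_array` are hypothetical stand-ins for `Tcl_Obj`, `Tcl_ObjType`, `tclByteArrayType` and the macro, cut down to the two fields the macro actually reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Tcl_ObjType: only its identity matters here. */
typedef struct ObjType {
    const char *name;
} ObjType;

/* Hypothetical stand-in for Tcl_Obj, reduced to the fields the macro reads. */
typedef struct Obj {
    char *bytes;             /* string representation, or NULL if none */
    const ObjType *typePtr;  /* type of the internal representation */
} Obj;

/* Hypothetical stand-in for tclByteArrayType. */
static const ObjType byteArrayType = { "bytearray" };

/* Same test as IS_PURE_BYTE_ARRAY: byte-array internal rep AND no string rep. */
static bool is_pure_byte_array(const Obj *objPtr)
{
    assert(objPtr != NULL);
    return objPtr->typePtr == &byteArrayType && objPtr->bytes == NULL;
}
```

An object whose `bytes` field is non-NULL fails the test even when its internal representation is a byte array, so such objects take the ordinary string path.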
Old lines 353-366, new lines 366-396:

```c
int
Tcl_GetCharLength(
    Tcl_Obj *objPtr)		/* The String object to get the num chars
				 * of. */
{
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the get-length operation.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr)) {
	int length;

	(void) Tcl_GetByteArrayFromObj(objPtr, &length);
	return length;
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    /*
     * If numChars is unknown, then calculate the number of characaters while
     * populating the Unicode string.
     */
```
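The fast path can return the byte count directly because in a pure byte array each byte is exactly one character, whereas UTF-8 text may spend several bytes per character. The sketch below contrasts the two cases. `Value` and `char_length` are hypothetical names, and the UTF-8 scan is a simplified stand-in for the string-side counting, assuming well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value: raw bytes plus a flag saying whether they are a pure
 * byte array (one character per byte) or UTF-8 text. */
typedef struct Value {
    const unsigned char *data;
    size_t len;              /* length in bytes */
    int isPureByteArray;
} Value;

/* Count characters in well-formed UTF-8 by skipping continuation bytes
 * (bit pattern 10xxxxxx). */
static size_t utf8_char_count(const unsigned char *s, size_t nbytes)
{
    size_t count = 0;

    for (size_t i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* Character length: O(1) for a pure byte array, a full scan for UTF-8. */
static size_t char_length(const Value *v)
{
    assert(v != NULL);
    if (v->isPureByteArray) {
        return v->len;
    }
    return utf8_char_count(v->data, v->len);
}
```

The bytes `0xC3 0xA9` give a length of 2 when treated as a byte array but 1 when read as the UTF-8 encoding of `é`.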
Old lines 438-451, new lines 468-497:

```c
    Tcl_Obj *objPtr,		/* The object to get the Unicode charater
				 * from. */
    int index)			/* Get the index'th Unicode character. */
{
    Tcl_UniChar unichar;
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the indexing operation.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr)) {
	unsigned char *bytes = Tcl_GetByteArrayFromObj(objPtr, NULL);

	return bytes[index];
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    if (stringPtr->numChars == -1) {
	/*
	 * We haven't yet calculated the length, so we don't have the Unicode
	 * str. We need to know the number of chars before we can do indexing.
```
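In the byte-array branch of `Tcl_GetUniChar`, the byte at `index` is returned as the character code, since Tcl treats each byte as the character with the same value (0 to 255). A minimal sketch, with `byte_array_get_char` as a hypothetical name. Like the branch above, it performs no range check of its own; the `assert` only documents the caller's obligation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the bytearray fast path: byte value b stands for
 * the character with code b, so indexing is a plain array lookup. */
static int byte_array_get_char(const unsigned char *bytes, size_t nbytes,
                               size_t index)
{
    assert(index < nbytes);  /* the caller must pass a valid index */
    return bytes[index];     /* unsigned char widens to 0..255, never negative */
}
```

Reading through `unsigned char` is what keeps bytes such as `0xE9` positive; read through a plain `char`, which is signed on many platforms, they could come out negative.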
Old lines 605-618, new lines 651-680:

```c
    Tcl_Obj *objPtr,		/* The Tcl object to find the range of. */
    int first,			/* First index of the range. */
    int last)			/* Last index of the range. */
{
    Tcl_Obj *newObjPtr;		/* The Tcl object to find the range of. */
    String *stringPtr;

    /*
     * Optimize the case where we're really dealing with a bytearray object
     * without string representation; we don't need to convert to a string to
     * perform the substring operation.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr)) {
	unsigned char *bytes = Tcl_GetByteArrayFromObj(objPtr, NULL);

	return Tcl_NewByteArrayObj(bytes+first, last-first+1);
    }

    /*
     * OK, need to work with the object as a string.
     */

    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);

    if (stringPtr->numChars == -1) {
	/*
	 * We haven't yet calculated the length, so we don't have the Unicode
	 * str. We need to know the number of chars before we can do indexing.
```
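The byte-array branch of `Tcl_GetRange` treats `first` and `last` as an inclusive range, so it copies `last-first+1` bytes starting at `bytes+first`. The sketch below reproduces that arithmetic with plain `malloc` and `memcpy`. `byte_array_range` is hypothetical and, unlike the branch shown above, asserts that the indices are in range.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the bytearray branch of Tcl_GetRange: copy the
 * inclusive index range [first, last] into a fresh buffer of length
 * last - first + 1. Returns NULL on allocation failure; the caller frees
 * the result. */
static unsigned char *byte_array_range(const unsigned char *bytes,
                                       size_t nbytes, size_t first,
                                       size_t last, size_t *lenOut)
{
    assert(first <= last && last < nbytes);

    size_t len = last - first + 1;
    unsigned char *copy = malloc(len);

    if (copy != NULL) {
        memcpy(copy, bytes + first, len);
        *lenOut = len;
    }
    return copy;
}
```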
Old lines 633-639, new lines 695-709:

```c

	/*
	 * All of the characters in the Utf string are 1 byte chars, so we
	 * don't store the unicode char. Create a new string object containing
	 * the specified range of chars.
	 */

	newObjPtr = Tcl_NewStringObj(str+first, last-first+1);

	/*
	 * Since we know the new string only has 1-byte chars, we can set it's
	 * numChars field.
	 */

	SetStringFromAny(NULL, newObjPtr);
```
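The shortcut in this hunk relies on the fact that when the character count equals the byte length, every character is a single byte, so character indices and byte indices coincide and `str+first` is the start of the substring. In UTF-8 that holds exactly when every byte is below `0x80`. A hypothetical sketch (neither `all_one_byte_chars` nor `one_byte_range` is a Tcl function):

```c
#include <assert.h>
#include <stddef.h>

/* True if every byte is ASCII (below 0x80), i.e. every UTF-8 character
 * occupies exactly one byte. */
static int all_one_byte_chars(const char *s, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        if ((unsigned char) s[i] >= 0x80) {
            return 0;
        }
    }
    return 1;
}

/* If the shortcut applies, describe the inclusive character range
 * [first, last] as a pointer into s plus a byte length, without copying or
 * decoding. Returns 0 when some character may be longer than one byte, in
 * which case a decoding path would be needed instead. */
static int one_byte_range(const char *s, size_t nbytes, size_t first,
                          size_t last, const char **startOut, size_t *lenOut)
{
    assert(first <= last && last < nbytes);
    if (!all_one_byte_chars(s, nbytes)) {
        return 0;
    }
    *startOut = s + first;
    *lenOut = last - first + 1;
    return 1;
}
```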
Old lines 738-744, new lines 800-819:

```c
				 * representation of object, not including
				 * terminating null byte. */
{
    String *stringPtr;

    if (length < 0) {
	/*
	 * Setting to a negative length is nonsense. This is probably the
	 * result of overflowing the signed integer range.
	 */

	Tcl_Panic("Tcl_SetObjLength: negative length requested: "
		"%d (integer overflow?)", length);
    }
    if (Tcl_IsShared(objPtr)) {
	Tcl_Panic("%s called with shared object", "Tcl_SetObjLength");
    }
    SetStringFromAny(NULL, objPtr);
    stringPtr = GET_STRING(objPtr);
```
Old lines 1214-1220, new lines 1277-1321:

```c
{
    String *stringPtr;
    int length, numChars, allOneByteChars;
    char *bytes;

    /*
     * Handle append of one bytearray object to another as a special case.
     * Note that we only do this when the objects don't have string reps; if
     * it did, then appending the byte arrays together could well lose
     * information; this is a special-case optimization only.
     */

    if (IS_PURE_BYTE_ARRAY(objPtr) && IS_PURE_BYTE_ARRAY(appendObjPtr)) {
	unsigned char *bytesDst, *bytesSrc;
	int lengthSrc, lengthTotal;

	/*
	 * We do not assume that objPtr and appendObjPtr must be distinct!
	 * This makes this code a bit more complex than it otherwise would be,
	 * but in turn makes it much safer.
	 */

	(void) Tcl_GetByteArrayFromObj(objPtr, &length);
	(void) Tcl_GetByteArrayFromObj(appendObjPtr, &lengthSrc);
	lengthTotal = length + lengthSrc;
	if (((length > lengthSrc) ? length : lengthSrc) > lengthTotal) {
	    Tcl_Panic("overflow when calculating byte array size");
	}
	bytesDst = Tcl_SetByteArrayLength(objPtr, lengthTotal);
	bytesSrc = Tcl_GetByteArrayFromObj(appendObjPtr, NULL);
	memcpy(bytesDst + length, bytesSrc, lengthSrc);
	return;
    }

    /*
     * Must append as strings.
     */

    SetStringFromAny(NULL, objPtr);

    /*
     * If objPtr has a valid Unicode rep, then get a Unicode string from
     * appendObjPtr and append it.
     */
```
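Two details in the bytearray append deserve a closer look: the source pointer is fetched only after the destination has been resized, so appending an object to itself stays safe even if the storage moves, and the total length is checked for overflow. The sketch below shows both on a hypothetical `ByteBuf` type using `realloc`; `bytebuf_append` is likewise hypothetical. One deliberate difference: it tests `srcLen > INT_MAX - dst->len` before adding, because the after-the-fact comparison in the diff depends on signed `int` wraparound, which ISO C leaves undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer standing in for a bytearray object. */
typedef struct ByteBuf {
    unsigned char *data;
    int len;
} ByteBuf;

/* Append src to dst; src may be the very same object as dst. Returns 0 on
 * success, -1 on overflow or allocation failure (dst is then unchanged). */
static int bytebuf_append(ByteBuf *dst, const ByteBuf *src)
{
    assert(dst != NULL && src != NULL && dst->len >= 0 && src->len >= 0);

    int srcLen = src->len;   /* read before dst is resized */

    /* Check before adding, so no signed overflow can occur. */
    if (srcLen > INT_MAX - dst->len) {
        return -1;
    }
    int total = dst->len + srcLen;
    unsigned char *grown = realloc(dst->data, total > 0 ? (size_t) total : 1);

    if (grown == NULL) {
        return -1;
    }
    dst->data = grown;

    /* Take src->data only now: if src == dst, realloc may have moved the
     * bytes and any pointer fetched earlier would dangle. The regions
     * [0, len) and [len, 2*len) do not overlap, so memcpy is safe. */
    if (srcLen > 0) {
        memcpy(dst->data + dst->len, src->data, (size_t) srcLen);
    }
    dst->len = total;
    return 0;
}
```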
Old lines 2050-2056, new lines 2117-2133:

```c
	    isNegative = (l < (long)0);
	}

	segment = Tcl_NewObj();
	allocSegment = 1;
	Tcl_IncrRefCount(segment);

	if ((isNegative || gotPlus || gotSpace) && (useBig || ch=='d')) {
	    Tcl_AppendToObj(segment,
		    (isNegative ? "-" : gotPlus ? "+" : " "), 1);
	}

	if (gotHash) {
	    switch (ch) {
	    case 'o':
		Tcl_AppendToObj(segment, "0", 1);
		precision--;
```
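The condition in this hunk decides whether a sign character is written before the digits, and which one. The hypothetical `sign_prefix` below restates it as a standalone function. `useBig` is passed through as an opaque flag, because its exact role in the surrounding format code is not visible in this hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the condition above: a sign prefix is emitted
 * only when (useBig || conv == 'd') holds, and only if the value is negative
 * or the '+' or ' ' flag was given. Returns the prefix character, or '\0'
 * when no prefix is written. */
static char sign_prefix(bool isNegative, bool gotPlus, bool gotSpace,
                        bool useBig, char conv)
{
    if ((isNegative || gotPlus || gotSpace) && (useBig || conv == 'd')) {
        return isNegative ? '-' : gotPlus ? '+' : ' ';
    }
    return '\0';
}
```

When both the `+` and space flags are given, `+` wins, which matches the C `printf` rule that the space flag is ignored when `+` is present.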