Tcl Source Code

Check-in [a61fef8429]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add bytearray checking to TclCheckEmptyString(), and then use TclCheckEmptyString() in Tcl_AppendObjToObj and TclStringCat() to reduce string generation.

jn: See discussion on mailing list "On the changing the sourcing of scripts to using strict profiles by default" for explanation why this commit is wrong, just as [e6fe76ae6b] and [4a7b807856] (which are similar commits, disguised as different ones).

Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | pyk-whatever
Files: files | file ages | folders
SHA3-256: a61fef8429dfff4bc0489759384a53fac5e0b4d8ffc0f62c59d22dc8f4b9e144
User & Date: pooryorick 2023-04-23 11:28:19
Original Comment: Add bytearray checking to TclCheckEmptyString(), and then use TclCheckEmptyString() in Tcl_AppendObjToObj and TclStringCat() to reduce string generation.
References
2023-04-26
02:36 Ticket [f5eadcbf9a] passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corrupt data status still Pending with 3 other changes artifact: e3104d5d36 user: pooryorick
Context
2023-04-23
23:57
Add bytearray checking to TclCheckEmptyString(), and then use TclCheckEmptyString() in Tcl_AppendObj... Closed-Leaf check-in: 0b684db687 user: pooryorick tags: pyk-Tcl_AppendObjToObj
11:39
Set the encoding profile to strict in Tcl_FSEvalFileEx(). This is independent of TIP 657: A scrip... Closed-Leaf check-in: f867f44c9c user: pooryorick tags: pyk-whatever
11:28
Add bytearray checking to TclCheckEmptyString(), and then use TclCheckEmptyString() in Tcl_AppendOb... check-in: a61fef8429 user: pooryorick tags: pyk-whatever
10:41
Support combination TCL_UTF_MAX=3 with TCL_NO_DEPRECATED=1 check-in: 4ba40baf3f user: jan.nijtmans tags: trunk, main
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to doc/StringObj.3.

111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
returned as a new value. If negative, behave the same as if the
value was 0.
.AP Tcl_Size last in
The index of the last Unicode character in the Unicode range to be
returned as a new value. If negative, take all characters up to
the last one available.
.AP Tcl_Obj *objPtr in/out
Points to a value to manipulate.
.AP Tcl_Obj *appendObjPtr in
The value to append to \fIobjPtr\fR in \fBTcl_AppendObjToObj\fR.
.AP "Tcl_Size \&| int" *lengthPtr out
The location where \fBTcl_GetStringFromObj\fR will store the length
of a value's string representation. May be (int *)NULL when not used.
.AP "const char" *string in
Null-terminated string value to append to \fIobjPtr\fR.







|







111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
returned as a new value. If negative, behave the same as if the
value was 0.
.AP Tcl_Size last in
The index of the last Unicode character in the Unicode range to be
returned as a new value. If negative, take all characters up to
the last one available.
.AP Tcl_Obj *objPtr in/out
A pointer to a value to read, or to an unshared value to modify.
.AP Tcl_Obj *appendObjPtr in
The value to append to \fIobjPtr\fR in \fBTcl_AppendObjToObj\fR.
.AP "Tcl_Size \&| int" *lengthPtr out
The location where \fBTcl_GetStringFromObj\fR will store the length
of a value's string representation. May be (int *)NULL when not used.
.AP "const char" *string in
Null-terminated string value to append to \fIobjPtr\fR.

Changes to generic/tclStringObj.c.

510
511
512
513
514
515
516





517
518
519
520
521
522
523
    Tcl_Obj *objPtr)
{
    Tcl_Size length = TCL_INDEX_NONE;

    if (objPtr->bytes == &tclEmptyString) {
	return TCL_EMPTYSTRING_YES;
    }






    if (TclListObjIsCanonical(objPtr)) {
	TclListObjLengthM(NULL, objPtr, &length);
	return length == 0;
    }

    if (TclIsPureDict(objPtr)) {







>
>
>
>
>







510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
    Tcl_Obj *objPtr)
{
    Tcl_Size length = TCL_INDEX_NONE;

    if (objPtr->bytes == &tclEmptyString) {
	return TCL_EMPTYSTRING_YES;
    }

    if (TclIsPureByteArray(objPtr)
	&& Tcl_GetCharLength(objPtr) == 0) {
	return TCL_EMPTYSTRING_YES;
    }

    if (TclListObjIsCanonical(objPtr)) {
	TclListObjLengthM(NULL, objPtr, &length);
	return length == 0;
    }

    if (TclIsPureDict(objPtr)) {
1427
1428
1429
1430
1431
1432
1433
1434

1435
1436
1437
1438
1439


1440
1441
1442




1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
    Tcl_Obj *appendObjPtr)	/* Object to append. */
{
    String *stringPtr;
    Tcl_Size length = 0, numChars;
    Tcl_Size appendNumChars = TCL_INDEX_NONE;
    const char *bytes;

    /*

     * Special case: second object is standard-empty is fast case. We know
     * that appending nothing to anything leaves that starting anything...
     */

    if (appendObjPtr->bytes == &tclEmptyString) {


	return;
    }





    /*
     * Handle append of one ByteArray object to another as a special case.
     * Note that we only do this when the objects are pure so that the
     * bytearray faithfully represent the true value; Otherwise appending the
     * byte arrays together could lose information;
     */

    if ((TclIsPureByteArray(objPtr) || objPtr->bytes == &tclEmptyString)
	    && TclIsPureByteArray(appendObjPtr)) {
	/*
	 * You might expect the code here to be
	 *
	 *  bytes = Tcl_GetByteArrayFromObj(appendObjPtr, &length);
	 *  TclAppendBytesToByteArray(objPtr, bytes, length);
	 *
	 * and essentially all of the time that would be fine. However, it
	 * would run into trouble in the case where objPtr and appendObjPtr
	 * point to the same thing. That may never be a good idea. It seems to







<
>
|
<
<
|
|
>
>



>
>
>
>
|
<
|
|
<
|

<
<

|







1432
1433
1434
1435
1436
1437
1438

1439
1440


1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452

1453
1454

1455
1456


1457
1458
1459
1460
1461
1462
1463
1464
1465
    Tcl_Obj *appendObjPtr)	/* Object to append. */
{
    String *stringPtr;
    Tcl_Size length = 0, numChars;
    Tcl_Size appendNumChars = TCL_INDEX_NONE;
    const char *bytes;


    if (TclCheckEmptyString(appendObjPtr) == TCL_EMPTYSTRING_YES) {
	return;


    }

    if (TclCheckEmptyString(objPtr) == TCL_EMPTYSTRING_YES) {
	TclSetDuplicateObj(objPtr, appendObjPtr);
	return;
    }

    if (
	TclIsPureByteArray(appendObjPtr)
	&& (TclIsPureByteArray(objPtr) || objPtr->bytes == &tclEmptyString)
    ) {
	/*

	 * Both bytearray objects are pure, so the second internal bytearray value
	 * can be appended to the first, with no need to modify the "bytes" field.

	 */



	/*
	 * One might expect the code here to be
	 *
	 *  bytes = Tcl_GetByteArrayFromObj(appendObjPtr, &length);
	 *  TclAppendBytesToByteArray(objPtr, bytes, length);
	 *
	 * and essentially all of the time that would be fine. However, it
	 * would run into trouble in the case where objPtr and appendObjPtr
	 * point to the same thing. That may never be a good idea. It seems to
3122
3123
3124
3125
3126
3127
3128
3129
3130
3131
3132
3133
3134
3135
3136
    Tcl_Obj *objResultPtr, * const *ov;
    int binary = 1;
    Tcl_Size oc;
    Tcl_Size length = 0;
    int allowUniChar = 1, requestUniChar = 0, forceUniChar = 0;
    Tcl_Size first = objc - 1;	/* Index of first value possibly not empty */
    Tcl_Size last = 0;		/* Index of last value possibly not empty */
    int inPlace = flags & TCL_STRING_IN_PLACE;

    /* assert ( objc >= 0 ) */

    if (objc <= 1) {
	/* Negative (shouldn't be), one or no objects; return first or empty */
	return objc == 1 ? objv[0] : Tcl_NewObj();
    }







|







3127
3128
3129
3130
3131
3132
3133
3134
3135
3136
3137
3138
3139
3140
3141
    Tcl_Obj *objResultPtr, * const *ov;
    int binary = 1;
    Tcl_Size oc;
    Tcl_Size length = 0;
    int allowUniChar = 1, requestUniChar = 0, forceUniChar = 0;
    Tcl_Size first = objc - 1;	/* Index of first value possibly not empty */
    Tcl_Size last = 0;		/* Index of last value possibly not empty */
    int inPlace = (flags & TCL_STRING_IN_PLACE) && !Tcl_IsShared(*objv);

    /* assert ( objc >= 0 ) */

    if (objc <= 1) {
	/* Negative (shouldn't be), one or no objects; return first or empty */
	return objc == 1 ? objv[0] : Tcl_NewObj();
    }
3250
3251
3252
3253
3254
3255
3256
3257

3258
3259
3260
3261
3262
3263
3264

	    do {
		/* assert ( pendingPtr == NULL ) */
		/* assert ( length == 0 ) */

		Tcl_Obj *objPtr = *ov++;

		if (objPtr->bytes == NULL) {

		    /* No string rep; Take the chance we can avoid making it */
		    pendingPtr = objPtr;
		} else {
		    (void)Tcl_GetStringFromObj(objPtr, &length); /* PANIC? */
		}
	    } while (--oc && (length == 0) && (pendingPtr == NULL));








|
>







3255
3256
3257
3258
3259
3260
3261
3262
3263
3264
3265
3266
3267
3268
3269
3270

	    do {
		/* assert ( pendingPtr == NULL ) */
		/* assert ( length == 0 ) */

		Tcl_Obj *objPtr = *ov++;

		if (objPtr->bytes == NULL
		    && TclCheckEmptyString(objPtr) != TCL_EMPTYSTRING_YES) {
		    /* No string rep; Take the chance we can avoid making it */
		    pendingPtr = objPtr;
		} else {
		    (void)Tcl_GetStringFromObj(objPtr, &length); /* PANIC? */
		}
	    } while (--oc && (length == 0) && (pendingPtr == NULL));

3326
3327
3328
3329
3330
3331
3332

3333
3334
3335
3336
3337
3338
3339
3340
3341
3342
3343
3344
3345
3346
3347
3348
3349
3350
    if (last <= first /*|| length == 0 */) {
	/* Only one non-empty value or zero length; return first */
	/* NOTE: (length == 0) implies (last <= first) */
	return objv[first];
    }

    objv += first; objc = (last - first + 1);


    if (binary) {
	/* Efficiently produce a pure byte array result */
	unsigned char *dst;

	/*
	 * Broken interface! Byte array value routines offer no way to handle
	 * failure to allocate enough space. Following stanza may panic.
	 */

	if (inPlace && !Tcl_IsShared(*objv)) {
	    Tcl_Size start = 0;

	    objResultPtr = *objv++; objc--;
	    (void)Tcl_GetByteArrayFromObj(objResultPtr, &start);
	    dst = Tcl_SetByteArrayLength(objResultPtr, length) + start;
	} else {
	    objResultPtr = Tcl_NewByteArrayObj(NULL, length);







>










|







3332
3333
3334
3335
3336
3337
3338
3339
3340
3341
3342
3343
3344
3345
3346
3347
3348
3349
3350
3351
3352
3353
3354
3355
3356
3357
    if (last <= first /*|| length == 0 */) {
	/* Only one non-empty value or zero length; return first */
	/* NOTE: (length == 0) implies (last <= first) */
	return objv[first];
    }

    objv += first; objc = (last - first + 1);
    inPlace = (flags & TCL_STRING_IN_PLACE) && !Tcl_IsShared(*objv);

    if (binary) {
	/* Efficiently produce a pure byte array result */
	unsigned char *dst;

	/*
	 * Broken interface! Byte array value routines offer no way to handle
	 * failure to allocate enough space. Following stanza may panic.
	 */

	if (inPlace) {
	    Tcl_Size start = 0;

	    objResultPtr = *objv++; objc--;
	    (void)Tcl_GetByteArrayFromObj(objResultPtr, &start);
	    dst = Tcl_SetByteArrayLength(objResultPtr, length) + start;
	} else {
	    objResultPtr = Tcl_NewByteArrayObj(NULL, length);
3366
3367
3368
3369
3370
3371
3372
3373
3374
3375
3376
3377
3378
3379
3380
		dst += more;
	    }
	}
    } else if ((allowUniChar && requestUniChar) || forceUniChar) {
	/* Efficiently produce a pure Tcl_UniChar array result */
	Tcl_UniChar *dst;

	if (inPlace && !Tcl_IsShared(*objv)) {
	    Tcl_Size start;

	    objResultPtr = *objv++; objc--;

	    /* Ugly interface! Force resize of the unicode array. */
	    (void)Tcl_GetUnicodeFromObj(objResultPtr, &start);
	    Tcl_InvalidateStringRep(objResultPtr);







|







3373
3374
3375
3376
3377
3378
3379
3380
3381
3382
3383
3384
3385
3386
3387
		dst += more;
	    }
	}
    } else if ((allowUniChar && requestUniChar) || forceUniChar) {
	/* Efficiently produce a pure Tcl_UniChar array result */
	Tcl_UniChar *dst;

	if (inPlace) {
	    Tcl_Size start;

	    objResultPtr = *objv++; objc--;

	    /* Ugly interface! Force resize of the unicode array. */
	    (void)Tcl_GetUnicodeFromObj(objResultPtr, &start);
	    Tcl_InvalidateStringRep(objResultPtr);
3417
3418
3419
3420
3421
3422
3423
3424
3425
3426
3427
3428
3429
3430
3431
		dst += more;
	    }
	}
    } else {
	/* Efficiently concatenate string reps */
	char *dst;

	if (inPlace && !Tcl_IsShared(*objv)) {
	    Tcl_Size start;

	    objResultPtr = *objv++; objc--;

	    (void)Tcl_GetStringFromObj(objResultPtr, &start);
	    if (0 == Tcl_AttemptSetObjLength(objResultPtr, length)) {
		if (interp) {







|







3424
3425
3426
3427
3428
3429
3430
3431
3432
3433
3434
3435
3436
3437
3438
		dst += more;
	    }
	}
    } else {
	/* Efficiently concatenate string reps */
	char *dst;

	if (inPlace) {
	    Tcl_Size start;

	    objResultPtr = *objv++; objc--;

	    (void)Tcl_GetStringFromObj(objResultPtr, &start);
	    if (0 == Tcl_AttemptSetObjLength(objResultPtr, length)) {
		if (interp) {

Changes to tests/string.test.

2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
test string-29.11.$noComp {string cat, efficiency} -body {
    tcl::unsupported::representation \
	[run {string cat [list x] [encoding convertto utf-8 {}]}]
} -match glob -result {*no string representation}
test string-29.12.$noComp {string cat, efficiency} -body {
    tcl::unsupported::representation \
	[run {string cat [encoding convertto utf-8 {}] [list x]}]
} -match glob -result {*, string representation "x"}
test string-29.13.$noComp {string cat, efficiency} -body {
    tcl::unsupported::representation [run {string cat \
	[encoding convertto utf-8 {}] [encoding convertto utf-8 {}] [list x]}]
} -match glob -result {*, string representation "x"}
test string-29.14.$noComp {string cat, efficiency} -setup {
    set e [encoding convertto utf-8 {}]
} -cleanup {
    unset e
} -body {
    tcl::unsupported::representation [run {string cat $e $e [list x]}]
} -match glob -result {*no string representation}







|



|







2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
test string-29.11.$noComp {string cat, efficiency} -body {
    tcl::unsupported::representation \
	[run {string cat [list x] [encoding convertto utf-8 {}]}]
} -match glob -result {*no string representation}
test string-29.12.$noComp {string cat, efficiency} -body {
    tcl::unsupported::representation \
	[run {string cat [encoding convertto utf-8 {}] [list x]}]
} -match glob -result {*, no string representation}
test string-29.13.$noComp {string cat, efficiency} -body {
    tcl::unsupported::representation [run {string cat \
	[encoding convertto utf-8 {}] [encoding convertto utf-8 {}] [list x]}]
} -match glob -result {*, no string representation}
test string-29.14.$noComp {string cat, efficiency} -setup {
    set e [encoding convertto utf-8 {}]
} -cleanup {
    unset e
} -body {
    tcl::unsupported::representation [run {string cat $e $e [list x]}]
} -match glob -result {*no string representation}