Ticket UUID: | 4731d31eff3e2245b2300462b9185ab1b2645810 | |||
Title: | iTcl and Tcl 8.6 performance with Tcl_Preserve | |||
Type: | Patch | Version: | Tcl 8.6.9 | |
Submitter: | simons | Created on: | 2019-01-29 18:43:05 | |
Subsystem: | 42. Memory Preservation | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Important | |
Status: | Closed | Last Modified: | 2019-10-16 16:24:54 | |
Resolution: | Postponed | Closed By: | sebres | |
Closed on: | 2019-10-16 16:24:54 | |||
Description: |
iTcl and Tcl 8.6 performance with Tcl_Preserve As posted by Phil Brooks on comp.lang.tcl, we have identified a performance/scalability problem with Tcl_Preserve/Tcl_Release as used in ITcl, and have a proposed patch. I'm not sure the Category of 'Memory Preservation' is correct, but that is the function of Tcl_Preserve/Tcl_Release and is where the performance hotspot is. The issue is that Tcl_Preserve is taking a lot longer in Tcl 8.6 when very large numbers of iTcl classes are created. We have made a patch to the implementation of Tcl_Preserve that uses a hashtable to improve the performance. Tcl8.4 time Initial 10 classes - 4045 microseconds per iteration Final 10 classes - 4022 microseconds per iteration Tcl8.5: Initial 10 classes - 5089 microseconds per iteration Final 10 classes - 4012 microseconds per iteration Tcl 8.6: Initial 10 classes - 20229 microseconds per iteration Final 10 classes - 93309 microseconds per iteration Tcl8.6 (with hashtable in Tcl_Preserve/Tcl_Release): Initial 10 classes - 16128 microseconds per iteration Final 10 classes - 12721 microseconds per iteration The key here is that the time taken to create the final 10 classes is not significantly more than for the first 10 classes. We are still not at the same level of performance as Tcl8.4 (or 8.5), but this fixes some of the poor scalability issue with Tcl_Preserve in 8.6. The benchmark follows: ---------------------------------------------- package require Itcl # Each class and class variable defined using ITcl will create an # entry using Tcl_Preserve; these are not deleted until the class is deleted. # Unfortunately, Tcl_Preserve (and Tcl_Release) use a simple table and linear # search for the and so will slow down the more items are added. puts "Initial 10 classes - [time { for { set i 0 } { $i < 10 } { incr i } { itcl::class timeClass$i { for { set j 0 } { $j < 100 } { incr j } { public variable d$j } } } }]" # create 500 classes with 100 local vars each for { set i 0 } { $i < 500 } { incr i } { itcl::class moreClass$i { for { set j 0 } { $j < 100 } { incr j } { public variable d$j } } } # Now the creation of these final 10 classes is quite a bit slower due to the # extra items in the array used by Tcl_Preserve for the defined classes and class variables puts " Final 10 classes - [time { for { set i 0 } { $i < 10 } { incr i } { itcl::class finalClass$i { for { set j 0 } { $j < 100 } { incr j } { public variable d$j } } } }]" ---------------------------------------------- The patch follows. This is output from diff -u. All changes are in the one file, generic/tclPreserve.C ---------------------------------------------- Index: tcl8.6.9/generic/tclPreserve.c =================================================================== RCS file: /cvs/components/icv_components/src/tcl8.6.9/generic/tclPreserve.c,v retrieving revision 1.1 diff -u -r1.1 tclPreserve.c --- tcl8.6.9/generic/tclPreserve.c 13 Dec 2018 11:04:36 -0000 1.1 +++ tcl8.6.9/generic/tclPreserve.c 25 Jan 2019 17:49:59 -0000 @@ -35,10 +35,10 @@ * Global data structures used to hold the list of preserved data references. * These variables are protected by "preserveMutex". */ - static Reference *refArray = NULL; /* First in array of references. */ +static Tcl_HashTable *indexTable = NULL; /* Hash Table of reference indices */ static int spaceAvl = 0; /* Total number of structures available at - * *firstRefPtr. */ + * refArray. */ static int inUse = 0; /* Count of structures currently in use in * refArray. */ TCL_DECLARE_MUTEX(preserveMutex)/* To protect the above statics */ @@ -89,6 +89,8 @@ { Tcl_MutexLock(&preserveMutex); if (spaceAvl != 0) { + ckfree(indexTable); + indexTable = NULL; ckfree(refArray); refArray = NULL; inUse = 0; @@ -121,7 +123,9 @@ ClientData clientData) /* Pointer to malloc'ed block of memory. */ { Reference *refPtr; - int i; + Tcl_HashEntry *entry = NULL; + long index; + int newItem = 0; /* * See if there is already a reference for this pointer. If so, just @@ -129,19 +133,34 @@ */ Tcl_MutexLock(&preserveMutex); - for (i=0, refPtr=refArray ; i<inUse ; i++, refPtr++) { - if (refPtr->clientData == clientData) { - refPtr->refCount++; - Tcl_MutexUnlock(&preserveMutex); - return; - } + + /* + * Allocate the hashtable of indices + */ + if (indexTable == NULL) { + indexTable = (Tcl_HashTable*) ckalloc(sizeof (Tcl_HashTable)); + Tcl_InitHashTable(indexTable, TCL_ONE_WORD_KEYS); } /* + * Find the item in the hash table, or create if it's not there + */ + entry = Tcl_CreateHashEntry(indexTable, clientData, &newItem); + if (!newItem) { + /* found the existing entry, increment its refCount */ + index = (long)Tcl_GetHashValue(entry); + refPtr = &refArray[index]; +/* ASSERT (refPtr->clientData == clientData); */ + refPtr->refCount++; + Tcl_MutexUnlock(&preserveMutex); + return; + } + + + /* * Make a reference array if it doesn't already exist, or make it bigger * if it is full. */ - if (inUse == spaceAvl) { spaceAvl = spaceAvl ? 2*spaceAvl : INITIAL_SIZE; refArray = ckrealloc(refArray, spaceAvl * sizeof(Reference)); @@ -150,13 +169,13 @@ /* * Make a new entry for the new reference. */ - - refPtr = &refArray[inUse]; + index = inUse++; + Tcl_SetHashValue(entry, (char *)index); + refPtr = &refArray[index]; refPtr->clientData = clientData; refPtr->refCount = 1; refPtr->mustFree = 0; refPtr->freeProc = TCL_STATIC; - inUse += 1; Tcl_MutexUnlock(&preserveMutex); } @@ -184,16 +203,20 @@ ClientData clientData) /* Pointer to malloc'ed block of memory. */ { Reference *refPtr; - int i; + long index; + int mustFree; + Tcl_FreeProc *freeProc; + Tcl_HashEntry *entry = NULL; Tcl_MutexLock(&preserveMutex); - for (i=0, refPtr=refArray ; i<inUse ; i++, refPtr++) { - int mustFree; - Tcl_FreeProc *freeProc; - if (refPtr->clientData != clientData) { - continue; - } + if (indexTable != NULL) { + entry = Tcl_FindHashEntry(indexTable, clientData); + } + if (entry != NULL) { + index = (long)Tcl_GetHashValue(entry); + refPtr = &refArray[index]; +/* ASSERT (refPtr->clientData == clientData); */ if (--refPtr->refCount != 0) { Tcl_MutexUnlock(&preserveMutex); @@ -210,12 +233,28 @@ freeProc = refPtr->freeProc; mustFree = refPtr->mustFree; inUse--; - if (i < inUse) { - refArray[i] = refArray[inUse]; + if (index < inUse) { + /* we also need to update the hashtable using clientData from the last index */ + Tcl_HashEntry *lastEntry = NULL; + lastEntry = Tcl_FindHashEntry(indexTable, refArray[inUse].clientData); + if (lastEntry == NULL) { + inUse++; /* change the size back so nothing's modified */ + Tcl_MutexUnlock(&preserveMutex); + Tcl_Panic("Tcl_Release couldn't find reference for top %p", refArray[inUse].clientData); + return; + } + Tcl_SetHashValue(lastEntry, (char *)index); + /* copy down the reference to this location */ + refArray[index] = refArray[inUse]; } /* - * Now committed to disposing the data. But first, we've patched up + * Now committed to disposing the data. + */ + Tcl_DeleteHashEntry(entry); + + /* + * But first, we've patched up * all the global data structures so we should release the mutex now. * Only then should we dabble around with potentially-slow memory * managers... @@ -264,7 +303,8 @@ Tcl_FreeProc *freeProc) /* Function to actually do free. */ { Reference *refPtr; - int i; + long index; + Tcl_HashEntry *entry = NULL; /* * See if there is a reference for this pointer. If so, set its "mustFree" @@ -272,10 +312,15 @@ */ Tcl_MutexLock(&preserveMutex); - for (i = 0, refPtr = refArray; i < inUse; i++, refPtr++) { - if (refPtr->clientData != clientData) { - continue; - } + + if (indexTable != NULL) { + entry = Tcl_FindHashEntry(indexTable, clientData); + } + + if (entry != NULL) { + index = (long)Tcl_GetHashValue(entry); + refPtr = &refArray[index]; +/* ASSERT (refPtr->clientData == clientData); */ if (refPtr->mustFree) { Tcl_Panic("Tcl_EventuallyFree called twice for %p", clientData); } | |||
User Comments: |
sebres added on 2019-10-16 16:24:54:
Fixed now in itcl-repository (both branches are merged now), thus close this here. sebres added on 2019-04-19 15:26:16: Here is related RFE in Itcl-repo - itcl#fe70356a54338a4d. sebres added on 2019-02-07 22:40:59: So as assumed it looks like Itcl regression, and I am mostly ready with a first shot, illustrating this behavior in Itcl. Here you will find my the new branch with rewritten several memory facilities (preservation resp. release of classes and objects/variables) in Itcl. itcl/sebres-memopt-perf-branch This huge performance regression was initially introduced in [c7e729d15408ed4e] which was from begin basically inappropriate for this purposes. Just, because several routines (like Itcl_PreserveData/Itcl_ReleaseData) are public interface now, I was forced to rewrite internals only without great impact of the public API (I'm pretty sure I gets anyway several objections from TCT about). The first results are:
Further good news is the growth of time is stopped by growth of the variable number (due to list/hash-less implementation). So it was able to create 40K classes as well as over a 1M variables. As well as by your initial test code from here (as diff for before/after on my machine) it looks now:
Although I'm not completely ready here (still some little checks, find and fixes of other parallel regressions, review, documentation, convince of TCT etc), but you can check/try this if you want. Anyway I let this ticket open, because I've another performance-related branch where the Tcl_Preserve/Tcl_Release facilities are rewritten from me and don't have the regression (you observed) by growth of blocks number. simons added on 2019-01-30 00:04:41: I found an older discussion of the poor scalability of Tcl_Preserve here http://core.tcl.tk/itcl/tktview?name=04470f9067 (search for Tcl_Preserve) For Tcl 8.5, Itcl 3.4 was used. I guess this is very old. For Tcl 8.6, Itcl 4.1.2 was used. The slowing down is the critical issue for us as we have quite a lot of classes and variables defined in our system, and this exposes the poor scalability of Tcl_Preserve. sebres added on 2019-01-29 20:04:03: Hmm... Apart from the necessity the usage of Tcl_Preserve (still don't see why it should be in case of class creation), the results with threaded compiled Tcl on my side are still worse: Tcl8.5 (current): Initial 10 classes - 6084 microseconds per iteration Final 10 classes - 2069 microseconds per iteration Tcl8.6 (current): Initial 10 classes - 78647 microseconds per iteration Final 10 classes - 75752 microseconds per iteration What I don't see, is the slowing down in "Final" case (in opposite it is faster in both as "Initial"). Which version of itcl was used exactly? (for 8.5 and 8.6). |
