Itcl - the [incr Tcl] extension

Ticket Hash: fe70356a54338a4d1122c9a70b000063b866b260
Title: Several performance optimizations (memory-preservation, on demand var-resolver, etc)
Status: Closed
Type: Feature_Request
Severity: Important
Priority: Immediate
Subsystem:
Resolution: Fixed
Last Modified: 2019-10-16 16:19:23
Created: 2019-04-17 22:14:56
Version Found In: trunk
User Comments:
sebres added on 2019-04-17 22:14:56:

I have several branches that improve the initialization and run-time performance of Itcl.

1. Branch sebres-memopt-perf-branch introduces a new memory-preservation mechanism for Itcl (as a replacement for Tcl_Preserve); see also Tcl ticket [tcl#4731d31eff3e2245].

2. Branch sebres-on-dmnd-resolver-perf-branch is based on the first branch and additionally provides a new variable resolver, which builds several internal Itcl structures for variable resolution on demand.

The new resolver, together with the new memory-preservation mechanism, fixes a large performance regression related to the growth of Itcl internals (classes, or definitions such as variables) and eliminates the run-time dependency on the number of classes and the number of variables per class, especially for classes within nested namespaces and with class inheritance. The original Itcl resolver (because it completely rebuilds the internals, and because of memory preservation) has a complexity of roughly O(nn**2,2**vn) here, so the more classes there are, the deeper the class nesting or inheritance, and especially the more variables a class has, the worse class creation and modification perform. With the new branch the related complexity is roughly O(1), and class creation now has a variable-related complexity of roughly O(vn).

Additionally, this version saves memory because several internals (hash entries, structures, etc.) are built on demand, at first use only (for example, the entry for an inherited variable is built only when that variable is first used; otherwise the memory is never allocated).
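
The on-demand idea can be illustrated with a small sketch. This is not the code from the branch: DemoClass, demo_resolve and the fixed-size linear cache are hypothetical names and simplifications chosen for illustration (a real resolver would presumably use hash tables). It only shows that nothing is built when a class is created, and that a lookup entry appears the first time a variable is actually resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_VARS 8

/* Hypothetical class record, invented for this sketch only. */
typedef struct DemoClass {
    struct DemoClass *parent;               /* single inheritance, for brevity */
    const char *vars[DEMO_MAX_VARS];        /* variables declared by this class */
    size_t varCount;
    /* Lookup cache: empty at class creation, filled on first use. */
    const char *cachedName[DEMO_MAX_VARS];
    struct DemoClass *cachedOwner[DEMO_MAX_VARS];
    size_t cacheCount;
} DemoClass;

/* Return the class that declares `name`, searching cls and its ancestors.
 * A cache entry is created in cls only when the name is first resolved. */
DemoClass *demo_resolve(DemoClass *cls, const char *name) {
    size_t i;
    DemoClass *c;

    /* Fast path: the entry was already built by an earlier lookup. */
    for (i = 0; i < cls->cacheCount; i++) {
        if (strcmp(cls->cachedName[i], name) == 0) {
            return cls->cachedOwner[i];
        }
    }
    /* Slow path: walk the inheritance chain once, then remember the result. */
    for (c = cls; c != NULL; c = c->parent) {
        for (i = 0; i < c->varCount; i++) {
            if (strcmp(c->vars[i], name) == 0) {
                if (cls->cacheCount < DEMO_MAX_VARS) {
                    cls->cachedName[cls->cacheCount] = name;
                    cls->cachedOwner[cls->cacheCount] = c;
                    cls->cacheCount++;
                }
                return c;
            }
        }
    }
    return NULL; /* unknown variable: nothing is cached */
}
```

In this sketch, creating a derived class costs nothing beyond its own declarations, and a variable that is never accessed never gets a cache entry, which mirrors the memory-saving argument above.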

The second branch also contains performance test cases covering both improvements (see "tests-perf/itcl-basic.perf.tcl"; it requires a recent Tcl 8.6 with the timerate feature).

The comparison of the test-suite results (trunk vs. sebres-memopt-perf-branch vs. sebres-on-dmnd-resolver-perf-branch) is attached.

A summary of the comparison of trunk vs. sebres-on-dmnd-resolver-perf-branch is listed below:


==== initialization (preserve/release) ====

********************************************************************************
-Total 4 cases in 58.34 sec. (58.28 nett-sec.):
-58335.090000 µs/# 3996 # 68.569 #/sec 58276.709 nett-ms
+Total 4 cases in 1.28 sec. (1.27 nett-sec.):
+1274.509000 µs/# 3996 # 3138.465 #/sec 1273.234 nett-ms
Average:
-14583.772500 µs/# 999 # 69 #/sec 14569.177 nett-ms
+318.627250 µs/# 999 # 3138 #/sec 318.308 nett-ms
Min:
-4466.60 µs/# 999 # 223.88 #/sec 4462.131 nett-ms
+143.973 µs/# 999 # 6945.7 #/sec 143.829 nett-ms
Max:
-29910.4 µs/# 999 # 33.433 #/sec 29880.451 nett-ms
+500.756 µs/# 999 # 1997.0 #/sec 500.255 nett-ms
********************************************************************************

==== class/var creation ====

********************************************************************************
-Total 5 cases in 107.92 sec. (107.67 nett-sec.):
-289147.624000 µs/# 11069 # 102.808 #/sec 107667.039 nett-ms
+Total 5 cases in 0.40 sec. (0.40 nett-sec.):
+119.592700 µs/# 30002 # 75761.287 #/sec 396.007 nett-ms
Average:
-57829.524800 µs/# 2213 # 103 #/sec 21533.408 nett-ms
+23.918540 µs/# 6000 # 75757 #/sec 79.201 nett-ms
Min:
-746.324 µs/# 10000 # 1339.9 #/sec 7463.245 nett-ms
+12.1779 µs/# 10000 # 82116.0 #/sec 121.779 nett-ms
Max:
-96940.1 µs/# 516 # 10.316 #/sec 50021.074 nett-ms
+52.0000 µs/# 1 # 19230.8 #/sec 0.052 nett-ms
********************************************************************************

==== var access ====

********************************************************************************
-Total 39 cases in 2.23 sec. (0.85 nett-sec.):
-84.529156 µs/# 389961 # 461379.283 #/sec 845.207 nett-ms
+Total 39 cases in 2.22 sec. (0.81 nett-sec.):
+80.869286 µs/# 389961 # 482259.724 #/sec 808.612 nett-ms
Average:
-2.167414 µs/# 9999 # 461379 #/sec 21.672 nett-ms
+2.073571 µs/# 9999 # 482251 #/sec 20.734 nett-ms
Min:
-0.926193 µs/# 9999 # 1079689 #/sec 9.261 nett-ms
+0.944594 µs/# 9999 # 1058655 #/sec 9.445 nett-ms
Max:
-4.874287 µs/# 9999 # 205158 #/sec 48.738 nett-ms
+4.743174 µs/# 9999 # 210829 #/sec 47.427 nett-ms
********************************************************************************

==== object instance ====

... not affected here ...

The first two sections show a huge performance increase with the newest branch.

The var access section is included to show that there is basically no regression in variable access with the new on-demand resolver.

The last section (object instance) in the attached results is not really affected (or fixed) by the above-mentioned branches at the moment, so it is provided simply to show the large regression in object initialization and deletion in the current Itcl version. I already have an almost complete concept for a solution, but it also requires modifications to TclOO and several Tcl subsystems (internals). I'll try to provide it later (and will then update this RFE).


dgp added on 2019-08-27 20:40:50:
Somewhere we need to capture instructions on the proper use of these
memory routines. If we make them all public, those instructions can
just go into the documentation. If not, we'll need to record them somewhere
else easy to find, at least for Itcl devs ourselves.

For example, both Itcl_ReleaseData() and ItclCkFree() appear to have the
capability to deallocate ( via a call to ckfree() ) the memory under control.
So the programmer has a responsibility to either make enough calls to
Itcl_ReleaseData() or make a properly-timed call to ItclCkFree() but NOT both?
It's at least tricky.

The routines Itcl_PreserveData() and Itcl_ReleaseData() have already been
public, but now they carry a constraint that their ClientData argument must
have been allocated by a call to ItclCkAlloc(). This new requirement might
imply enough incompatibility that we need to bump to an Itcl 4.2 release.

ItclCkAlloc() is written to take a Tcl_FreeProc argument, but the routine has
five callers and every one of them passes NULL. If we don't need this argument,
I think it should go, and the code simplifications that follow from that.

sebres added on 2019-08-28 17:52:44:

To avoid memory leaks (and double frees), each call to ItclCkalloc should be followed either by ItclCkfree (if no preservation is expected) or by Itcl_ReleaseData. When a block is released via Itcl_ReleaseData, the calls to Itcl_PreserveData and Itcl_ReleaseData must be paired (see example 2).
The last call to Itcl_ReleaseData invokes ItclCkfree, or a callback specified as freeProc via ItclCkalloc or via Itcl_EventuallyFree (in which case the callback itself should invoke ItclCkfree to free the block).

1. Simplest case (only allocated, no Itcl_PreserveData called):

    Block *ptr = ItclCkalloc(sizeof(Block), NULL);
    ...
    ItclCkfree(ptr);

2. Itcl_PreserveData was eventually called:

    Block *ptr = ItclCkalloc(sizeof(Block), NULL);
    Itcl_PreserveData(ptr); /* +1 = 1 */
    ...
    Itcl_PreserveData(ptr); /* +1 = 2 */
    ...
    Itcl_ReleaseData(ptr); /* -1 = 1 */
    ...
    Itcl_ReleaseData(ptr); /* -1 = 0, after this call ptr is not accessible anymore */

3. Auto-preserve once (if freeProc specified in ItclCkalloc):

    Block *ptr = ItclCkalloc(sizeof(Block), ItclCkfree); /* +1 = 1 */
    ...
    Itcl_PreserveData(ptr); /* +1 = 2 */
    ...
    Itcl_ReleaseData(ptr); /* -1 = 1 */
    ...
    Itcl_ReleaseData(ptr); /* -1 = 0, after this call ptr is not accessible anymore */

4. Preserve with callback:

    Block *ptr = ItclCkalloc(sizeof(Block), NULL);
    ...
    Itcl_PreserveData(ptr); /* +1 = 1 */
    Itcl_EventuallyFree(ptr, freeBlock);
    ...
    Itcl_ReleaseData(ptr); /* -1 = 0, this calls freeBlock, so hereafter ptr is not accessible anymore */

    static void freeBlock(ClientData block) {
        /* free data in block */
        ...
        /* free block */
        ItclCkfree(block);
    }
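
For readers who want to experiment with these rules outside Itcl, the counting behaviour described above can be modelled in a few lines of standalone C. This is a toy model under stated assumptions, not the Itcl implementation: all demo_* names, the DemoHeader layout, and the choice to keep the count in a header in front of the block are invented for illustration. It also assumes that registering a callback afterwards only stores it, which is how example 4 above is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model, NOT Itcl's code: every name below is invented. */
typedef void (DemoFreeProc)(void *block);

/* Header stored in front of each block; the union keeps user data aligned. */
typedef union DemoHeader {
    struct {
        size_t refCount;        /* outstanding preserves */
        DemoFreeProc *freeProc; /* optional callback run when count hits 0 */
    } h;
    max_align_t align;
} DemoHeader;

static int demoFreed = 0; /* blocks actually freed, tracked for the demo */

static DemoHeader *demo_header(void *block) {
    return (DemoHeader *)block - 1;
}

/* Allocate a block; a non-NULL freeProc preserves it once (example 3). */
void *demo_alloc(size_t size, DemoFreeProc *freeProc) {
    DemoHeader *hdr = malloc(sizeof(DemoHeader) + size);
    if (hdr == NULL) {
        abort(); /* keep the sketch simple: no recovery from out-of-memory */
    }
    hdr->h.refCount = (freeProc != NULL) ? 1 : 0;
    hdr->h.freeProc = freeProc;
    return hdr + 1;
}

/* Free unconditionally; only valid when nobody else holds the block. */
void demo_free(void *block) {
    free(demo_header(block));
    demoFreed++;
}

void demo_preserve(void *block) {
    demo_header(block)->h.refCount++;
}

/* Register a callback to run at the last release (example 4). */
void demo_eventually_free(void *block, DemoFreeProc *freeProc) {
    demo_header(block)->h.freeProc = freeProc;
}

/* Drop one reference; the last release frees via callback or directly. */
void demo_release(void *block) {
    DemoHeader *hdr = demo_header(block);
    assert(hdr->h.refCount > 0); /* unpaired release would be a bug */
    if (--hdr->h.refCount == 0) {
        if (hdr->h.freeProc != NULL) {
            hdr->h.freeProc(block);
        } else {
            demo_free(block);
        }
    }
}

/* Callback for example 4: free nested data first, then the block itself. */
static void demo_free_block(void *block) {
    demo_free(block);
}

/* Walk through examples 1-4; returns how many blocks were freed. */
int demo_patterns(void) {
    int before = demoFreed;
    void *p;

    p = demo_alloc(32, NULL);          /* 1: plain alloc/free */
    demo_free(p);

    p = demo_alloc(32, NULL);          /* 2: paired preserve/release */
    demo_preserve(p);
    demo_preserve(p);
    demo_release(p);
    demo_release(p);

    p = demo_alloc(32, demo_free);     /* 3: auto-preserved once */
    demo_preserve(p);
    demo_release(p);
    demo_release(p);

    p = demo_alloc(32, NULL);          /* 4: callback registered later */
    demo_preserve(p);
    demo_eventually_free(p, demo_free_block);
    demo_release(p);

    return demoFreed - before;
}
```

Each of the four patterns frees its block exactly once in this model, which is the invariant the pairing rules are meant to guarantee; mixing a direct free with a pending release on the same block would break it.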


sebres added on 2019-10-15 14:09:10:

@dgp: sebres-memopt-perf-branch is merged now; again, thank you very much, Don (excellent work)!

What do you think about the other branch, sebres-on-dmnd-resolver-perf-branch?
Do you still need to review it, or can I simply merge it as is (I don't see any public interfaces affected or any regressions here)?

I still have other things to rebase, such as a similar lazy (on-demand) initializer for methods and procs, but I can do that later (to keep things simpler).


dgp added on 2019-10-16 16:08:24:
Please merge the branch to trunk to include in Itcl 4.2.0.

sebres added on 2019-10-16 16:19:23:
done - integrated now, closing it.

Attachments: