Tcl Source Code

View Ticket
EuroTcl/OpenACS 11 - 12 JULY 2024, VIENNA
Ticket UUID: ddc948cff9781daac7ad95a3077b3c1b4f07cf93
Title: Tcl-clock speed-up / flightaware Tcl-bounties#4
Type: RFE Version: tcl8.6, tcl8.7
Submitter: sebres Created on: 2017-01-10 23:40:41
Subsystem: 16. Commands A-H Assigned To: nobody
Priority: 5 Medium Severity: Important
Status: Pending Last Modified: 2018-05-29 18:21:07
Resolution: Fixed Closed By: nobody
    Closed on: 2017-05-29 21:19:17
Description: (text/html)
 <p>This is an artificial PR (RFE), corresponds with FlightAware's Tcl-bounties program, issue<br>
<a href="" class="issue-link js-issue-link" data-id="190310797" data-error-text="Failed to load issue title" data-permission-text="Issue title is private" title="Intent to work on speedups to `clock format` and `clock scan`">flightaware/Tcl-bounties#4</a> (<a href="">previous PR on github</a>).</p>
Back-ported to 8.6 now, see branch "<a href="timeline?r=sebres-8-6-clock-speedup">sebres-8-6-clock-speedup</a>".
Corresponds to branch "<a href="timeline?r=sebres-trunk-clock-speedup">sebres-trunk-clock-speedup</a>" currently based on "sebres-trunk-timerate" (used for performance measurement).
<p><b>Current performance increase:</b></p>
<li><code>clock format</code> - 10-15 times faster;</li>
<li><code>clock scan -format</code> - 30-60 times (some previously extremely slow scans up to 100 times faster);</li>
<li><code>clock scan</code> (freescan) - 15-20 times;</li>
<li><code>clock add</code> - 30-90 times;</li>
<p>My current result of performance test-cases attached bellow, (running threaded, warmed up, 4 x threads working, 4 core i5-2400 @ 3.10 GHz).</p>
<p>normal system-load (without parasitic load):<br>
<a href="">clock-test-perf-new.txt</a><br>
<a href="">clock-test-perf-org.txt</a><br>
Diff: <a href="">clock-test-perf.diff.txt</a></p>
<p>with parasitic load:<br>
<a href="">clock-test-perf-mth4x-new.txt</a><br>
<a href="">clock-test-perf-mth4x-org.txt</a><br>
Diff: <a href="">clock-test-perf-mth4x.diff.txt</a></p>
<p>Here is an excerpt from diff of total-blocks (normal system-load):</p>
<div class="highlight highlight-source-diff"><pre> % # Format
 Total 32 cases in 16.00 sec.:
<span class="pl-md"><span class="pl-md">-</span>371.620594 µs/# 1736699 # 109263.467 #/sec 15894.599 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>32.574084 µs/# 19007011 # 1277997.901 #/sec 14872.490 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>11.613144 µs/# 54271 # 109262 #/sec 496.706 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>1.017940 µs/# 593969 # 1277999 #/sec 464.765 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>5.707173 µs/# 86686 # 175218 #/sec 494.732 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.574267 µs/# 789155 # 1741349 #/sec 453.186 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>47.6873 µs/# 10472 # 20970.0 #/sec 499.381 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>4.671148 µs/# 105698 # 214080 #/sec 493.731 nett-ms</span>
 % # Scan

 Total 35 cases in 17.50 sec.:
<span class="pl-md"><span class="pl-md">-</span>1831.621900 µs/# 413807 # 23678.910 #/sec 17475.762 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>31.400022 µs/# 19301079 # 1180129.840 #/sec 16355.047 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>52.332054 µs/# 11823 # 23679 #/sec 499.307 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.897143 µs/# 551459 # 1180129 #/sec 467.287 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>22.5439 µs/# 22120 # 44358.0 #/sec 498.670 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.658980 µs/# 696086 # 1517495 #/sec 458.707 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>125.380 µs/# 3986 # 7975.7 #/sec 499.765 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>1.916346 µs/# 253079 # 521826 #/sec 484.987 nett-ms</span>

 % # FreeScan
 Total 17 cases in 8.50 sec.:
<span class="pl-md"><span class="pl-md">-</span>684.758500 µs/# 321446 # 37903.163 #/sec 8480.717 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>32.940470 µs/# 5976102 # 733668.512 #/sec 8145.507 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>40.279912 µs/# 18908 # 37902 #/sec 498.866 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>1.937675 µs/# 351535 # 733668 #/sec 479.147 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>16.4479 µs/# 30288 # 60798.2 #/sec 498.173 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.791828 µs/# 587441 # 1262901 #/sec 465.152 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>99.6852 µs/# 5013 # 10031.6 #/sec 499.722 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>8.778852 µs/# 56573 # 113910 #/sec 496.646 nett-ms</span>

 % # Add 

 Total 14 cases in 7.00 sec.:
<span class="pl-md"><span class="pl-md">-</span>546.773300 µs/# 241810 # 34603.012 #/sec 6988.120 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>9.650766 µs/# 9665868 # 1504157.966 #/sec 6426.099 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>39.055236 µs/# 17272 # 34603 #/sec 499.151 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.689340 µs/# 690419 # 1504158 #/sec 459.007 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>15.3289 µs/# 32490 # 65236.4 #/sec 498.035 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.554461 µs/# 814551 # 1803552 #/sec 451.637 nett-ms</span>
<span class="pl-md"><span class="pl-md">-</span>76.5505 µs/# 6527 # 13063.3 #/sec 499.645 nett-ms</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>0.899414 µs/# 521492 # 1111835 #/sec 469.037 nett-ms</span>

<p>Additionally it is few CPU and memory hungry.<br>
For example after executing of following script tcl consumes in delta 4MB fewer memory per execution (because of memory fragmentation in original trunk, that goes to tcl internal memory manager):</p>
<pre><code>time {set v([incr i]) [clock format $i -format "%Y-%m-%dT%H:%M:%S %z" -gmt 1]} 100000
<p>Because many rules are system-wide (across interpreter), it spares resources also, e. g. don't wash out the cpu-cache.</p>

<p><b>Known small incompatibilities:</b></p>
<li>confusing precedence of some clock-formats by <code>clock scan -format</code>, see <a href=""></a>;</li>
<li>because of bug-fix, the free scanning by relative date with ordinal month and relative date with ordinal month and relative weekday deliver another results now (previously was not test-covered):</li>
<div class="highlight highlight-source-diff"><pre> % # FreeScan : relative date with ordinal month (I said January)
 % clock scan "5 years 18 months 385 days next 1 January" -base 0 -gmt 1
<span class="pl-md"><span class="pl-md">-</span>Fri Jul 21 02:00:00 CEST 1978</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>Sat Jan 21 01:00:00 CET 1978</span>
 % # FreeScan : relative date with ordinal month and relative weekday (I said Fri in January)
 % clock scan "5 years 18 months 385 days next January Fri" -base 0 -gmt 1
<span class="pl-md"><span class="pl-md">-</span>Sat Jul 22 02:00:00 CEST 1978</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>Fri Jan 27 01:00:00 CET 1978</span></pre></div>
<li>I've introduced an optional tokens, for example zone (<code>%z</code> resp. <code>%Z</code>) is optional now (can be switched off);</li>
<li>several bugs in <code>clock format</code>, as well as in <code>clock scan</code> was fixed, thus some new test-cases created, that may fail in original branch (see <a href="fdiff?sbs=1&v1=629251fbb3dcecfa&v2=71bd2e9833782df9">629251fbb3dcecfa..71bd2e9833782df9</a>);</li>
<li>additionally scan/format token <code>%Es</code> introduced to parse or format local seconds (in opposition to <code>%s</code> for posix seconds);</li>
<li>value <code>-now</code> will be accepted as clock value for format or add functions, e. g. <code>clock format -now -f %u</code>;</li>
<li>own extensions made in <code>ParseClockFormatFormat</code>, <code>ParseClockScanFormat</code> or <code>DateParseActions</code> are no more effective (because the scanning and formatting is pure C-implementation now). I plan to provide a functionality for extension with own scanning/parsing rules.</li>
<p>Side effects:</p>
<li>because new engine does not used regexp (and all lexer rules work in utf-8) the scanning object will be not converted in string (unicode), so leaves its current internal representation untouched.</li>
User Comments: sebres added on 2018-05-29 18:21:07: (text/x-fossil-wiki)
I've rebased (back-ported) my current state of <a href="">tclclockmod</a> back to tcl.core in branch "<a href="">sebres-8-6-clock-speedup-cr2</a>".

In comparision to sebres-8-6-clock-speedup-cr1 (was at some point removed from core-8-6-branch):<ul>
<li> several bugs fixed (+ minor optimizations);</li>
<li> Flightaware DST fix applied, see [40ff672959] (this bug is still present in core branches);</li>
<li> Recognize positive time zone offset like "31 Jan 14 23:59:59 +0100", closes [5019748c73];</li>
<li> validation rules introduced, closes [3475995] (option "-valid 1|0" with possibility of setting the default configuration, more in PR <a href="">github/sebres/tclclockmod#10</a>);</li>

sebres added on 2017-05-29 21:19:17:
merged now in 8.6/trunk

sebres added on 2017-05-29 11:48:39:
So the date 1495810800 is past.

If no objections, I would like to merge the subject into 8.6/trunk on 1496088000 / Mon May 29 20:00:00 GMT 2017.


sebres added on 2017-05-15 19:07:22:
Branch "sebres-clock-speedup" (trunk version) moved to "sebres-trunk-clock-speedup", because of the troubles to access on fresh-cloned fossil repository.

Branch "sebres-8-6-clock-speedup" was not affected from this problem.

sebres added on 2017-05-11 12:57:40:
As regards the last commit [dea486a4618db6b9] (guaranties that the clock-stubs are loaded on demand only), following comparison shows why this is necessary:

% timerate {exec tclsh86t << "clock seconds; exit"}; #don't load stubs
32281.4 µs/# 31 # 30.978 #/sec 1000.722 nett-ms

% timerate {exec tclsh86t << "clock format -now; exit"}; #loading stubs
73673.4 µs/# 14 # 13.573 #/sec 1031.428 nett-ms

BTW this new possibility of the namespace-based auto-loading (auto_index_ns) was cherry-picked from my another feature-branch, where I use it already long enough. 
For example I rewrote (to auto_index_ns) all the tclIndex-files on very-very large project of me, so the auto-loading is significant faster within its environment (additionally I've across-threaded implementation of auto_index (tsv), if someone needs please contact me).

sebres added on 2017-05-10 12:43:24:
Back-ported to 8.6 now, see branch "sebres-8-6-clock-speedup".

BTW In thin version I've merged Kenny's compiled solution for clock ensemble with part of my feature-performance branch, that optimized basically all ensembles (comming soon), but here "clock" only (all sub-commands are compiled now).

WiP: documentation, code review (+ still one known bug in integration with msgcat-dictionary).

If I get it ready (I hope still today) I will create 2 back-merge points for trunk (re-merge back into branches "sebres-trunk-timerate" and "sebres-clock-speedup").

Because all sub-commands compiled now (no ensemble overhead anymore for "clock ..."), the performance increase compared to original clock grows now up to 50 - 100 times.