Tcl Source Code

View Ticket
Login
Ticket UUID: e02798626dfbcd7b33db19e0fc3a9fb9fa0714f4
Title: clock command is 8 times slower with Tcl 9.0b1
Type: Bug Version: 8.7+
Submitter: anonymous Created on: 2024-02-24 18:36:48
Subsystem: 16. Commands A-H Assigned To: nobody
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2024-02-28 18:51:25
Resolution: Fixed Closed By: anonymous
    Closed on: 2024-02-28 18:51:25
Description:
tcl_platform(machine)  = x86_64
tcl_platform(os)  = Darwin
tcl_platform(osVersion) = 23.3.0

Here is my procedure :   

proc testclock {} {
    set base [clock scan {03/09/1968} -format {%d/%m/%Y}]

    for {set i 1} {$i < 20000} {incr i} {
        set now  [clock scan {+1 days} -base $base]
        set date [clock format [clock scan {+1 months} -base $now] -format {%d/%m/%Y}]
        set base $now
    }
}
testclock

time tclsh8.6.13 testclock.tcl 
> 1,09s user 0,01s system 99% cpu 1,105 total
time tclsh9.0 testclock.tcl 
> 7,92s user 0,02s system 99% cpu 7,977 total
User Comments: anonymous added on 2024-02-28 18:51:25:
+8 Thanks to all!

sebres added on 2024-02-28 10:03:56:

merged to 8.7/9.0, closed


jan.nijtmans added on 2024-02-27 23:19:35:

Fix in branch "fix-e02798626d--msgcat-regress" branch looks perfect to me. Many thanks!


sebres added on 2024-02-27 18:17:51:

> The shimmering between utf8 and UNICODE is quite new to me and is an important fact.

Well, it is rather not the shimmering of utf-8 to unicode (utf-8 is always there as objPtr->bytes). I'm even unsure it shimmers.
And I spoke about shimmering to unicode from something else (a typed object with command/ns/whatever in the internal representation gets lot, if it shimmers to string).
However here (in measurement of regexp vs. string equal) it is rather not the shimmering, more it is TEBC vs. direct command invocation with everything around that.

Anyway, since string match looks indeed better, I'd rewrite it so.


oehhar added on 2024-02-27 17:47:58:

Wow, I would never have thought this. The shimmering between utf8 and UNICODE is quite new to me and is an important fact. You learn each day, thank you so much.

Here are the results on my tcl 8.6.13:

% proc test {ns} { timerate { regexp -- {^::oo::} $ns } }; test ::tcl::clock
0.105722 µs/# 5368601 # 9458756 #/sec 567.580 net-ms
% proc test {ns} { timerate { string equal -length 6 {::oo::} $ns } }; test ::tcl::clock
0.642979 µs/# 1382123 # 1555260 #/sec 888.676 net-ms
% proc test {ns} { timerate { regexp -- {^::oo::} $ns } }; test ::oo::test
0.188651 µs/# 3714746 # 5300782 #/sec 700.792 net-ms
5 % proc test {ns} { timerate { string equal -length 6 {::oo::} $ns } }; test ::oo::test
0.659135 µs/# 1351934 # 1517139 #/sec 891.107 net-ms
% expr 0.642979/0.105722
6.081789977488129
% expr 0.659135/0.188651
3.4939385425998273

So factor 6 and 3.5 faster for regexp, incredibble. I also tried "string match {::oo::*}" which is close to regexp.

TIP 688 is started and may be improved : https://core.tcl-lang.org/tips/doc/trunk/tip/688.md

Thank you, you guys rock! Harald


sebres added on 2024-02-27 16:44:41:

As for string equal -length (vs regexp), it would be many times slower, because string equal -length (with option) didn't compile in TEBC (including everything belonging: reenter to TEBCResume, NRE-trampoline, etc), whereas regexp is completely compiled in TEBC.
(Let alone string equal with length would probably shimmer the string to unicode, what would introduce additional cost)

You can try it by yourself (I guess slowing down by factor 10):

timerate -calibrate {}
proc test {ns} { timerate { regexp -- {^::oo::} $ns } }; test ::tcl::clock
proc test {ns} { timerate { string equal -length 6 {::oo::} $ns } }; test ::tcl::clock
proc test {ns} { timerate { regexp -- {^::oo::} $ns } }; test ::oo::test
proc test {ns} { timerate { string equal -length 6 {::oo::} $ns } }; test ::oo::test


sebres added on 2024-02-27 16:31:59:

This is fixed now (see branch fix-e02798626d--msgcat-regress). There is no regression anymore (it looks even faster than 8.6 now)...

As already mentioned, it also fixes unintentional set of Class variable in uplevel frame.
Additionally I added -command to the namespace which, to avoid confusing self with a variable (since we're searching for command self here).

Although better would be to rewrite ::msgcat::PackageNamespaceGet completely, without regexp and without attempts to detect a namespace by self-invocation, if tclOO would implement one of this facilities:

- something like info object isa oo frame, to detect frame/ns belongs to tclOO
- something like info class namespace frame, to return class ns if frame/ns belongs to tclOO, otherwise empty string
- something like info frame-namespace, to return ns for the current frame regardless what current frame is (NS/tclOO/etc).

No idea, whether there is better way to retrieve it at tcl-level now. May be Donal (dkf) know?

Anyway, it works as expected now, tests passed too.

Please test and review. If the fix is OK, I'll merge it in 8.7/9.0.


oehhar added on 2024-02-27 16:22:40:

Thanks for commit [29eaa2a82fed26b7], great work! I did all this 7 years ago for René together with Donal Fellows...

May I just propose to replace

if {![regexp -- {^::oo::} $ns]} {

by

if {![string equal -length 6 {::oo::} $ns]} {

Take care, Harald


oehhar added on 2024-02-27 16:06:15:

That is great news, thanks a lot ! Harald


sebres added on 2024-02-27 15:46:23:

The issue also affects 8.7 (thus version updated).

And yes, it is definitely ::msgcat::PackageNamespaceGet (so something shimmers there, but it's not the catalog dict).
I fixed it now for msgcat (and also in clock to speed-up it a bit more)...
Will commit it soon as new branch.

BTW, previously ::msgcat::PackageNamespaceGet did change frame variable Class (because of uplevel 2 for whole proc-scope), or created it even... This will be fixed too in the new branch.


oehhar added on 2024-02-27 11:08:17:

Great Sergey, I really appreciate ! Take care, Harald


sebres added on 2024-02-27 10:43:22:

> This could be avoided using the new "mcn" command with a namespace as first argument.
> Might that be a path?

I don't think so: sure it'd speed-up a bit the mc for non-class NS (this evident +1µs/# illustrated below), but it doesn't explain +28µs/# in both blocks (as well as loss of this extra costs in other block with "EnterLocale" immediately after I replaced mc-call with set changeover 2299161), therefore my guess initially was, and still is, the shimmer.

I'm trying to find out what happens here if I get a free minute.


oehhar added on 2024-02-27 06:51:52:

I have scanned the msgcat code.

The main additional task is to check, if it is called by an oo class. See function proc ::msgcat::PackageNamespaceGet, which is now called for each mc command.

This could be avoided using the new "mcn" command with a namespace as first argument.

Might that be a path?

Thank you and take care, Harald


oehhar added on 2024-02-27 06:28:57:

Sergey, great analysis, thank you. I will double-check the msgcat module. There might be checks for a dict like "string is list and llength %2 == 0" which will shimmer to strings...

The new clock branch is also great, thank you. Any improvement is welcomed.

Take care, Harald


sebres added on 2024-02-26 22:28:04:

Back to the issue... Ultimately, I found what the problem is: it is not clock at all (or not directly clock).
Moreover, the fix using merge of clock-speedup will help only conditionally.
But I'll try to explain it bit by bit...

My profiler module doesn't work for 9.0 (it segfaults), so I wrote a small simple tcl-level profiler tool and injection for ::tcl::clock::add (since it has been showing the same issue) - GIST/sebres/4da0d76792111(v1)

How one can see it drastically increased times in entloc and chngov blocks.
Both of this blocks definitely uses msgcat, and if we'll measure chngov solely:

% timerate { namespace inscope ::tcl::clock { mc GREGORIAN_CHANGE_DATE } } -3.939602 µs/# 253833 # 253832 #/sec 1000.001 net-ms +2.451155 µs/# 407971 # 407971 #/sec 1000.000 net-ms

it'd show marginal deviation (it's surely slower than 8.6, but doesn't cost 30µs/# and basically 10 times faster than if it is executed within ::tcl::clock::add).

Because it also affect entloc block, which also uses msgcat, I guess the catalogue simply shimmers somewhere internally.

To confirm that, I modified the gist like here - GIST/sebres/4da0d76792111(v1).

And the result is as expected many times faster, and now the profiler blocks are sorted similar to 8.6 (init and calc are before both blocks entloc and chngov using mc).

My conclusion: this is rather an issue of msgcat (or probably the impossibility to use certain API of it from C-code without to shimmer). My guess - some catalog continuously shimmers between dict and something else (I have no time to take a closer look right now, but suspect this could be a TIP#490 or TIP#499 (or some mc-enhancement hereafter).

Thus, as already assumed, this worsening must be fixed regardless the clock-speedup merge (or at least documented like "don't use ... anymore"), because otherwise people using msgcat (in "improper" way) could experience similar regressions in 8.7 or 9.0.


sebres added on 2024-02-26 18:22:10:

Here is the first attempt to show all the differences...

The complete description is the RFE [ddc948cff9], see "Known small incompatibilities" section and in following comment (2018-05-29 18:21:07).

The main differences in tcl tests (change of precedence of some clock-formats by clock scan -format, e. g. %Y%m%d together with %Y%j, see [e7a722cd35]), was already discussed in Tcl core maillist (with Jan and Kevin), however I cannot find the mail anymore. The conclusion at that point was - the changes make sense and can be accepted.

There are also few new formats, for instance %EJ (calendar) and %Ej (astronomical) Julian day number (Julian date, JDN/JD) with time fraction (as floating point number), basically equivalent float form of datetime of sqlite3, see GH/tclclockmod/#16

The main enhancements of command are:

- clock configure with possibility to configure main default options per interpreter;
- new option -valid 1|0 and configure option GH/tclclockmod/#10
- clock accept new base value -now, as a replacement for [clock seconds]

By the way, periodically I rebased the fixes made for tclclockmod to clock-speedup branch, but the last rebase was 2020-06-23. Since this last rebase, tclclockmod implementation got several fixes (many of them also affecting tcl-core too), that are not back-ported yet to the tcl-core branch, for the full list see GH/tclclockmod/973aba1084b2274a..master.
(I can try to rebase it later, if my export-import-marks of git2fossil will be still usable)

IIRC, Flightaware uses also this variant of tclclockmod (possibly excepting few of the last commits) many years.


sebres added on 2024-02-26 15:29:16:

I already saw your mail and preparing an answer (must find old mail about changes affecting test cases, as well as designate all the corner changes).

Also the current tclclockmod implementation exhibit certain differences (by continued further enhancements).

Regarding the regression - it'd be good to know what exactly it is, anyway. Just because such huge effect may be caused by some other enhancement of 9.0, and in this case could affect also another code pieces in similar way. Sure, the clock engine would be "fixed" with merge of clock-speedup branch, but it'd be putting things too simply for whole 9.0 core, at least until one can't completely exclude further regressions by similar handling somewhere.
With other words, it'd be nice to have an explanation what is it exactly.


oehhar added on 2024-02-26 14:45:16:

Sergey, great analysis, thank you !

Ashok, Jan and myself discussed this within a telco today.

Ashok and Jan both searched for the reason and tested for example zip file system compared to physical file system, but with no findings.

The proposal is as follows:

  • The sergey clock speedup code is great as it implements most within C language
  • Some corner cases are handled differently. This is not a problem but more a documentation and test issue.
  • It is proposed to author a small TIP with the corner case change and to include the speedup branch in TCL 9.0 or 9.1.
  • Jan is apparently already working on porting the branch to TCL 9.

Any help apprciated. If you have a list of corner case changes, I may author a TIP.

Thank you, you guys rock ! Harald


sebres added on 2024-02-26 13:22:48:

Heavy regression seems to be only for clock scan, and also for relative time calculation only, here comparison 9.0 vs 8.6:

  % timerate { clock format 1600000000 -format {%d/%m/%Y} }
- 8.015277 µs/# 124762 # 124761 #/sec 1000.002 net-ms
+ 8.225095 µs/# 121580 # 121579 #/sec 1000.007 net-ms
  % timerate { clock scan {13/09/2020} }
- 9.767075 µs/# 102385 # 102384 #/sec 1000.002 net-ms
+ 9.409933 µs/# 106271 # 106270 #/sec 1000.003 net-ms
  # but:
  % timerate { clock scan {+1 months} }
- 91.2549 µs/# 10959 # 10958.3 #/sec 1000.062 net-ms
+ 29.5314 µs/# 33863 # 33862.3 #/sec 1000.022 net-ms
  % timerate { clock scan {+1 day} }
- 90.3502 µs/# 11069 # 11068.0 #/sec 1000.086 net-ms
+ 30.9479 µs/# 32313 # 32312.4 #/sec 1000.019 net-ms
  # however:
  % timerate { clock scan {1 year 1 month 1 day 1 hour 1 minute 1 second} }
- 92.3471 µs/# 10829 # 10828.7 #/sec 1000.027 net-ms
+ 31.9452 µs/# 31304 # 31303.6 #/sec 1000.013 net-ms

The second last illustrates that this occurs not in yacc/bison parser or lex transformation, but rather by internal calculation somewhere around relative time adjustment.

This confirms that too:

  % timerate { clock add [clock seconds] 1 year 1 month 1 day 1 hour 1 minute 1 second }
- 82.0134 µs/# 12194 # 12193.1 #/sec 1000.072 net-ms
+ 26.4807 µs/# 37764 # 37763.4 #/sec 1000.017 net-ms

And yes, tclclockmod seems to be not affected (works for 9.0 with exactly the same speed as for 8.6/8.7).


oehhar added on 2024-02-25 18:49:10:

Yes, that is quite sad. And we have the patch by Sergey which accellerates the clock command by a factor of 20. And it was never merged...

Any work here appreciated:

Here are the branches:

  • [sebres-8-7-clock-speedup-cr2]
  • [sebres-8-6-clock-speedup-cr2-with-original-testcases]

But we are happy, if we get 9.0 in a certain suboptimal way out of the door.

Thank you and take care, Harald