Tcl Source Code

View Ticket
Login
Ticket UUID: 28cc67a606a7ab771af4c7b542b34ec1a377edc7
Title: array set from dict: set values multi times on the same array is slower with Tcl9.0.
Type: Bug Version: 9.0
Submitter: anonymous Created on: 2024-04-15 15:59:55
Subsystem: 16. Commands A-H Assigned To: griffin
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2024-04-25 19:55:10
Resolution: Fixed Closed By: griffin
    Closed on: 2024-04-25 19:55:10
Description:
proc SimplePERM {} {
    set permmod [dict create]
    for {set i 0} {$i < 512} {incr i} {
        dict set permmod $i [expr {$i + int(rand() * (256 - $i))}]
        
    }
    return $permmod
}

proc generateData {} {
    array set val0 {} ; array set val1 {} ; array set val2 {}
    for {set i 0} {$i <= 20} {incr i} {
        for {set j 0} {$j <= 20} {incr j} {
            for {set k 0} {$k <= 20} {incr k} {
                set v [SimplePERM]
                array set val0 $v
                array set val1 $v
                array set val2 $v
            }
        }
    }
}

generateData

time tclsh9.0b2 setarray.tcl
> 8,23s user 0,02s system 99% cpu 8,281 total
time tclsh9.0b1 setarray.tcl
> 7,93s user 0,02s system 99% cpu 7,962 total
time tclsh8.6.14 setarray.tcl
> 2,71s user 0,01s system 99% cpu 2,726 total
User Comments: griffin added on 2024-04-25 19:55:10:

Removed abstract list extension from dict objtype. The implementation is far from optimal.

Handling non-pure dict values will require a complex solution.


sebres added on 2024-04-23 16:49:09:

in cases where only the internal representation exists (when we can guarantee that the list length is exactly twice the dictionary size)

Well, this is even one of the additional issues - once the string representation created (for whatever reason), it is not possible anymore to know whether dict is "pure" dict or not. The lists have a "canonical" flag, so the same must be implemented for dict (it is canonical unless obtained from string representation), so TclIsPureDict could be replaced by TclIsCanonicalDict and in that case it is no matter whether string representation is there or not.


dkf added on 2024-04-23 15:16:33:

I was glancing at this earlier, and wondered why we were always generating the string representation of the dictionary, even in cases where only the internal representation exists (when we can guarantee that the list length is exactly twice the dictionary size).


oehhar added on 2024-04-22 07:08:12:

Thanks, Brian, for looking into this, I appreciate. Your personal message indicates that the issue is easy to fix but requires further investigation. Thanks for all the constant work. It is often sad (but normal procedure) that a great achevement gets tainted by small side-issues. But that is normal, thank you all for making TCL even better ! Harald


griffin added on 2024-04-22 01:58:41:
Removing abstractlist extension from dict objtype. This implementation was naive  and dict is probably already optimized w.r.t. list operations.
Further analysis is needed to determine if any better approaches are possible.

Changes on branch [bug28cc67a606a7] for review.

oehhar added on 2024-04-17 18:51:45:

Hi Sergey,

thank you for finding the issue.

I think, the sense of the beta phase is just to find those issues. We always have the possibility to not merge and nobody will try, or by merging and get feed-back. I prefer to merge.

I think, the abstract list is a great feature. And it is sure, that it is not perfect. Lets work on it and give Brian and others some time to comment.

Thank you again, Harald


sebres added on 2024-04-16 22:17:45:

The assumption that the regression is related to abstract list (type enhancement) is confirmed: [f525e7ad92ff63c8] doesn't show it at all (neither by iteration over dict, nor by llength or array set). Sure the dict will shimmer to list there, but only once (where in current trunk due to internal duplication by iteration, new dict object will shimmer to list every time). Still worse it'd do that over the freshly created string-representation, where previously it did that over dict internals (for pure-dicts).

I have also a small question, by the way: since when exactly half-finished solutions (with large effect or such affecting almost everything) get merged now in main branch? If I take a closer look at that implementation (not particularly dict, but complete type-extention of TIP 636) it looks not really well thought out and rather as some quick (and even more or less dirty) partly-backed solution to me.
And is there really the intention to release this so? I mean once it gets released, to change TCL_OBJTYPE_V1 or TCL_OBJTYPE_V2 would it need 9.1 (and new TIPs), probably? Or am I wrong?


sebres added on 2024-04-16 20:27:31:

The case is also present if SimplePERM proc returns an array.

It doesn't "return an array", because it is currently impossible in tcl - [array get] returns a dict:

   % tcl::unsupported::representation [array get env]
   value is a dict with a refcount of 1, ..., no string representation

Tcl cannot (and probably never will) directly return an array - array is just a complex variable (hash table in var in scope/NS/whatever), so [array get] generates a dict at the time of execution to be able to return it (basically for the same reason why one cannot return directly $env (that's as implemented in core-tcl, there are another tcl-engines which able to do that and even such where array and dict are the same stuff).


anonymous added on 2024-04-16 17:45:33:
The case is also present if SimplePERM proc returns an array.
array set permmod {}
for {set i 0} {$i < 512} {incr i} {
    set permmod($i) 1        
}
return [array get permmod]

anonymous added on 2024-04-16 17:33:28:
Indeed, your analysis is much more advanced than mine. 
Which made me think that it was the array command that was the problem, was that if I created one array for each iteration, the time was identical between tcl9 and tcl8.6.14.

sebres added on 2024-04-16 17:22:57:

This definitely belongs to abstract-lists implementation of 9.0 and although it was supposed to be better (avoid shimmer of dict to list), in the reality it is rather worse now:

  • list facilities previously handles pure-dicts better (even if it shimmered sometimes between list/dict).
  • by retrieving of length (DictAsListLength) now it always obtain string representation of dict (also for pure-dict object), what:
    • may be slow and
    • causes that dict object is not pure anymore (basically we need a flag for that, to mark pure-dicts independently to existence of its string representation)
  • DictAsListLength (but basically ObjType::LengthProc) cannot handle errors, what may be not so good for a possible fallback (if dict is not pure dict);
  • retrieving of index (if one would implement it for dict), e. g. for INST_FOREACH_STEP/INST_FOREACH_START instructions, is hardly possible, because normal iteration over the dict needs some interim Tcl_DictSearch (or similar iterator handle), and without that we'll get O(n!) aka factorial big-O if it'd always start from scratch and stop at index.
  • ObjType::GetElements may be too hard (for large dicts) and it is unused, and rightly so, for INST_FOREACH_STEP/INST_FOREACH_FIRST instructions.
  • there is no ObjType::Iterate facility.

All that makes difficult to implement it with reasonable performance for all corner cases. However I'll take a closer look to this later.


sebres added on 2024-04-16 14:51:37:

It is only affected if the value is set from dict. Basically as already correct assumed, it is not array set - it is foreach over dict content (foreach_start/foreach_step instructions compiled internally) that cause performance regression there.

Here is an example without array set, that shows the same regression with 9.0 (also llength $v does it):

proc SimplePERM {{asdict 1}} {
    if {$asdict} {
        set permmod [dict create]
        for {set i 0} {$i < 512} {incr i} {
            dict set permmod $i 1       
        }
    } else {
        set permmod {}
        for {set i 0} {$i < 512} {incr i} {
            lappend permmod $i 1
        }
    }
    return $permmod
}

proc generateData {{n 10000} {asdict 1}} {
    array set val {}
    set v [SimplePERM $asdict]
    for {set i 0} {$i < $n} {incr i} {
        # array set val $v
        # llength $v
        foreach {en ev} $v {}
    }
}

generateData 2000; # warm-up
puts [timerate {generateData 1000 1} 10000 1]
puts [timerate {generateData 2000 1} 10000 1]

generateData 2000 0; # warm-up
puts [timerate {generateData 1000 0} 10000 1]
puts [timerate {generateData 2000 0} 10000 1]

sebres added on 2024-04-16 14:02:24:

This is not platform-related, I see this on both windows and linux pretty well:

- 3263936 µs/# 1 # 0.306 #/sec 3263.936 net-ms - 8.6.14 (core-8-6-branch)
+ 9344194 µs/# 1 # 0.107 #/sec 9344.194 net-ms - 9.0    (trunk)

Trying this on both versions:

timerate -calibrate {}

proc SimplePERM {} {
    set permmod [dict create]
    for {set i 0} {$i < 512} {incr i} {
        dict set permmod $i [expr {$i + int(rand() * (256 - $i))}]
    }
    return $permmod
}

proc test {} {
    array set val0 {} ; array set val1 {} ; array set val2 {}
    puts PERM:\t[timerate {
       set v [SimplePERM]
    } 10000 9261]
    puts ASET:\t[timerate {
       array set val0 $v
       array set val1 $v
       array set val2 $v
    } 10000 9261]
    array set val0 {} ; array set val1 {} ; array set val2 {}
    puts BOTH:\t[timerate {
       set v [SimplePERM]
       array set val0 $v
       array set val1 $v
       array set val2 $v
    } 10000 9261]
    array set val0 {} ; array set val1 {} ; array set val2 {}
    puts CYCL:\t[timerate {
       for {set i 0} {$i <= 20} {incr i} {
          for {set j 0} {$j <= 20} {incr j} {
              for {set k 0} {$k <= 20} {incr k} {
              }
          }
       }
    } 10000 1]
}

test

doesn't explain that 3x difference in the "granulated" times (it is slower but few percent only, e. g. array set shows in 9.0 ca. 5% degradation only, and all others are similar, if not even less). The same happens if I put timerate into generateData to measure whole code once.

However if timerate is outside of original generateData (measures single or multiple invocations of generateData), there is indeed 3x difference between 8.6 and 9.0 (8.7 seems to be not affected).
It looks like something with compilation cycles or so, or inside the TEBCResume (NRE, whatever), because disassemble of both procs shows the same code. It happens also if I unfold 3 nested cycles to single one (with 10k iterations) or rewrite it to foreach i [lseq 10000].

Here is the original example rewrote as much simpler version with 10x performance degradation (however still O(n) just 10x larger), which also illustrates difference with inlined in proc timerate (without degradation):

# timerate -calibrate {}

proc SimplePERM {} {
    set permmod [dict create]
    for {set i 0} {$i < 512} {incr i} {
        dict set permmod $i 1       
    }
    return $permmod
}

proc generateData {{n 10000}} {
    array set val {}
    set v [SimplePERM]
    for {set i 0} {$i < $n} {incr i} {
        array set val $v
    }
}

generateData 20000; # warm-up
puts [timerate {generateData 10000} 10000 1]
puts [timerate {generateData 20000} 10000 1]

# to illustrate strange behaviour if timerate compiles the body instead of proc itself:
puts "# timerate inside proc:"
proc generateData {{n 10000}} "puts \[timerate {\n[info body generateData]\n} 10000 1\]"
generateData 10000
generateData 20000

Result as diff between 9.0 (bad -) and 8.6/8.7 (good +):

- 2302136 µs/# 1 # 0.434 #/sec 2302.136 net-ms
+ 200457.0 µs/# 1 # 4.989 #/sec 200.457 net-ms
- 4640549 µs/# 1 # 0.215 #/sec 4640.549 net-ms
+ 475257.0 µs/# 1 # 2.104 #/sec 475.257 net-ms
# timerate inside proc:
- 157615.0 µs/# 1 # 6.345 #/sec 157.615 net-ms
+ 136238.0 µs/# 1 # 7.340 #/sec 136.238 net-ms
- 320462.0 µs/# 1 # 3.120 #/sec 320.462 net-ms
+ 286145.0 µs/# 1 # 3.495 #/sec 286.145 net-ms

Anyway I don't think it is array set command alone.


anonymous added on 2024-04-15 16:29:09:
Harald, 
Thank you for testing, I too am surprised to see so much difference. 
My platform is macOS. 

Maybe someone else can test on the same OS (if that's where the issue comes from).

oehhar added on 2024-04-15 16:10:16:
Nicolas,
thanks for the ticket.
I have tried your command with wish on windows with timerate "cmd". Here are the results:

TCL 8.6.14

% timerate generateData
7586957 µs/# 1 # 0.132 #/sec 7586.957 net-ms

TCL 9.0b2rc0

% timerate generateData
9272919 µs/# 1 # 0.108 #/sec 9272.919 net-ms

So, the difference is not so huge.

Take care,
Harald