Check-in [ab89bbf879]

optimization. In particular, LOAD-LOAD and STORE-LOAD redundancies can
be eliminated by checking each moveFromCallFrame operation in turn for
whether it is accessing an available value. If it is, it is eliminated
and its result is replaced with the available value.

### Availability analysis: equivalence sets

LOAD-STORE pairs can also be detected, by examining moveToCallFrame
instructions and eliminating them if the variable being written is
available and known to be equal to the Tcl value in the callframe.
Therefore, in the availability analysis, as well as accumulating the
sets of available Tcl values and the names in the callframe that
correspond to them, we must track the sets of names in a given
callframe that correspond to each SSA value. That way,

................................................................................

Availability by itself is not sufficient for loop-invariant code
motion. In addition, we need the concept of _anticipability_. A
(_cf_, _name_) pair is _anticipable_ at a given point in the program
if every execution path forward from that point contains a
moveFromCallFrame retrieving that value prior to the value's being
modified.

Calculating anticipability requires multiple traversals of the program
in postdominator order. Nothing is anticipable on return.
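The LOAD-LOAD and STORE-LOAD elimination described above might be sketched as follows. This is purely illustrative: the instruction tuples and the `avail` map are assumptions for the example, not the actual quadcode representation.

```python
# Hypothetical sketch of LOAD-LOAD / STORE-LOAD elimination over one basic
# block. Instructions are modeled as (op, result, cf, name, value) tuples;
# 'avail' maps (cf, name) -> the SSA value known to equal that callframe slot.
def eliminate_redundant_loads(block, avail):
    """Return the rewritten block and a substitution map for the results
    of eliminated moveFromCallFrame instructions."""
    out, subst = [], {}
    for op, result, cf, name, value in block:
        if op == 'moveFromCallFrame':
            key = (cf, name)
            if key in avail:                 # LOAD-LOAD or STORE-LOAD redundancy
                subst[result] = avail[key]   # reuse the available value
                continue                     # drop the redundant load
            avail[key] = result              # the loaded value is now available
        elif op == 'moveToCallFrame':
            avail[(cf, name)] = value        # a STORE makes the value available
        out.append((op, result, cf, name, value))
    return out, subst
```

A STORE of `v1` into `x` followed by a LOAD of `x` thus eliminates the load and replaces its result with `v1`.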
For blocks that do not return, we begin by taking the intersection of
values anticipable at their successors:

![\texttt{ANTIC\_OUT}[B] = \bigcap_{P\in \texttt{succ}[B]} \texttt{ANTIC\_IN}[P]](./anticout.svg)

................................................................................

predecessor), in dominator order. For each (_cf_, _name_) that is
anticipable at the entry to the block, we check availability in the
predecessor blocks. If the expression is available in at least one
predecessor, but not all predecessors, we insert moveFromCallFrame
instructions for it in the predecessors where it is unavailable. This
is another analysis that must be iterated to convergence. Inserting
these moveFromCallFrame operations produces new available variables,
which can in turn expose further opportunities for code motion.
$VAND04$ discusses this in more detail in chapter 4.

### Possible combination of passes

This analysis closely mirrors what happens in partial redundancy

................................................................................

    doSomething cfout cfin moreArgs

the input callframe will never again be used.

## TASK 4: Loop-invariant stores

We have seen in Task 3 that moveFromCallFrame instructions can be
placed speculatively to convert partial availability of a needed
expression to full availability (and remove a moveFromCallFrame later
in the program on a 'busier' execution path). In a similar manner,
moveToCallFrame instructions can be placed speculatively at the head
of blocks in order to make partially dead assignments ("faint"
assignments in the terminology of $KNOO94$) fully dead and eliminate a
moveToCallFrame on an earlier path of the program. $LO98$ demonstrates
how this partial liveness analysis is precisely dual to partial
availability analysis, and introduces a dual form to Static Single
Assignment (SSA), called Static Single Use (SSU).
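Returning briefly to the Task 3 insertion step: the "available in at least one predecessor, but not all" check might be sketched like this. All of the data structures and the `append_load` callback are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of one round of speculative moveFromCallFrame insertion.
# antic_in: set of (cf, name) pairs anticipable at entry to the block;
# avail_out[p]: set of (cf, name) pairs available on exit from predecessor p;
# append_load(p, cf, name): callback that appends the load to block p.
def insert_partial_loads(preds, antic_in, avail_out, append_load):
    """Insert loads where a pair is partially (not fully) available.
    Returns True if anything changed; the caller iterates to convergence."""
    changed = False
    for cf, name in antic_in:
        have = [p for p in preds if (cf, name) in avail_out[p]]
        if have and len(have) < len(preds):        # partially available
            for p in preds:
                if (cf, name) not in avail_out[p]:
                    append_load(p, cf, name)       # speculative load
                    avail_out[p].add((cf, name))   # now available there too
                    changed = True
    return changed
```

Because each insertion enlarges some predecessor's availability set, rerunning the pass can expose further motion opportunities, which is why it must be iterated to a fixed point.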
This latter form has similar advantages to SSA in that the dependency
chain, here from definition to use rather than from use to definition,
is factored so that there are explicit deviation points (dual to the
merge

................................................................................

reason, the more computationally complex analysis of $KNOO94$ will be
followed, but modified to operate on basic blocks rather than
individual instructions.

We begin with the liveness analysis of Task 2. We augment 'liveness'
with the concept of partial liveness. The data flow equations will
look like

![\texttt{LIVE\_OUT}[B] = \begin{cases} \displaystyle \bigcap_{S \in \texttt{succ}[B]} \texttt{LIVE\_IN}[S],& B \text{ is not an exit block} \\ \displaystyle \left\{ v \in \texttt{LOCALVARS}~|~v~\text{might alias a nonlocal variable} \right\},&\text{otherwise}\end{cases}](liveout.svg)

![\texttt{PLIVE\_OUT}[B] = \begin{cases} \displaystyle \bigcup_{S \in \texttt{succ}[B]} \texttt{PLIVE\_IN}[S],& B \text{ is not an exit block} \\ \displaystyle \left\{ v \in \texttt{LOCALVARS}~|~v~\text{might alias a nonlocal variable} \right\},&\text{otherwise}\end{cases}](pliveout.svg)

![\texttt{LIVE\_IN}[B] = \texttt{process}(B, \texttt{LIVE\_OUT}[B])](livein.svg)

![\texttt{PLIVE\_IN}[B] = \texttt{process}(B, \texttt{PLIVE\_OUT}[B])](plivein.svg)

In these equations, _process_ represents the basic-block-level upward
analysis from Task 2. The _LIVE_ and _PLIVE_ sets are sets of
(callframe, variable) pairs, and it is assumed that the arguments to
the intersection and union are translated if the callframe appears in
a φ operation.

Liveness corresponds to the 'availability' of the partial redundancy
elimination. Just as a moveFromCallFrame may not be hoisted over the
point where it is available, a moveToCallFrame may not be sunk past
the point where it is live. Similarly, moveToCallFrame creates values
and corresponds to anticipability.
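The four equations above can be solved with an ordinary backward dataflow iteration. The following sketch assumes `process(block, out_set)` is the block-level transfer function of Task 2 and that exit blocks seed both sets with the possibly-aliased local variables; the names and data structures are illustrative only.

```python
# Minimal backward dataflow sketch of the LIVE/PLIVE equations.
# LIVE_OUT intersects successors' LIVE_IN; PLIVE_OUT unions them.
def solve_liveness(blocks, succ, process, exit_alias_set):
    live_in = {b: set() for b in blocks}
    plive_in = {b: set() for b in blocks}
    changed = True
    while changed:                              # iterate to a fixed point
        changed = False
        for b in reversed(blocks):              # reverse order converges quickly
            if not succ[b]:                     # exit block
                live_out = set(exit_alias_set)
                plive_out = set(exit_alias_set)
            else:
                live_out = set.intersection(*(live_in[s] for s in succ[b]))
                plive_out = set.union(*(plive_in[s] for s in succ[b]))
            new_live = process(b, live_out)
            new_plive = process(b, plive_out)
            if new_live != live_in[b] or new_plive != plive_in[b]:
                live_in[b], plive_in[b] = new_live, new_plive
                changed = True
    return live_in, plive_in
```

Note that the sketch omits the φ-translation of callframes mentioned above; a real implementation would rename the (callframe, variable) pairs as they cross a φ operation.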
Just as it is of no benefit to insert a moveFromCallFrame for a value
that is not anticipated, it is of no benefit to insert a
moveToCallFrame for a value that is already present.

## References

$DREC93$ Karl-Heinz Drechsler and Manfred P. Stadel. "A variation of
Knoop, Rüthing, and Steffen's _Lazy Code Motion_". _ACM SIGPLAN