Check-in [2f51c8d73b]

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:More about code motion
Timelines: family | ancestors | descendants | both | kbk-refactor-callframe
Files: files | file ages | folders
SHA3-256: 2f51c8d73bce7855797f67aa11f7a5dce43a225d74457bcb89b6ebb115ce894e
User & Date: kbk 2019-02-25 04:41:47.424
Context
2019-03-26
20:45
Oops - didn't commit images for callframe.md! check-in: a19796b4e5 user: kbk tags: kbk-refactor-callframe
2019-02-25
04:41
More about code motion check-in: 2f51c8d73b user: kbk tags: kbk-refactor-callframe
2019-02-18
04:25
Write up first couple of tasks check-in: 643eefa7e6 user: kbk tags: kbk-refactor-callframe
Changes
Unified Diff Ignore Whitespace Patch
Changes to doc/20190216callframe/callframe.md.
236
237
238
239
240
241
242



243
244
245
246
247
248
249
    the instruction, and all other names that potentially alias the
	given name lose their values.
	
+ `invoke[Expanded]` - The available values in the source callframe
  become available in the destination callframe.  Then, any names that the
  invoked command might change and any names that might alias them,
  lose their available values.




+ `directSet` etc. - Any name that might be an alias of the changing
  variable loses its available value. (If the changing variable name
  cannot be determined, all names might be aliases, because in this
  case we cannot determine that the name is fully qualified.)
  
+ `variable`, `upvar`, `nsupvar` - Any existing value for the given







>
>
>







236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
    the instruction, and all other names that potentially alias the
	given name lose their values.
	
+ `invoke[Expanded]` - The available values in the source callframe
  become available in the destination callframe.  Then, any names that the
  invoked command might change and any names that might alias them,
  lose their available values.
  
+ `extractCallframe` - The available values in the source callframe
  become available in the destination callframe.

+ `directSet` etc. - Any name that might be an alias of the changing
  variable loses its available value. (If the changing variable name
  cannot be determined, all names might be aliases, because in this
  case we cannot determine that the name is fully qualified.)
  
+ `variable`, `upvar`, `nsupvar` - Any existing value for the given
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
block's new available set. The calculation of this set may in turn
change the available values on φ instructions at the start of loops,
so the calculation must be applied to convergence over all blocks in
the program.

Those who are versed in formal data flow analysis are welcome to
separate the descriptions of the operations given above into separate
sets `AVAIL_GEN` and `AVAIL_KILL` so that the problem can be given as
the forward analysis:

![\texttt{AVAIL\_OUT}[B] = \texttt{AVAIL\_GEN}[B] 
~ \cup ~ \left( 
\overline{\texttt{AVAIL\_KILL}[B]} 
~ \cap \bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P] 
\right) 







|







279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
block's new available set. The calculation of this set may in turn
change the available values on φ instructions at the start of loops,
so the calculation must be applied to convergence over all blocks in
the program.

Those who are versed in formal data flow analysis are welcome to
separate the descriptions of the operations given above into separate
sets `AVAIL\_GEN` and `AVAIL\_KILL` so that the problem can be given as
the forward analysis:

![\texttt{AVAIL\_OUT}[B] = \texttt{AVAIL\_GEN}[B] 
~ \cup ~ \left( 
\overline{\texttt{AVAIL\_KILL}[B]} 
~ \cap \bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P] 
\right) 
358
359
360
361
362
363
364



365
366
367
368
369
370
371
  input callframe. We start with the set required relative to the
  output callframe. All variables that may be read or written, and
  all possible aliases, are added to the set. (Since we do not in
  general examine whether a procedure must alter a given variable,
  but only whether it may alter it, we require in general that the
  previous value of the variable must already be in sync.)
  



+ `directGet` etc. - The variables required will be relative to the
  input callframe. We start with the set required relative to the
  output callframe. Any variable that might alias the variable being
  retrieved or set in the instruction is added to the set.
  
+ `variable`, `upvar`, `nsupvar` - The variables required will be
  relative to the input callframe. We start with the set required







>
>
>







361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
  input callframe. We start with the set required relative to the
  output callframe. All variables that may be read or written, and
  all possible aliases, are added to the set. (Since we do not in
  general examine whether a procedure must alter a given variable,
  but only whether it may alter it, we require in general that the
  previous value of the variable must already be in sync.)
  
+ `extractCallframe` - The variables required in the source callframe
  are precisely those anticipated in the destination callframe.

+ `directGet` etc. - The variables required will be relative to the
  input callframe. We start with the set required relative to the
  output callframe. Any variable that might alias the variable being
  retrieved or set in the instruction is added to the set.
  
+ `variable`, `upvar`, `nsupvar` - The variables required will be
  relative to the input callframe. We start with the set required
388
389
390
391
392
393
394






395

396

























































































































397
398
399
400
401
402






403
404
405

406






designates a variable that is not live in the given callframe may be
safely removed. Either it is a dead STORE (that is, neither it nor
any potential alias will ever be accessed again), or it participates
in a STORE-STORE pair. 

## TASK 3: Anticipability analysis and loop-invariant loads



































































































































## TASK 4: Loop-invariant stores



## References







\[SIMP96\]. 
Loren Taylor Simpson. 1996. Value-Driven Redundancy
Elimination. Ph.D. Dissertation. Rice University, Houston, TX,

USA. AAI9631092. [<https://www.clear.rice.edu/comp512/Lectures/Papers/SimpsonThesis.pdf>]













>
>
>
>
>
>

>

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>






>
>
>
>
>
>
|
|
|
>
|
>
>
>
>
>
>
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
designates a variable that is not live in the given callframe may be
safely removed. Either it is a dead STORE (that is, neither it nor
any potential alias will ever be accessed again), or it participates
in a STORE-STORE pair. 

## TASK 3: Anticipability analysis and loop-invariant loads

With what has been developed in Tasks 1 and 2, we can get rid of
locally redundant `moveToCallFrame` and `moveFromCallFrame`
instructions. What remains is to attack loop-invariant ones: that is,
`moveFromCallFrame` that can be safely moved above a loop or
`moveToCallFrame` that can be safely be moved below. We use a plan of
attack based loosely on \[DREC93\].

### Anticipability

We already have the concept of _availability,_ developed in
Task 1. Let us introduce some notation for available values.
`AVAIL_OUT[B]` will be notation for the set of ordered pairs
(_cf_. _name_) available in SSA registers on exit from block `B` of
the program. (In the pairs, _cf_ names an SSA variable designating a
callframe, and _name_ is the name of a variable in that frame.
Similarly, `AVAIL_IN[B]`will be the set of (_cf_, _name_) pairs
available on entry to block `B`, and we know that

![\texttt{AVAIL\_IN}[B] =  
\bigcap_{P\in \texttt{pred}[B]} \texttt{AVAIL\_OUT}[P] 
\right)](./availin.svg)

(This discussion glosses over the fact that the callframe reference
must be translated every time that it passes through a φ instruction.
The translation is straightforward - for the intersection above, the
available value will be the output of the phi; for the anticipability
analysis below, the anticipable value will be the input of the phi on
the code path being analyzed.)

Availability by itself is not sufficient for loop-invariant code
motion. In addition, we need the concept of _anticipability_. A (_cf_,
_name_) pair is _anticipable_ at a given point in the program if every
execution path forward from that point contains a `moveFromCallFrame`
retrieving that value prior to the value's being modified.

Calculating anticipability requires multiple traversals of the program
in postdominator order. Nothing is anticipable on 'return'. For blocks
that do not return, we begin by taking the intersection of values
anticipable at their successors:

![\texttt{ANTIC\_OUT}[B] =  
\bigcap_{P\in \texttt{succ}[B]} \texttt{ANTIC\_IN}[P] 
\right)](./anticout.svg)

(Note that in computing this function, we translate the callframe
reference if it appears in a φ operation at the start of the block.)

and then traverse the instructions of the block in reverse order.

+ `moveFromCallFrame` - The given name and callframe reference become
  anticipable.

+ `moveToCallFrame` - The given name and callframe reference, if they
  are anticipable, are removed from the anticipable set, along with
  any names that might potentially be aliases of the given name .This
  becomes the anticipable set for the input callframe. The output
  callframe is no longer relevant and may be forgotten.
  
+ `invoke[Expanded]` - Any variables potentially modified by the
  invoked command, and any potential aliases, are removed from the
  anticipable set, and the new set becomes the anticipable set for the
  input callframe. The output callframe is no longer relevant and may
  be forgotten.

+ `extractCallFrame` - The store operations anticipated in the source
  callframe are precisely those anticipated in the destination
  callframe, and the destination callframe is no longer relevant.
  
+ `directGet` - Since (as stated above) we ignore the possibility that
  a read trace on one variable may modify another, we simply copy all
  anticipable references.
  
+ `directSet`, etc. - Any local variable that might alias the
  variable being modified is no longer anticipable, and once again
  the callframes are swapped as with `moveToCallFrame` and `invoke`.
  
+ `variable`, `upvar`, `nsupvar` - The local variable and any aliases
  are no longer anticipable.
  
+ φ - Analysis stops at a φ instruction that references a callframe.
  When analyzing a predecessor block, `ANTIC_IN` for its successors
  will be translated to refer to the callframe that is input to the φ.

If a block has no φ instruction for the callframe, the same callframe
will be used when constructing its `ANTIC_IN` when analyzing a predecessor.

### Placement of load instructions

Once the `AVAIL_OUT` and `ANTIC_IN` sets are constructed, we can
determine where to place additional `moveFromCallFrame` instructions
that will perform loop-independent code motion. 

Since the entry block of a loop is always a merge point in the control
flow, its predecessor blocks will be single-exit blocks owing to
critical edge splitting. We need consider only insertion of
`moveFromCallFrame` instructions at these points.

The general plan is sketched in Section 4.3.2 of \[VanD04\]. We
examine merge points in the program (blocks with more than one
predecessor), in dominator order. For each (_cf_, _name_) that is 
anticipable at the entry to the block, we check availability in the
predecessor blocks. If the expression is available in at least one
predecessor, but not all predecessors, we insert `moveFromCallFrame`
instructions for it in the predecessors where it is unavailable. 

This is another analysis that must be iterated to
convergence. Inserting these 'moveFromCallFrame' operations produces
new available variables, which can in turn expose further
opportunities for code motion. \[VAND04\] discusses this in more
detail in chapter 4.

### Possible combination of passes

This analysis closely mirrors what happens in partial redundancy
elimination (`quadcode/pre.tcl`). It may be possible to combine the
two into a single pass. The principal change will be that availability
analysis will need some refactoring, to discipline the size of the
`AVAIL` sets. Unlike the description in \[SIMP96\], we do not need to
shrink any of these sets for safety, because every time we perform an
operation that might affect a value in the callframe, we assign a new
SSA value to the callframe. Nevertheless, the quadcode sequence is
structured so that only one callframe is in flight at any given
time. Even if procedure inlining gives rise to additional callframes,
we still have the weaker condition that if there is an instruction
like

`doSomething cfout cfin moreArgs`

the input callframe will never again be used.
  
## TASK 4: Loop-invariant stores



## References

\[DREC93\]
Karl-Heinz Drechsler and Manfred P. Stadel. "A variation of Knoop,
Rüthing, and Steffen's _Lazy Code Motion_". _ACM SIGPLAN Notices_ 28:5
(May, 1993), pp. 29-38.
[<https://dl.acm.org/citation.cfm?id=152823>]

\[SIMP96\]
Loren Taylor Simpson. 1996. "Value-Driven Redundancy
Elimination". Ph.D. Dissertation. Rice University, Houston, TX,
USA. AAI9631092. 
[<https://www.clear.rice.edu/comp512/Lectures/Papers/SimpsonThesis.pdf>]

\[VanD04\]
VanDrunen, Thomas J. "Partial redundancy elimination for global value
numbering." PhD thesis, Purdue University, West Lafayette, Indiana
(August, 2004)
[<ftp://ftp.cs.purdue.edu/pub/hosking/papers/vandrunen.pdf>]