Partial Evaluation and Automatic Program Generation

Jones, Gomard, Sestoft

Chapter 2: Functions, Types, and Expressions

Definitions:

A type can be considered a set of values.
A higher-order function is a function which takes a function as argument or returns a function as its result (when applied).
An expression is well-typed if all operators and functions in the expression are applied to arguments of a suitable type.
The sum of two or more sets is a set of tagged values. A tagged value is written as a tag applied to a value.
The collection of values defined by recursive type equations is called a data type.
A recursive data type used for representing program fragments is often called an abstract syntax.

Other Notes:

Sum Types
- A Sum Type is equivalent to a Disjoint Union. The difference between a Sum Type and Union Type is that the Sum Type implies that the type T cannot be "both" at the same time.

Chapter 3: Programming Languages and Interpreters

Definitions:

An occurrence of variable x in a lambda expression is bound if it is inside a subexpression of form λx....
An occurrence of variable x in a lambda expression is free if it is not bound.
A redex is an expression which can be reduced by beta-(function application), delta-(data value computation), or conditional reduction.
A top-level redex is a redex not inside a lambda abstraction.
A lambda expression is in weak head normal form (whnf) if it is a constant, or a function, or a free variable, or an application of a free variable.
A free variable that becomes bound during reduction is said to be captured.
Performing beta-reduction (function application) on a top-level application before evaluating it to whnf is known as call-by-name reduction , or normal order reduction.
Evaluating a top-level application to whnf before performing beta-reduction (function application) is known as call-by-value reduction , or applicative order reduction.
The completeness property states that if M can be reduced to a constant c, then the call-by-name reduction will reduce M to c.
The Church-Rosser theorem states that different reduction sequences will never produce different final unreducible answers (in the form of constant values).
An environment is a mapping of the free variables of an expression to their values.
A pair of an expression and an environment is called a closure.
Closures that contain unevaluated argument expressions are sometimes called thunks or suspensions.
A store or memory maps variables or their corresponding storage locations to the values currently stored in them.
The term bootstrapping refers to the use of compilers to compile themselves.

Other Notes:

Lambda calculus reduction rules:
- alpha-conversion:
  - λx.m => λy.[y/x]M
  - Renaming the bound variable x to y.
  - Restriction: y must not be free in M.
- beta-reduction:
  - (λx.M)(N) => [N/x]M
  - Function application. A function λx.M is applied to argument N by substituting N for every free occurrence of x in M.
  - Restriction: no free variable N may become bound as a result of the substitution [N/x]M. This problem can be avoided by first renaming bound variables in M.
- delta-reduction:
  - op a1...an => b, whenever b is the result of applying op to constants a1,...,an
  - Computation with data values. For example, (+ 5 6) can be reduced to 11.
  - Restriction: the arguments a1,...,an must be data constants
- reduction of conditional:
  - if true then M else N => M
  - if false then M else N => N
- reduction in context:
  - ...M.... => ...N... whenever M => N
  - A subexpression of a lambda expression may be reduced without changing the rest of the expression.
- repeated reduction:
  - M1 => M3 whenever M1 => M2 and M2 => M3
  - A computation consists of a series of the above reductions. For instance, a conditional (if B then M else N) may be reduced by first reducing B to an atomic value b; then if b is true, the conditional reduced to M, else if b is false, the conditional reduced to N.
- fix-point reduction:
  - Y M => M(Y M) where Y is the "Y constant"
- Y combinator
  - Yn = λh.(λx.h(x x))(λx.h(x x))

Chapter 4: Partial Evaluation for a Flow Chart Language

Definitions:

A residual program is the result of executing a program specializer on a program and part of the program's input data, and will perform the rest of the computation when the rest of the input data is supplied.
The general possibility of partially evaluating a program is known as Kleene's s-m-n theorem in recursive function theory.
The first Futamura projection states that specializing an interpreter with respect to a source program has the effect of compiling the source program.
A variable is considered static if its value can be determined at program specialization time. Otherwise, the variable is considered to be dynamic.
The classification of variables as either static or dynamic is called a division.
A division is uniform if it is valid at all program points.
A division is congruent if the values of the static variables at a certain program point can be computed from the previous program point. The result is that any variable that depends on a dynamic variable must itself be dynamic.
The replacement of a jump statement with the destination's instructions is known as transition compression.
The process of classifying a variable as dynamic, when congruence would have allowed it to be static, is called generalization.
The process of computing the division of all program variables given a division of the input variables is called binding-time analysis ( BTA ).
A strategy is said to be online if the concrete values computed during program specialization can affect the choice of action taken. Otherwise the strategy is offline.
The second Futamura projection states that when specializing the specializer with respect to an interpreter, the residual program will be a compiler.
A variable can be said to be of bounded static variation if it can only assume one of a finite set of values and that its possible value set is statically computable.
The third Futamura projection states that specializing the specializer with respect to the specializer will result in a compiler generator.
Binding-time improvement can sometimes be attained by transforming the source program in ways that do not alter the standard meaning of the program but lead to better residual programs.
A division is pointwise if it specifies a division at each program point.
Live variable analysis is the process of determining which static variables can affect future computation or control flow.
A division is polyvariant if each program point is assigned a set of possible divisions. Otherwise, the division is said to be monovariant.

Other Notes:

"The Trick"
- If a dynamic variable is of bounded static variation, it can be treated as static.

Chapter 5: Partial Evaluation for a First-Order Functional Language

Definitions:

If a task/optimization is performed as a pass after specialization, it is described as postphase. Otherwise, if it is performed during specialization, it is known as on the fly.
An inequality on the binding times of variables is known as a constraint.
Binding-time analysis is considered safe if the division it computes is congruent.
Binding-time analysis is considered approximate if it classifies a parameter as dynamic even when it cannot be bound to a residual expression.
A two-level syntax represents annotations by having a static and dynamic version of each construct of the language.

Other Notes:

How do we know that the division determined by the BTA abstract interpretation is the best one?
- An iterative dataflow analysis over a distributive lattice will always find the best solution.
  - Source: http://www.cs.cmu.edu/afs/cs/academic/class/15745-s06/web/handouts/05.pdf
  - Indirectly from: https://dl.acm.org/doi/10.1145/512927.512945

Chapter 7: Online, Offline, and Self-application

Definitions:

Continuation passing style (CPS) is a style of programming in which functions do not return values; rather, they pass control onto a continuation, which specifies what happens next.
The component of an offline partial evaluator that performs the reduction of the annotated program is often called the specialization kernel.
Holst's poor man's generalization , a refinement of liveness analysis, states that a variable should be generalized if no control decisions depend on its value.
Mixline partial evaluation is a compromise between online and offline partial evaluation where the usual binding-time domain {S,D} is augmented with a third binding time M. Now S should be interpreted as "always static", D as "always dynamic", and M as "mixed binding times", to be tested online.

Other Notes:

Comparing online and offline partial evaluation
- Offline partial evaluations treat their static data in a uniform way, based on preprocessing the subject program. The general pattern is that given a program and knowledge of which of its inputs are to be static but not their values, an annotated program is constructed (or a division is computed). Once static data values are available, specialization proceeded by obeying the annotations.
- In contrast, online partial evaluators typically have no preprocessing, make more decisions on the fly, and so are better able to exploit computational opportunities.
Partial evaluator decision making
- A partial evaluator typically has to choose an action in the following situations:
  - For each operator (+, if, ...), should/can the operator be reduced at PE time, or should residual code be generated?
  - For each variable/parameter at each program point, should/can it be considered static (and so reduced) or should it be dynamic (should residual code be generated for it)?
  - For each function call/jump, should it be unfolded/compressed (reduced) or should it be residualized?
Advantages of online methods
- The main advantages of online over offline methods stem from their non-approximative distinction between static and dynamic values.
- Conditionals and mixed binding times can be successfully propagate static-ness of one branch if the conditional's value is known to lead to that branch.
- Static knowledge about "dynamic" conditionals/calls can be used by an online partial evaluator when all branches are static and have common characteristics.
Advantages of offline methods
- Problem factorization : the splitting of partial evaluation into two separate phases (BTA and the specialization kernel) has proven helpful in both the design and implementation phases of several partial evaluation projects.
- Efficiency : the offline approach BTA + specialization kernel is much more efficient than online.
- Fast multistage specialization : specializing the specializer can speed up the partial evaluation process.
- Binding-time annotations as a user interface provides the ability for a user to diagnose and debug the binding-time properties of a program undergoing partial evaluation.
- Program analysis : binding-time analysis is often combined with other program analyses to satisfy other requirements than just congruence. Typical goals are to achieve finiteness, to avoid computation duplication, and to avoid specialization with respect to dead static variables.
"Recent" Research
- (1999) Online-and-offline partial evaluation: a mixed approach
  - https://dl.acm.org/doi/10.1145/328690.328694
- (2001) A Hybrid Approach to Online and Offline Partial Evaluation
  - http://www.kb.ecei.tohoku.ac.jp/~sumii/pub/onpe-hosc.pdf
- (2004) Offline Partial Evaluation Can Be as Accurate as Online Partial Evaluation
  - https://dl.acm.org/doi/10.1145/963778.963784

Chapter 8: Partial Evaluation for the Lambda Calculus

Definitions:

An annotation-forgetting function , when applied to a two-level expression, returns a one-level expression which differs from the original expression only in that all annotations and lift operators have been removed.
A binding-time assumption for an expression is type environment which maps each free variable from that expression to either S or D.
Given a binding-time assumption, a completion of an expression is a two-level expression such that applying the annotation-forgetting function to it results in the original expression.
A completion is minimal if it has the least amount of dynamic annotations possible. An expression may have multiple minimal completions that differ in the choice of lift-points.

Other Notes:

Algorithm W : a type inference method for the Hindley-Milner (HM) type system.

Chapter 9: Partial Evaluation for Prolog

Definitions:

Dependency analysis is used to trace which variables will depend on others by unification.
Groundness analysis is used to determine in which cases the unification of a static and a dynamic variable should cause the static variable to be reclassified as dynamic, i.e. when the static variable can be non-ground.
Determinacy analysis is used to find out whether a static goal has at most one solution, or possibly more than one.
Side-effect analysis is used to classify goals containing side-effecting primitives as dynamic.

Chapter 10: Aspects of Similix: A Partial Evaluator for a Subset of Scheme

Definitions:

Two functions f and g are extensionally equal if and only if, for all x : f(x) = g(x).
Occurrence counting analysis finds an upper limit on the number of times a let-bound variable may occur in the residual expression.
Context propagation is the technique of evaluating static subexpressions of dynamic expressions with context from outside the dynamic expression.
Expression reduction with context propagation is called continuation-based reduction.
A continuation is a function which consumes the result of the function and produces the final result of the computation.
A continuation is well-behaved if ... (p. 220)
With arity-raising , one dynamic (or partially static) function parameter in the subject program may give rise to several function parameters in the residual program.

Chapter 11: Partial Evaluation for the C Language

Definitions:

Function specialization is the generation of a function, specialized with respect to certain static data.
Referential transparency is a property of program languages which means that two calls to a function with the same arguments always yield the same result.

Chapter 12: Binding-Time Improvements

Definitions:

A program transformation that preserves semantics by makes the program more suited for partial evaluation is called a binding-time improvement.
The occurrence of something otherwise static in a dynamic expression is known as r esidual code context or dynamic code context.
Eta conversion is a way to protect the binding-time of an expression from becoming dynamic by encapsulating it in a lambda which is classified as dynamic.