Lecture 10: Abstract Machines

Reminder:

cf. pictures

Notations:

  • $\to$: rewriting for code (λ-calculus)
  • $\rightsquigarrow$: rewriting for abstract machines
  • SEA: search
  • SUB: substitute
  • $u^\alpha$: $u$ well-formed and correctly α-renamed
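| Code | Stack | | Code | Stack |
|---|---|---|---|---|
| $tu$ | $\pi$ | $\rightsquigarrow_{SEA}$ | $t$ | $u :: \pi$ |
| $\lambda x.t$ | $u :: \pi$ | $\rightsquigarrow_\beta$ | $t\{x \leftarrow u\}$ | $\pi$ |

Decoding: $\underline{(t, \epsilon)} := t$ and $\underline{(t, u :: \pi)} := \underline{(tu, \pi)}$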
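  1. $s \rightsquigarrow_\beta s' \implies \underline{s} \to_\beta \underline{s'}$
  2. $s \rightsquigarrow_{SEA} s' \implies \underline{s} = \underline{s'}$
  3. $\rightsquigarrow_{SEA}$ terminates
  4. $\rightsquigarrow_\beta$ and $\rightsquigarrow_{SEA}$ are deterministic
  5. if $s$ is a final state, then $\underline{s}$ is normal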
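
The two transitions and the decoding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of terms and the names `var`, `lam`, `app`, `subst`, `decode`, `step` and `run` are hypothetical, and substitution is naive, which is enough when the initial term is closed (the arguments pushed on the stack are then closed too, so no capture can occur).

```python
# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", fun, arg)
def var(x):
    return ("var", x)

def lam(x, body):
    return ("lam", x, body)

def app(t, *args):
    for u in args:
        t = ("app", t, u)
    return t

def subst(t, x, u):
    """t{x<-u}. Naive (no renaming): assumes u is closed."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def decode(code, stack):
    """Read a state back as a term: (t, eps) = t and (t, u::pi) = (tu, pi)."""
    for u in stack:  # the head of the stack is at index 0
        code = ("app", code, u)
    return code

def step(code, stack):
    """One transition, or None on a final state."""
    if code[0] == "app":                      # SEA: push the argument
        return "SEA", code[1], [code[2]] + stack
    if code[0] == "lam" and stack:            # beta: pop and substitute
        return "beta", subst(code[2], code[1], stack[0]), stack[1:]
    return None

def run(t):
    code, stack, trace = t, [], []
    while True:
        nxt = step(code, stack)
        if nxt is None:
            return decode(code, stack), trace
        label, new_code, new_stack = nxt
        if label == "SEA":                    # property 2: SEA keeps the decoding
            assert decode(new_code, new_stack) == decode(code, stack)
        trace.append(label)
        code, stack = new_code, new_stack

I = lam("a", var("a"))
K = lam("x", lam("y", var("x")))
result, trace = run(app(K, I, I))
print(trace, result == I)
```

On `K I I` the machine first searches twice (pushing both arguments), then fires two β transitions and stops on an abstraction with an empty stack.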

Micro Abstract Machine

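| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $(\lambda x.t)\,u\,r_1 \ldots r_n$ | $E$ | $\rightsquigarrow_\beta$ | $t\,r_1 \ldots r_n$ | $[x \leftarrow u] :: E$ |
| $x\,r_1 \ldots r_n$ | $E_1 :: [x \leftarrow u] :: E_2$ | $\rightsquigarrow_{SUB}$ | $u^\alpha\,r_1 \ldots r_n$ | $E_1 :: [x \leftarrow u] :: E_2$ |

Decoding: $\underline{(t, \epsilon)} := t$ and $\underline{(t, [x \leftarrow u] :: E)} := \underline{(t\{x \leftarrow u\}, E)}$. Equivalently, writing $t{\downarrow}_E$ for the decoding of $(t, E)$: $t{\downarrow}_\epsilon = t$ and $t{\downarrow}_{[x \leftarrow u] :: E} = t\{x \leftarrow u\}{\downarrow}_E$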

Ex: You may need two β in a row:

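$$(\lambda y.(\lambda x.x)y)I,\ \epsilon \;\rightsquigarrow_\beta\; (\lambda x.x)y,\ [y \leftarrow I] \;\rightsquigarrow_\beta\; x,\ [x \leftarrow y][y \leftarrow I]$$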

Number of β: bounded by the “size” of the environment, provided it is not “weird”. We must ensure that we have:

Lemma: Let s=(t,E) be a Micro-AM reachable state.

  1. Abs: if λx.u is a subterm of t or E, then x occurs only in u

  2. Env scope: if $E = E' :: [x \leftarrow u] :: E''$, then $x$ is fresh wrt $u$ and $E''$

So

$$s \rightsquigarrow_\beta^k s' \implies k \le |E|$$

Ex: Pay attention to renaming. Example of a mistake (we don’t rename in the second step):

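$$
\begin{aligned}
(\lambda z.zzI\delta)\,\lambda xy.xy
&\rightsquigarrow_\beta zzI\delta,\ [z \leftarrow \lambda xy.xy] \\
&\rightsquigarrow_{SUB} (\lambda xy.xy)zI\delta,\ [z \leftarrow \lambda xy.xy] \\
&\rightsquigarrow_\beta (\lambda y.xy)I\delta,\ [x \leftarrow z][z \leftarrow \lambda xy.xy] \\
&\rightsquigarrow_\beta xy\delta,\ [y \leftarrow I][x \leftarrow z][z \leftarrow \lambda xy.xy] \\
&\rightsquigarrow_{SUB}^2 (\lambda xy.xy)y\delta,\ [y \leftarrow I][x \leftarrow z][z \leftarrow \lambda xy.xy] \\
&\rightsquigarrow_\beta (\lambda y.xy)\delta,\ [x \leftarrow y][y \leftarrow I][x \leftarrow z][z \leftarrow \lambda xy.xy] \quad \text{(property already violated)} \\
&\rightsquigarrow_\beta xy,\ [y \leftarrow \delta][x \leftarrow y][y \leftarrow I][x \leftarrow z][z \leftarrow \lambda xy.xy] \\
&\rightsquigarrow_{SUB} yy,\ [y \leftarrow \delta][x \leftarrow y][y \leftarrow I][x \leftarrow z][z \leftarrow \lambda xy.xy]
\end{aligned}
$$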

So it reduces to $\delta\delta$, which diverges, whereas it should in fact reduce to $\delta$, which converges! (In $(\lambda y.(\lambda xy.xy)y)I\delta$, the two $y$'s are distinct binders: they are not both bound to $\delta$.)

Milner Abstract Machines (MAM)

Simplified version of Krivine AM.

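| Code | Stack | Environment | | Code | Stack | Environment |
|---|---|---|---|---|---|---|
| $tu$ | $\pi$ | $E$ | $\rightsquigarrow_{SEA}$ | $t$ | $u :: \pi$ | $E$ |
| $\lambda x.t$ | $u :: \pi$ | $E$ | $\rightsquigarrow_\beta$ | $t$ | $\pi$ | $[x \leftarrow u] :: E$ |
| $x$ | $\pi$ | $E_1 :: [x \leftarrow u] :: E_2$ | $\rightsquigarrow_{SUB}$ | $u^\alpha$ | $\pi$ | $E_1 :: [x \leftarrow u] :: E_2$ |

Decoding: $\underline{(t, u :: \pi, E)} := \underline{(tu, \pi, E)}$, $\underline{(t, \epsilon, [x \leftarrow u] :: E)} := \underline{(t\{x \leftarrow u\}, \epsilon, E)}$ and $\underline{(t, \epsilon, \epsilon)} := t$. Stacks decode to contexts: $\underline{\epsilon} := \langle\cdot\rangle$ and $\underline{u :: \pi} := \underline{\pi}\langle\langle\cdot\rangle u\rangle$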
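
As an illustration of the three transitions, here is a minimal Python sketch of the MAM, run on the term of the renaming example above. It is not a reference implementation: the tuple encoding, the names `rename`, `decode`, `run`, the `alpha` flag and the `max_steps` budget are all assumptions made for the example. With `alpha=False` the machine never renames (slightly stronger than the mistake shown above, where only one step forgets to rename), and the decoding uses naive substitution, which is only sound when the invariants of the lemma hold.

```python
import itertools

# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", fun, arg)
def var(x):
    return ("var", x)

def lam(x, body):
    return ("lam", x, body)

def app(t, *args):
    for u in args:
        t = ("app", t, u)
    return t

_fresh = itertools.count()

def rename(t, ren=None):
    """u^alpha: copy t, giving every bound variable a fresh name."""
    ren = ren or {}
    if t[0] == "var":
        return var(ren.get(t[1], t[1]))
    if t[0] == "lam":
        x2 = f"{t[1]}_{next(_fresh)}"
        return lam(x2, rename(t[2], {**ren, t[1]: x2}))
    return ("app", rename(t[1], ren), rename(t[2], ren))

def subst(t, x, u):
    """Naive t{x<-u}, used only for decoding well-named states."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def decode(code, stack, env):
    for u in stack:                 # first unfold the stack
        code = ("app", code, u)
    for x, u in env:                # then the environment, head first
        code = subst(code, x, u)
    return code

def run(t, alpha=True, max_steps=1000):
    """Decoded final state, or None if the step budget is exhausted."""
    code, stack, env = t, [], []
    for _ in range(max_steps):
        if code[0] == "app":                              # SEA
            code, stack = code[1], [code[2]] + stack
        elif code[0] == "lam" and stack:                  # beta
            env, code, stack = [(code[1], stack[0])] + env, code[2], stack[1:]
        elif code[0] == "var":                            # SUB
            u = next((u for y, u in env if y == code[1]), None)
            if u is None:
                break                                     # free variable: final
            code = rename(u) if alpha else u
        else:
            break                                         # abstraction, empty stack
    else:
        return None
    return decode(code, stack, env)

I = lam("a", var("a"))
delta = lam("d", app(var("d"), var("d")))
F = lam("x", lam("y", app(var("x"), var("y"))))
t = app(lam("z", app(var("z"), var("z"), I, delta)), F)

print(run(t))               # a renamed copy of delta
print(run(t, alpha=False))  # None: without renaming the run does not stop
```

With renaming the run stops on a copy of $\delta$; without it, the environment ends up with clashing entries and the machine loops until the budget runs out.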

TODO: write

Complexity analysis

$$\lambda_{CBN} \xrightarrow{\;POL\;} MAM \xrightarrow{\;POL\;} RAM$$

Let $d : t_0 \to_{\beta_{cbn}}^k s$ be a derivation

  • Input: the size $|t_0|$ of the initial term
  • Length: $|d| = k$
  1. Number of machine transitions
  2. Cost of a single transition
  3. Combine the two
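| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $(\lambda x.t)\,u\,r_1 \ldots r_k$ | $E$ | $\rightsquigarrow_\beta$ | $t\,r_1 \ldots r_k$ | $[x \leftarrow u] :: E$ |
| $x\,r_1 \ldots r_k$ | $E_1 :: [x \leftarrow u] :: E_2$ | $\rightsquigarrow_{SUB}$ | $u^\alpha\,r_1 \ldots r_k$ | $E_1 :: [x \leftarrow u] :: E_2$ |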

Let $\rho : s \rightsquigarrow^* s'$

How do $|\rho|_{SUB}$ and $|\rho|_\beta$ compare?

The size of the environment is the number of β transitions. But if

$$s_0 \rightsquigarrow_\beta^n \rightsquigarrow_{SUB}^k s_1$$

then $k \le n\, +$ the size of the environment at $s_0$.

Therefore, if

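$$(t_0, \epsilon) \rightsquigarrow_\beta^{a_1} \rightsquigarrow_{SUB}^{b_1} \cdots \rightsquigarrow_\beta^{a_k} \rightsquigarrow_{SUB}^{b_k} s$$

then

$$|\rho|_{SUB} = \sum_{i=1}^k b_i \le \sum_{i=1}^k \sum_{j=1}^i a_j \le \sum_{i=1}^k |\rho|_\beta = k\,|\rho|_\beta \le |\rho|_\beta^2$$

where

$$k \le |\rho|_\beta \qquad b_i \le \sum_{j=1}^i a_j$$

So $|\rho|_{SUB} = O(|\rho|_\beta^2)$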

Is this bound reached? Yes:

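$$
\begin{aligned}
(\lambda x_0.x_0x_0)\delta
&\rightsquigarrow_\beta x_0x_0,\ [x_0 \leftarrow \delta] \\
&\rightsquigarrow_{SUB} (\lambda x_1.x_1x_1)x_0,\ [x_0 \leftarrow \delta] \\
&\rightsquigarrow_\beta x_1x_1,\ [x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_{SUB} x_0x_1,\ [x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_{SUB} (\lambda x_2.x_2x_2)x_1,\ [x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_\beta x_2x_2,\ [x_2 \leftarrow x_1][x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_{SUB} x_1x_2,\ [x_2 \leftarrow x_1][x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_{SUB} x_0x_2,\ [x_2 \leftarrow x_1][x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_{SUB} (\lambda x_3.x_3x_3)x_2,\ [x_2 \leftarrow x_1][x_1 \leftarrow x_0][x_0 \leftarrow \delta] \\
&\rightsquigarrow_\beta x_3x_3,\ [x_3 \leftarrow x_2][x_2 \leftarrow x_1][x_1 \leftarrow x_0][x_0 \leftarrow \delta]
\end{aligned}
$$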
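
The quadratic growth in this run can be checked mechanically. The sketch below is a hypothetical instrumented machine (a MAM with a stack; the search transitions do not affect the β and SUB counts) that runs $\delta\delta$ and stops just before the β transition number $n+1$. The names `count_transitions` and `beta_budget` are made up for the example, and the environment is a dictionary, which assumes that renaming keeps all bound names distinct.

```python
import itertools
from collections import Counter

# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", fun, arg)
def var(x):
    return ("var", x)

def lam(x, body):
    return ("lam", x, body)

def app(t, *args):
    for u in args:
        t = ("app", t, u)
    return t

_fresh = itertools.count()

def rename(t, ren=None):
    """u^alpha: copy t, giving every bound variable a fresh name."""
    ren = ren or {}
    if t[0] == "var":
        return var(ren.get(t[1], t[1]))
    if t[0] == "lam":
        x2 = f"{t[1]}_{next(_fresh)}"
        return lam(x2, rename(t[2], {**ren, t[1]: x2}))
    return ("app", rename(t[1], ren), rename(t[2], ren))

def count_transitions(t, beta_budget):
    """Run from (t, eps, eps); stop before beta number beta_budget + 1."""
    code, stack, env = t, [], {}
    counts = Counter()
    while True:
        if code[0] == "app":                              # SEA
            counts["SEA"] += 1
            code, stack = code[1], [code[2]] + stack
        elif code[0] == "lam" and stack:                  # beta
            if counts["beta"] == beta_budget:
                return counts
            counts["beta"] += 1
            env[code[1]] = stack[0]
            code, stack = code[2], stack[1:]
        elif code[0] == "var" and code[1] in env:         # SUB
            counts["SUB"] += 1
            code = rename(env[code[1]])
        else:                                             # final state
            return counts

delta0 = lam("x0", app(var("x0"), var("x0")))
delta = lam("d", app(var("d"), var("d")))

for n in range(1, 6):
    c = count_transitions(app(delta0, delta), n)
    print(n, c["beta"], c["SUB"])
```

After the $i$-th β transition the head variable has to be followed through $i$ environment entries, so the SUB count after $n$ β transitions is $1 + 2 + \cdots + n$, which matches the $O(|\rho|_\beta^2)$ bound.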

Subterm Invariant

The “equivalent” of the Hauptsatz in sequent calculus, or the subformula property.

Lemma (Subterm Invariant): Let $\rho : (t_0, \epsilon, \epsilon) \rightsquigarrow^* (u, \pi, E)$ be an execution. Then $u$ and any code in $E$ and $\pi$ are subterms of $t_0$ (up to α).

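Proof: By induction on the length of $\rho$. The only subtle case is SUB, the only transition where the machine duplicates code: the duplicated term $u$ comes from the environment, hence it is a subterm of $t_0$ (up to α) by the induction hypothesis.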

This gives us a bound on the size of duplicated terms.

Recall

$$t_n \to_\beta^n r_n \quad \text{where } |r_n| = \Omega(2^n)$$

This lemma tells us: whenever

$$s \rightsquigarrow^n s'$$

from the initial state $s$ of $t_0$, since each step can only duplicate subterms of $t_0$, the size of $s'$ is bounded by

$$|s'| \le (n+1)\,|t_0|$$

⟹ there’s no size explosion wrt the number of steps.

Warning: it could happen that the number of steps is itself exponential ⟹ we have to make sure that the number of transitions is reasonable (from λ to M, before even going from M to RAM)

$$(u, E) \rightsquigarrow_{SEA,\beta}^k s' \implies k \le |u| \le |t_0|$$

by the subterm invariant.

But SUB increases the size of the code: it replaces a variable (size 1) by a term (size $\ge 1$). By the subterm invariant, this term is a subterm of $t_0$, so:

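$$|\rho|_{SEA} \le |t_0| + |\rho|_{SUB} \cdot |t_0| = |t_0|\,(|\rho|_{SUB} + 1) \le |t_0|\,(|\rho|_\beta^2 + 1)$$

which is $O((|t_0|+1)\,|\rho|_\beta^2)$ as soon as $|\rho|_\beta \ge 1$.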

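Recall that abstract machines take care of SEA(rch), SUB(stitution), and NAMES. But the number of SEA transitions is linear in the number of SUB transitions, up to a factor $|t_0|$ (hence quadratic in the number of β transitions) ⟹ we can afford not to take it into account, as it does not impact the complexity much. NAMES impacts the complexity even less.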

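| | Number of transitions |
|---|---|
| SEA | $(\lvert t_0 \rvert + 1)\,\lvert\rho\rvert_\beta^2$ |
| β | $\lvert\rho\rvert_\beta$ |
| SUB | $\lvert\rho\rvert_\beta^2$ |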

With pointers, the SEA and the β transitions take constant time.

As for SUB: if environments are implemented as lists, access is not constant time. But if variables are pointers, the term substituted for $x$ can be fetched from the environment in constant time ⟹ the cost of a SUB transition is bounded by $|t_0|$ (the cost of copying and renaming $u$, which is a subterm of $t_0$).

Therefore:

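| | Number of transitions | Cost of single transition | Global cost |
|---|---|---|---|
| SEA | $(\lvert t_0 \rvert + 1)\,\lvert\rho\rvert_\beta^2$ | $O(1)$ | $O((\lvert t_0 \rvert + 1)\,\lvert\rho\rvert_\beta^2)$ |
| β | $\lvert\rho\rvert_\beta$ | $O(1)$ | $O(\lvert\rho\rvert_\beta)$ |
| SUB | $\lvert\rho\rvert_\beta^2$ | $O(\lvert t_0 \rvert)$ | $O((\lvert t_0 \rvert + 1)\,\lvert\rho\rvert_\beta^2)$ |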
[Figure: cycle of simulations: λ → MAM with overhead $|t_0|\,|\rho|_\beta^2$, MAM → RAM, RAM → TM, and TM → λ with linear overhead.]

Call-by-Value evaluation

We saw that there’s an efficient abstract machine to implement call-by-name evaluation. But reasonable cost models are not about finding such efficient machines.

$$TM \xrightarrow{\;linear\;} \lambda_{det}$$

But λ-calculus is bigger, and there are terms where the evaluation strategy matters.

Ex: Duplicator:

  • CBV: $\delta(Ix) \to \delta x \to xx$
  • CBN: $\delta(Ix) \to (Ix)(Ix) \to x(Ix) \to xx$

⟹ CBN seems to be silly (we duplicate work), but we just showed that it is reasonable.

Ex: Erasor:

  • CBV: $(\lambda z.y)(\delta\delta) \to (\lambda z.y)(\delta\delta) \to \cdots$
  • CBN: $(\lambda z.y)(\delta\delta) \to y$

⟹ CBV seems to be silly, but we can show that it is reasonable too.
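
The two examples can be replayed with a pair of small-step evaluators. This is a sketch under assumptions: the names `step_name`, `step_cbv` and `steps` are hypothetical, the by-name evaluator is leftmost-outermost without reducing under λ (it also reduces arguments of a stuck head, so that it reaches $xx$ as in the duplicator example), variables count as values, and substitution is naive, which is safe on these particular terms because no capture can occur.

```python
# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", fun, arg)
def var(x):
    return ("var", x)

def lam(x, body):
    return ("lam", x, body)

def app(t, u):
    return ("app", t, u)

def subst(t, x, u):
    """Naive t{x<-u}; no capture can happen on the examples below."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step_name(t):
    """By-name step: leftmost-outermost, never under a lambda."""
    if t[0] != "app":
        return None
    f, a = t[1], t[2]
    if f[0] == "lam":
        return subst(f[2], f[1], a)
    f2 = step_name(f)
    if f2 is not None:
        return ("app", f2, a)
    a2 = step_name(a)
    return None if a2 is None else ("app", f, a2)

def step_cbv(t):
    """By-value step: the argument must be a value before firing."""
    if t[0] != "app":
        return None
    f, a = t[1], t[2]
    f2 = step_cbv(f)
    if f2 is not None:
        return ("app", f2, a)
    a2 = step_cbv(a)
    if a2 is not None:
        return ("app", f, a2)
    if f[0] == "lam" and a[0] in ("var", "lam"):
        return subst(f[2], f[1], a)
    return None

def steps(t, step, limit=100):
    """(number of steps, normal form), or (None, t) if the limit is hit."""
    n = 0
    while n < limit:
        t2 = step(t)
        if t2 is None:
            return n, t
        t, n = t2, n + 1
    return None, t

I = lam("a", var("a"))
delta = lam("d", app(var("d"), var("d")))
duplicator = app(delta, app(I, var("x")))
eraser = app(lam("z", var("y")), app(delta, delta))

print(steps(duplicator, step_cbv)[0], steps(duplicator, step_name)[0])
print(steps(eraser, step_name)[0], steps(eraser, step_cbv)[0])
```

Neither strategy wins on both terms, which is the point: being reasonable is about the overhead of implementing a strategy, not about the strategy being the fastest.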

Being reasonable has nothing to do with finding an efficient strategy. It means that the overhead is not too complex (polynomial) ⟹ relative efficiency.

But still, are there unreasonable strategies? Yes (there is one example in the literature: Jean-Jacques Lévy's).

Is there an optimal strategy (one that takes the least number of steps) that we can compute? No: the optimal strategy is not recursive.

But even though it is not recursive, there is a notion of parallel optimal strategy, which is recursive (shown by Lévy). Lévy did not know how to implement it; this was done a few years later by someone else. A question then arose: can we take $k$ (the minimal number of steps of this optimal parallel strategy) as the complexity of $t$? It was proven that we cannot: it is not reasonable (this is the example of an unreasonable strategy).

That does not make it useless, though: not being reasonable does not mean not being efficient (the hidden but wrong assumption is that steps count as 1, i.e. that they are reasonable). To this day, we still do not know whether it is efficient or not.

Comparing the number of steps of strategies that are not reasonable doesn’t make sense.



Weak Call-by-Value (CBV) λ-calculus

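Values: $v ::= x \mid \lambda x.t$

$$(\lambda x.t)\,v \to_{w\beta_v} t\{x \leftarrow v\} \qquad \frac{t \to_{w\beta_v} t'}{t\,u \to_{w\beta_v} t'\,u} \qquad \frac{t \to_{w\beta_v} t'}{u\,t \to_{w\beta_v} u\,t'}$$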

NB: we consider only the weak version (we don’t reduce under λ’s) because generally it’s used with a programming application in mind

Harmony property: if t is closed, then

  • either $t \to_{w\beta_v} t'$ for some $t'$
  • or t is a value (an abstraction)

Proof: by induction on t.

So if t is closed:

  • either $t$ evaluates to a value
  • or it diverges

NB: if we remove variables from values, nothing changes.

In theoretical papers, values are defined as

$$v_t ::= x \mid \lambda x.t$$

In papers about abstract machines (practical), values are defined as

$$v_p ::= \lambda x.t$$

⟹ Why does nothing change? Because a variable can never be an argument: in a closed term, the only way to have a subterm of the form $tx$ is to bind the variable, as in $\lambda x.tx$, and then we are stuck, since we do not reduce under abstractions.

This is better than confluent, it has the diamond property:

$$(I\delta)(\delta I) \to \delta(\delta I) \to \delta(II) \qquad\qquad (I\delta)(\delta I) \to (I\delta)(II) \to \delta(II)$$

⟹ we can always close the diagram in one step.

It fails when we are allowed to duplicate redexes, as in $\delta(II)$. But in CBV, we cannot: the only terms we can duplicate are values, which are normal.

On top of that, all reduction sequences have the same length.
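
Both claims can be checked by brute force on the example above. The sketch below enumerates every weak CBV reduct of a term (any redex not under a λ, with variables and abstractions as values) and collects the lengths of all maximal reduction sequences. The names `reducts` and `lengths` are hypothetical, and substitution is naive, which is fine here because every term involved is closed.

```python
# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", fun, arg)
def var(x):
    return ("var", x)

def lam(x, body):
    return ("lam", x, body)

def app(t, u):
    return ("app", t, u)

def subst(t, x, u):
    """Naive t{x<-u}; safe here because all substituted values are closed."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def reducts(t):
    """All terms reachable from t in exactly one weak CBV step."""
    if t[0] != "app":
        return []
    f, a = t[1], t[2]
    out = [("app", f2, a) for f2 in reducts(f)]
    out += [("app", f, a2) for a2 in reducts(a)]
    if f[0] == "lam" and a[0] in ("var", "lam"):   # the argument is a value
        out.append(subst(f[2], f[1], a))
    return out

def lengths(t):
    """Set of lengths of all maximal reduction sequences starting at t."""
    nxt = reducts(t)
    if not nxt:
        return {0}
    return {1 + n for u in nxt for n in lengths(u)}

I = lam("a", var("a"))
delta = lam("d", app(var("d"), var("d")))
t = app(app(I, delta), app(delta, I))

left, right = reducts(t)
common = set(reducts(left)) & set(reducts(right))
print(app(delta, app(I, I)) in common)   # the diagram closes in one step
print(lengths(t))                        # a single length for all sequences
```

The term has two one-step reducts, they share the reduct $\delta(II)$, and every maximal sequence from $(I\delta)(\delta I)$ has the same number of steps.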

Right-to-left strategy (the only rule that changes):

$$\frac{t \to_{w\beta_v} t'}{t\,v \to_{w\beta_v} t'\,v}$$

So contexts are given by:

$$R ::= \langle\cdot\rangle \mid u\,R \mid R\,v$$

AM:

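| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $R\langle(\lambda x.t)\,v\rangle$ | $E$ | $\rightsquigarrow_{\beta_v}$ | $R\langle t\rangle$ | $[x \leftarrow v] :: E$ |
| $R\langle x\rangle$ | $E_1 :: [x \leftarrow v] :: E_2$ | $\rightsquigarrow_{SUB}$ | $R\langle v^\alpha\rangle$ | $E_1 :: [x \leftarrow v] :: E_2$ |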

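We can rewrite the previous (call-by-name) machines in the same fashion, where $A$ is an applicative context:

| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $A\langle(\lambda x.t)\,u\rangle$ | $E$ | $\rightsquigarrow_\beta$ | $A\langle t\rangle$ | $[x \leftarrow u] :: E$ |
| $A\langle x\rangle$ | $E_1 :: [x \leftarrow u] :: E_2$ | $\rightsquigarrow_{SUB}$ | $A\langle u^\alpha\rangle$ | $E_1 :: [x \leftarrow u] :: E_2$ |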
