Lecture 10: Abstract Machines


cf. pictures


  • $⟶$: rewriting for code ($λ$-calculus)
  • $\leadsto$: rewriting for abstract machines
  • $\leadsto_{SEA}$: search
  • $\leadsto_{SUB}$: substitute
  • $\overline u^α$: $u$ well-formed and correctly $α$-renamed

| Code | Stack | | Code | Stack |
|---|---|---|---|---|
| $tu$ | $π$ | $\leadsto_{SEA}$ | $t$ | $u::π$ |
| $λx.t$ | $u::π$ | $\leadsto_β$ | $t \lbrace x ← u \rbrace$ | $π$ |

\underline{(t, ε)} \; ≝ \; t\\ \underline{(t, u :: π)} \; ≝ \; \underline{(tu, π)}\\
  1. $s \leadsto_β s' ⟹ \underline{s} ⟶_β \underline{s'}$
  2. $s \leadsto_{SEA} s' ⟹ \underline{s} = \underline{s'}$
  3. $\leadsto_{SEA}$ terminates
  4. $⟶$ and $\leadsto_{SEA}$ are deterministic
  5. if $s$ is a final state, then $\underline s$ is normal
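
The machine above and its decoding can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the lecture: terms are encoded as tagged tuples, the stack is a list whose head is at index 0, and `subst` is a naive substitution that is safe only because the input term is assumed closed (so every substituted argument is closed). All function names are made up for the example.

```python
# Terms as tagged tuples (an illustrative encoding, not from the lecture).
def var(x):
    return ("var", x)

def lam(x, t):
    return ("lam", x, t)

def app(t, *us):
    # app(t, u1, u2) builds (t u1) u2
    for u in us:
        t = ("app", t, u)
    return t

def subst(t, x, u):
    """Naive t{x <- u}; capture-free only if u is closed (assumed here)."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(state):
    """One transition on (code, stack), or None on a final state."""
    code, stack = state
    if code[0] == "app":                      # SEA: push the argument
        return "SEA", (code[1], [code[2]] + stack)
    if code[0] == "lam" and stack:            # beta: pop and substitute
        return "beta", (subst(code[2], code[1], stack[0]), stack[1:])
    return None

def decode(state):
    """Read a state back into a term: (t, u :: pi) decodes as (t u, pi)."""
    code, stack = state
    for u in stack:
        code = ("app", code, u)
    return code

def run(t):
    """Iterate step from (t, empty stack); return the final state and the labels."""
    state, trace = (t, []), []
    while (res := step(state)) is not None:
        label, state = res
        trace.append(label)
    return state, trace

I = lam("x", var("x"))
final, trace = run(app(I, I, I))
print(trace, decode(final) == I)
```

On $I\,I\,I$ the run performs two SEA transitions followed by two $β$ transitions, and the final state decodes to $I$; a SEA transition leaves the decoding unchanged, as property 2 states.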

Micro Abstract Machine

| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $(λx.t)u r_1 ⋯ r_n$ | $E$ | $\leadsto_β$ | $t r_1 ⋯ r_n$ | $[x ← u] :: E$ |
| $\underline{x} r_1 ⋯ r_n$ | $E_1 :: [x ← u] :: E_2$ | $\leadsto_{SUB}$ | $u^α r_1 ⋯ r_n$ | $E_1 :: [x ← u] :: E_2$ |

\underline{(t, ε)} \; ≝ \; t\\ \underline{(t, [x ← u] :: E)} \; ≝ \; \underline{(t \lbrace x ← u\rbrace, E)}\\
t ↓_ε \; ≝ \; t\\ t ↓_{[x ← u] :: E} \; ≝ \; t \lbrace x ← u \rbrace ↓_E

Ex: You may need two $\leadsto_β$ in a row:

(λy.(λx.x)y)I, ε\\ \leadsto_β (λx.x)y, [y ← I]\\ \leadsto_β x, [x ← y][y ← I]
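
The decoding $t ↓_E$ can be sketched as a fold over the environment. This is a minimal sketch under assumptions: terms are tagged tuples, the environment is a list of `(name, term)` pairs with its head at index 0, and `subst` is naive (no renaming), which is adequate only when the state is well-named. The names `subst` and `unfold` are illustrative.

```python
def subst(t, x, u):
    """Naive t{x <- u}; assumes bound names never clash with names in u."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def unfold(t, env):
    """t ↓_E: apply the entries of E as substitutions, head of E first."""
    for x, u in env:
        t = subst(t, x, u)
    return t

I = ("lam", "w", ("var", "w"))
env = [("x", ("var", "y")), ("y", I)]        # the environment [x <- y][y <- I]
print(unfold(("var", "x"), env) == I)
```

The final state of the example, $x$ with $[x ← y][y ← I]$, thus decodes to $I$. The order of the entries matters: with the two entries swapped, $x$ would decode to $y$.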

Number of $\leadsto_β$: bounded by the “size” of the environment, provided it is not “weird”. We must ensure that we have:

Lemma: Let $s = (t, E)$ be a Micro-AM reachable state.

  1. Abs: if $λx. \overline u$ is a subterm of $\overline t$ or $E$, then $x$ occurs only in $\overline u$

  2. Env scope: if $E = E' :: [x ← u] :: E''$, then $x$ is fresh wrt $u$ and $E''$


s \leadsto_β^k s' ⟹ k ≤ \vert E \vert

Ex: Pay attention to renaming. Example of a mistake (we don’t rename in the second step):

(λz.zzIδ)λxy.xy\\ \leadsto_β zzIδ, [z ← λxy.xy]\\ \leadsto_{SUB} (λxy.xy) z I δ, [z ← λxy.xy]\\ \leadsto_β (λy.xy) I δ, [x ← z][z ← λxy.xy]\\ \leadsto_β xy δ, [y ← I][x ← z][z ← λxy.xy]\\ \leadsto_{SUB}^2 (λxy.xy) y δ, [y ← I][x ← z][z ← λxy.xy]\\ \leadsto_β (λy.xy) δ, [x ← y][y ← I][x ← z][z ← λxy.xy] \qquad \text{ Property already violated}\\ \leadsto_β xy, [y ← δ][x ← y][y ← I][x ← z][z ← λxy.xy] \\ \leadsto_{SUB} yy, [y ← δ][x ← y][y ← I][x ← z][z ← λxy.xy] \\

So it reduces to $δδ$, which diverges, but it should in fact reduce to $δ$ (in $(λy.(λxy.xy)yIδ)$, not both $y$’s are bound by $δ$), which converges!

Milner Abstract Machines (MAM)

Simplified version of Krivine AM.

| Code | Stack | Environment | | Code | Stack | Environment |
|---|---|---|---|---|---|---|
| $\overline t \overline u$ | $π$ | $E$ | $\leadsto_{SEA}$ | $\overline t$ | $\overline u :: π$ | $E$ |
| $λx. \overline t$ | $\overline u :: π$ | $E$ | $\leadsto_β$ | $\overline t$ | $π$ | $[x ← \overline u] :: E$ |
| $x$ | $π$ | $E_1 :: [x ← u] :: E_2$ | $\leadsto_{SUB}$ | $\overline u^α$ | $π$ | $E_1 :: [x ← u] :: E_2$ |

\underline{(t, u::π, E)} \; ≝ \; \underline{(tu, π, E)}\\ \underline{(t, ε, [x ← u] :: E)} \; ≝ \; \underline{(t \lbrace x ← u \rbrace, ε, E)}\\ \underline{(t, ε, ε)} \; ≝ \; t
\underline ε \; ≝ \; ⟨⟩\\ \underline {u::π} \; ≝ \; \underline{π}⟨⟨⟩u⟩\\
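
A minimal Python sketch of the MAM, to make the role of $α$-renaming in $\leadsto_{SUB}$ concrete. Everything here is an assumption made for illustration: terms are tagged tuples, stack and environment are lists with their head at index 0, fresh names are produced by appending a counter (so input names are assumed not to already look like `x_3`), the environment lookup is a linear scan, and the initial term is assumed closed. It is run on the term $(λz.zzIδ)λxy.xy$ from the renaming example above.

```python
import itertools

_fresh = itertools.count()

def rename(t, ren=None):
    """Copy t, giving every bound variable a fresh name (the u^alpha of SUB)."""
    ren = ren or {}
    if t[0] == "var":
        return ("var", ren.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"{t[1]}_{next(_fresh)}"
        return ("lam", new, rename(t[2], {**ren, t[1]: new}))
    return ("app", rename(t[1], ren), rename(t[2], ren))

def subst(t, x, u):
    """Naive t{x <- u}; adequate here because states are kept well-named."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def step(state):
    """One MAM transition on (code, stack, env), or None on a final state."""
    code, stack, env = state
    if code[0] == "app":                          # SEA
        return "SEA", (code[1], [code[2]] + stack, env)
    if code[0] == "lam" and stack:                # beta: delay the substitution
        return "beta", (code[2], stack[1:], [(code[1], stack[0])] + env)
    if code[0] == "var":                          # SUB: copy with renaming
        for x, u in env:
            if x == code[1]:
                return "SUB", (rename(u), stack, env)
    return None

def decode(state):
    """Unload the stack, then unfold the environment (head first)."""
    code, stack, env = state
    for u in stack:
        code = ("app", code, u)
    for x, u in env:
        code = subst(code, x, u)
    return code

def run(t0):
    state, trace = (rename(t0), [], []), []       # start from a well-named copy
    while (res := step(state)) is not None:
        label, state = res
        trace.append(label)
    return state, trace

def lam(x, t):
    return ("lam", x, t)

def app(t, *us):
    for u in us:
        t = ("app", t, u)
    return t

v = lambda x: ("var", x)
I = lam("x", v("x"))
delta = lam("x", app(v("x"), v("x")))
t0 = app(lam("z", app(v("z"), v("z"), I, delta)),
         lam("x", lam("y", app(v("x"), v("y")))))
final, trace = run(t0)
print(decode(final), trace.count("beta"))
```

With renaming in place, the final state decodes to a copy of $δ$ (with a fresh bound name) rather than to the diverging $δδ$. Replacing `rename(u)` by `u` in the SUB case reproduces the kind of name clash shown in the example.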

TODO: write $↓$

Complexity analysis

λ_{CBN} ⟶_{POL} MAM ⟶_{POL} RAM

Let $d: t_0 ⟶_{β_{cbn}}^k s$ be a derivation

  • Input: the size $\vert t_0 \vert$ of the initial term
  • Length: $\vert d \vert = k$
  1. Number of machine transitions
  2. Cost of a single transition
  3. Combine the two

| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $(λx.t)u r_1 ⋯ r_k$ | $E$ | $\leadsto_β$ | $t r_1 ⋯ r_k$ | $[x ← u] :: E$ |
| $\underline{x} r_1 ⋯ r_k$ | $E_1 :: [x ← u] :: E_2$ | $\leadsto_{SUB}$ | $u^α r_1 ⋯ r_k$ | $E_1 :: [x ← u] :: E_2$ |

Let $ρ: s \leadsto^\ast s'$

How do $\vert ρ \vert_{SUB}$ and $\vert ρ \vert_β$ compare?

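The size of the environment is the number of $β$ transitions performed so far. Moreover, if

s_0 \leadsto_β^n \leadsto_{SUB}^k s_1

then $k$ is at most $n$ plus the size of the environment at $s_0$: each of these consecutive $\leadsto_{SUB}$ transitions looks up a variable bound further down the environment.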

Therefore, if

ρ: (t_0, ε) \leadsto_β^{a_1} \leadsto_{SUB}^{b_1} ⋯ \leadsto_β^{a_k} \leadsto_{SUB}^{b_k} s'

then, using

k ≤ \vert ρ \vert_β\\ b_i ≤ \sum\limits_{ j=1 }^i a_j

we get

\vert ρ \vert_{SUB} = \sum\limits_{ i=1 }^k b_i ≤ \sum\limits_{ i=1 }^k \underbrace{\sum\limits_{ j=1 }^i a_j}_{≤ \vert ρ \vert_β} ≤ \sum\limits_{ i=1 }^k \vert ρ \vert_β ≤ \vert ρ \vert_β^2

So $\vert ρ \vert_{SUB} = O(\vert ρ \vert_{β}^2)$

Is this bound reached? Yes:

(λx_0. x_0 x_0)δ\\ \leadsto_β x_0 x_0, [x_0 ← δ]\\ \leadsto_{SUB} (λx_1. x_1 x_1) x_0, [x_0 ← δ]\\ \leadsto_β x_1 x_1, [x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} x_0 x_1, [x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} (λx_2.x_2 x_2) x_1, [x_1 ← x_0][x_0 ← δ]\\ \leadsto_β x_2 x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} x_1 x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} x_0 x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} (λx_3.x_3 x_3) x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_β x_3 x_3, [x_3 ← x_2][x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ ⋯
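
The count behind this example can be spelled out as follows. The notation $ρ_n$ (the prefix of the run above that ends with its $n$-th $\leadsto_β$ transition) is introduced here only for the computation.

```latex
% After the i-th beta transition (the one that binds x_{i-1}), the head
% variable is looked up through x_{i-1}, x_{i-2}, ..., x_0 down to \delta:
% that is i consecutive SUB transitions before the next beta.
\vert ρ_n \vert_β = n
\qquad
\vert ρ_n \vert_{SUB} = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = Θ(n^2)
```

So the quadratic bound on $\vert ρ \vert_{SUB}$ is reached up to a constant factor.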

Subterm Invariant

The “equivalent” of the Hauptsatz in sequent calculus, or the subformula property.

Lemma (Subterm Invariant): Let $ρ: (\overline{t_0}, ε, ε) \leadsto^\ast (\overline u, π, E)$ be an execution. Then $u$ and any code in $E$ and $π$ are subterms of $t_0$ (up to $α$-renaming).

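Proof: By induction on the length of $ρ$. The only subtle case is $\leadsto_{SUB}$, the only transition where the machine duplicates code: the duplicated term $u$ comes from the environment, so it is a subterm of $t_0$ by induction hypothesis, and so is its renamed copy $u^α$ (up to $α$).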

This gives us a bound on the size of duplicated terms. Recall that size explosion in the $λ$-calculus means that there are terms $t_n$ such that


t_n ⟶_β^n r_n\\ \text{ where } \vert r_n \vert = Ω(2^n)

This lemma tells us: whenever

s \leadsto^n s'

then, since each step can only duplicate a subterm of $t_0$, the size of $s'$ is bounded by

\vert s' \vert ≤ (n+1) \vert t_0 \vert

⟹ there’s no size explosion wrt the number of steps.

Warning: the number of steps could itself be exponential ⟹ we have to make sure that the number of transitions is reasonable (from $λ$ to MAM, before even going from MAM to RAM)

(u, E) \leadsto_{SEA, β}^k s ⟹ k ≤ \vert u \vert ≤ \vert t_0 \vert

by the subterm invariant.

But $\leadsto_{SUB}$ increases the size of the term: replaces a variable (size $1$) by a term (size $≥ 1$). But by the subterm invariant, this term is a subterm of $t_0$, so:

\vert ρ \vert_{SEA} ≤ \vert t_0 \vert + \vert ρ \vert_{SUB} \vert t_0 \vert = (\vert ρ \vert_{SUB} + 1) \vert t_0 \vert ≤ (\vert ρ \vert_β^2 + 1) \vert t_0 \vert = O((\vert t_0 \vert + 1) \vert ρ \vert_β^2) \quad \text{ when } \vert ρ \vert_β ≥ 1

Recall that abstract machines take care of SEA(rch), SUB(stitution), and NAMES. But SEA is, like SUB, at most quadratic in the number of $β$ steps (up to a factor $\vert t_0 \vert$) ⟹ we can afford not to take it into account: it doesn’t impact the complexity much. NAMES impacts the complexity even less.

| | Number of transitions |
|---|---|
| SEA | $(\vert t_0 \vert + 1) \vert ρ \vert_β^2$ |
| $β$ | $\vert ρ \vert_β$ |
| SUB | $\vert ρ \vert_β^2$ |

With pointers, the SEA and the $β$ transitions take constant time.

As for SUB: if we implement environments as lists, we don’t have constant-time access. But if variables are pointers, we can access the term substituted for $x$ in the environment in constant time ⟹ the cost of a SUB transition is that of copying (and renaming) $u$, which is bounded by $\vert t_0 \vert$ (as $u$ is a subterm thereof).


| | Number of transitions | Cost of single transition | Global cost |
|---|---|---|---|
| SEA | $(\vert t_0 \vert + 1) \vert ρ \vert_β^2$ | $O(1)$ | $O((\vert t_0 \vert + 1) \vert ρ \vert_β^2)$ |
| $β$ | $\vert ρ \vert_β$ | $O(1)$ | $O(\vert ρ \vert_β)$ |
| SUB | $\vert ρ \vert_β^2$ | $O(\vert t_0 \vert)$ | $O((\vert t_0 \vert + 1) \vert ρ \vert_β^2)$ |

  digraph {
    λ -> MAM[label="|t₀||ρ|_β²"];
    MAM -> RAM -> TM;
    TM -> λ[label="linear"];
  }

Call-by-Value evaluation

We saw that there’s an efficient abstract machine to implement call-by-name evaluation. But reasonable cost models are not about finding such efficient machines.

TM \overset{\text{linear}}{⟶} λ_{det}

But $λ$-calculus is bigger, and there are terms where the evaluation strategy matters.

Ex: Duplicator:

\text{ CBV: } δ(Ix) ⟶ δ x ⟶ xx\\ \text{ CBN: } δ(Ix) ⟶ (Ix)(Ix) ⟶ x (Ix) ⟶ xx

⟹ CBN seems to be silly (we duplicate work), but we just showed that it is reasonable.

Ex: Eraser:

\text{ CBV: } (λz.y)(δδ) ⟶ (λz.y)(δδ) ⟶ ⋯\\ \text{ CBN: } (λz.y)(δδ) ⟶ y

⟹ CBV seems to be silly, but we can show that it is reasonable too.
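
The step counts of the two examples can be reproduced with two small-step reducers. This is a sketch under assumptions: terms are tagged tuples, `subst` is naive (fine for these two examples, where no capture can occur), the CBN reducer is a leftmost-outermost strategy that does not go under $λ$ (so that it also performs the last step $x(Ix) ⟶ xx$ of the trace above), the CBV reducer is a left-to-right determinization, and `fuel` is an arbitrary cut-off used to detect divergence in practice.

```python
def subst(t, x, u):
    """Naive t{x <- u}; no capture can occur in the examples below."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def cbn_step(t):
    """Leftmost-outermost weak step: arguments are passed unevaluated."""
    if t[0] != "app":
        return None
    f, a = t[1], t[2]
    if f[0] == "lam":
        return subst(f[2], f[1], a)
    r = cbn_step(f)
    if r is not None:
        return ("app", r, a)
    r = cbn_step(a)
    return None if r is None else ("app", f, r)

def cbv_step(t):
    """Weak left-to-right call-by-value step: beta fires only on values."""
    if t[0] != "app":
        return None
    f, a = t[1], t[2]
    r = cbv_step(f)
    if r is not None:
        return ("app", r, a)
    r = cbv_step(a)
    if r is not None:
        return ("app", f, r)
    if f[0] == "lam" and a[0] in ("var", "lam"):   # the argument is a value
        return subst(f[2], f[1], a)
    return None

def count_steps(step, t, fuel=100):
    """Number of steps to a normal form, or None if the fuel runs out."""
    for n in range(fuel + 1):
        nxt = step(t)
        if nxt is None:
            return n
        t = nxt
    return None

x, y = ("var", "x"), ("var", "y")
I = ("lam", "a", ("var", "a"))
delta = ("lam", "d", ("app", ("var", "d"), ("var", "d")))
duplicator = ("app", delta, ("app", I, x))                 # δ(Ix)
eraser = ("app", ("lam", "z", y), ("app", delta, delta))   # (λz.y)(δδ)

print(count_steps(cbv_step, duplicator), count_steps(cbn_step, duplicator))
print(count_steps(cbv_step, eraser), count_steps(cbn_step, eraser))
```

On the duplicator, CBV takes 2 steps and CBN takes 3; on the eraser, CBN takes 1 step while CBV exhausts the fuel.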

Being reasonable has nothing to do with finding an efficient strategy. It means that the overhead is not too complex (polynomial) ⟹ relative efficiency.

But still, are there unreasonable strategies? Yes (there’s one example in the literature, due to Jean-Jacques Lévy).

Is there an optimal strategy (one that takes the least number of steps)? Not a computable one: the optimal strategy is not recursive.

But even though the optimal strategy is not recursive, there is a notion of parallel optimal strategy, which is recursive (shown by Lévy). Lévy didn’t know how to implement it; this was done a few years later by someone else. A question then arose: can we take $k$ (the minimal number of steps for this optimal parallel strategy) as the complexity of $t$? It was proven that we cannot: it’s not reasonable (this is the example of an unreasonable strategy).

But this doesn’t make it useless: just because it’s not reasonable doesn’t mean that it’s not efficient (the hidden but wrong assumption is that steps count as 1, i.e. that they are reasonable). Nowadays, we still don’t know whether it’s efficient or not.

Comparing the number of steps of strategies that are not reasonable doesn’t make sense.

Weak Call-by-Value (CBV) $λ$-calculus

v \; ≝ \; x \; \mid \; λx. t
\cfrac{}{(λx.t)\underline v ⟶_{wβ_v} t \lbrace x ← v\rbrace}
\cfrac{t ⟶_{wβ_v} t'}{tu ⟶_{wβ_v} t'u} \qquad \cfrac{t ⟶_{wβ_v} t'}{ut ⟶_{wβ_v} ut'}

NB: we consider only the weak version (we don’t reduce under $λ$’s) because generally it’s used with a programming application in mind

Harmony property: if $t$ is closed, then

  • either $t ⟶_{wβ_v} t'$ for some $t'$
  • or $t$ is a value (an abstraction)

Proof: by induction on $t$.

So if $t$ is closed:

  • either $t$ evaluates to a value (an abstraction)
  • or it diverges

NB: if we remove variables from values, nothing changes.

In theoretical papers, values are defined as

v_t \; ≝ \; x \; \mid \; λx. t

In papers about abstract machines (practical), values are defined as

v_p \; ≝ \; λx. t

⟹ Why does nothing change? Because, in a closed term, a variable can never be an argument in evaluation position: the only way to have a subterm of the form $tx$ is to bind the variable, as in $λx.tx$, and then we’re stuck, since we don’t reduce under abstractions.

This is better than confluent, it has the diamond property:

(\underline{Iδ})(\underline{δI}) ⟶ δ(δI) ⟶ δ(II)\\ (\underline{Iδ})(\underline{δI}) ⟶ (Iδ)(II) ⟶ δ(II)\\

⟹ we can always close the diagram in one step.

It fails when we’re allowed to duplicate redexes, as in $δ(II)$. But in CBV we can’t: the only terms we can duplicate are values, which are normal.

On top of that, all reduction sequences have the same length.
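
Both claims can be checked by brute force on the example $(Iδ)(δI)$. The sketch below enumerates all one-step $wβ_v$ reducts and all maximal reduction sequences. It assumes terms encoded as tagged tuples and a naive `subst`, which is safe here because the term is closed and only closed values are substituted. The function names are illustrative.

```python
def subst(t, x, u):
    """Naive t{x <- u}; safe here since substituted values are closed."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def reducts(t):
    """All one-step weak beta_v reducts of t (no reduction under lambda)."""
    if t[0] != "app":
        return []
    f, a = t[1], t[2]
    out = [("app", r, a) for r in reducts(f)]
    out += [("app", f, r) for r in reducts(a)]
    if f[0] == "lam" and a[0] in ("var", "lam"):   # (λx.t) v
        out.append(subst(f[2], f[1], a))
    return out

def run_lengths(t):
    """Set of lengths of all maximal reduction sequences starting from t."""
    nxt = reducts(t)
    if not nxt:
        return {0}
    return {1 + n for r in nxt for n in run_lengths(r)}

I = ("lam", "a", ("var", "a"))
delta = ("lam", "d", ("app", ("var", "d"), ("var", "d")))
t = ("app", ("app", I, delta), ("app", delta, I))   # (Iδ)(δI)

r1, r2 = reducts(t)
print(set(reducts(r1)) & set(reducts(r2)))   # common reduct closing the diamond
print(run_lengths(t))
```

The two reducts of $(Iδ)(δI)$ share the one-step reduct $δ(II)$, and every maximal sequence from $(Iδ)(δI)$ has the same length (5 steps, down to $I$).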

Right-to-left strategy (the only rule that changes):

\cfrac{t ⟶_{wβ_v} t'}{tv ⟶_{wβ_v} t'v}

So contexts are given by:

R \; ≝ \; ⟨⟩ \; \mid \; u R \; \mid \; R v


R ⟨(λx.t)v⟩ \mid E \leadsto_{β_v} R ⟨t⟩ \mid [x ← v] :: E\\ R ⟨x⟩ \mid E_1 :: [x ← v] :: E_2 \leadsto_{SUB} R ⟨v^α⟩ \mid E_1 :: [x ← v] :: E_2

We can rewrite the call-by-name Micro AM transitions in the same fashion, where $A$ is an applicative context:

| Code | Environment | | Code | Environment |
|---|---|---|---|---|
| $A⟨(λx.t)u⟩$ | $E$ | $\leadsto_β$ | $A⟨t⟩$ | $[x ← u] :: E$ |
| $A⟨x⟩$ | $E_1 :: [x ← u] :: E_2$ | $\leadsto_{SUB}$ | $A⟨u^α⟩$ | $E_1 :: [x ← u] :: E_2$ |
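
The right-to-left CBV machine with contexts $R$ given earlier (transitions $\leadsto_{β_v}$ and $\leadsto_{SUB}$) can be sketched by searching for the evaluation position recursively. This is only an illustration under assumptions: terms are tagged tuples, values stored in the environment are abstractions, fresh names are produced by appending a counter, the lookup is a linear scan, and the input term is closed. The function names are made up.

```python
import itertools

_fresh = itertools.count()

def rename(t, ren=None):
    """Copy t, giving every bound variable a fresh name (the v^alpha of SUB)."""
    ren = ren or {}
    if t[0] == "var":
        return ("var", ren.get(t[1], t[1]))
    if t[0] == "lam":
        new = f"{t[1]}_{next(_fresh)}"
        return ("lam", new, rename(t[2], {**ren, t[1]: new}))
    return ("app", rename(t[1], ren), rename(t[2], ren))

def step(t, env):
    """One transition at the right-to-left evaluation position of t, or None."""
    if t[0] == "var":                       # R<x>: SUB
        for x, v in env:
            if x == t[1]:
                return "SUB", rename(v), env
        return None
    if t[0] == "lam":                       # a value: nothing to do here
        return None
    f, a = t[1], t[2]
    if a[0] != "lam":                       # context u R: argument first
        res = step(a, env)
        if res is None:
            return None
        label, a2, env2 = res
        return label, ("app", f, a2), env2
    if f[0] == "lam":                       # R<(λx.t) v>: beta_v
        return "beta", f[2], [(f[1], a)] + env
    res = step(f, env)                      # context R v
    if res is None:
        return None
    label, f2, env2 = res
    return label, ("app", f2, a), env2

def run(t0):
    t, env, trace = rename(t0), [], []      # start from a well-named copy
    while (res := step(t, env)) is not None:
        label, t, env = res
        trace.append(label)
    return t, env, trace

I = ("lam", "a", ("var", "a"))
delta = ("lam", "d", ("app", ("var", "d"), ("var", "d")))
t0 = ("app", ("app", I, delta), ("app", delta, I))   # (Iδ)(δI)
code, env, trace = run(t0)
print(code, trace.count("beta"), trace.count("SUB"))
```

On $(Iδ)(δI)$ the machine stops on a renamed copy of $I$ after 5 $β_v$ transitions, interleaved with the SUB transitions that fetch values from the environment. In general the final code would still have to be unfolded with the environment to obtain the decoded term; here it is already closed.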
