Lecture 10: Abstract Machines

Reminder:

cf. pictures

Notations:

$⟶$: rewriting for code ($λ$-calculus)
$\leadsto$: rewriting for abstract machines
$\leadsto_{SEA}$: search
$\leadsto_{SUB}$: substitute
$\overline u^α$: $u$ well-formed and correctly $α$-renamed

Code	Stack		Code	Stack
$tu$	$π$	$\leadsto_{SEA}$	$t$	$u::π$
$λx.t$	$u::π$	$\leadsto_β$	$t \lbrace x ← u \rbrace$	$π$

\[\underline{(t, ε)} \; ≝ \; t\\ \underline{(t, u :: π)} \; ≝ \; \underline{(tu, π)}\\\]

\[s \leadsto_β s' ⟹ \underline{s} ⟶_β \underline{s'}\]
\[s \leadsto_{SEA} s' ⟹ \underline{s} = \underline{s'}\]
$\leadsto_{SEA}$ terminates
$⟶$ and $\leadsto_{SEA}$ are deterministic
If $s$ is a final state, then $\underline s$ is normal
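The two transitions of the table above, together with the decoding $\underline{\,\cdot\,}$, can be sketched in Python. This is a minimal illustration, not a reference implementation: the tuple encoding of terms, the helper names (`var`, `lam`, `app`, `subst`, `step`, `decode`, `run`) and the fresh-name scheme are all assumptions made for this example. The `run` loop also checks that every SEA transition leaves the decoded term unchanged.

```python
import itertools

_fresh = itertools.count()  # supply of fresh-name suffixes (illustrative choice)

def var(x): return ("var", x)
def lam(x, t): return ("lam", x, t)
def app(t, u): return ("app", t, u)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, u):
    """Capture-avoiding substitution t{x <- u}."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "app":
        return app(subst(t[1], x, u), subst(t[2], x, u))
    y, body = t[1], t[2]
    if y == x:                       # x is shadowed: nothing to substitute below
        return t
    if y in free_vars(u):            # rename the binder to avoid capture
        z = f"{y}_{next(_fresh)}"
        body, y = subst(body, y, var(z)), z
    return lam(y, subst(body, x, u))

def step(code, stack):
    """One transition; returns (label, code, stack), or None on a final state."""
    if code[0] == "app":                        # SEA: push the argument
        return "SEA", code[1], [code[2]] + stack
    if code[0] == "lam" and stack:              # beta: pop the argument, substitute
        return "beta", subst(code[2], code[1], stack[0]), stack[1:]
    return None

def decode(code, stack):
    """Read a state back as a term: (t, u::pi) decodes as (t u, pi)."""
    for u in stack:
        code = app(code, u)
    return code

def run(t):
    code, stack, trace = t, [], []
    while (nxt := step(code, stack)) is not None:
        before = decode(code, stack)
        label, code, stack = nxt
        if label == "SEA":                      # search is invisible on decoded terms
            assert decode(code, stack) == before
        trace.append(label)
    return decode(code, stack), trace

# (I I) y, with the two identities written with distinct bound names
print(run(app(app(lam("a", var("a")), lam("b", var("b"))), var("y"))))
```

On this input the machine performs two SEA transitions followed by two $β$ transitions and stops on the variable `y`.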

Micro Abstract Machine

Code	Environment		Code	Environment
$(λx.t)u r_1 ⋯ r_n$	$E$	$\leadsto_β$	$t r_1 ⋯ r_n$	$[x ← u] :: E$
$\underline{x} r_1 ⋯ r_n$	$E_1 :: [x ← u] :: E_2$	$\leadsto_{SUB}$	$u^α r_1 ⋯ r_n$	$E_1 :: [x ← u] :: E_2$

\[\underline{(t, ε)} \; ≝ \; t\\ \underline{(t, [x ← u] :: E)} \; ≝ \; \underline{(t \lbrace x ← u\rbrace, E)}\] \[t ↓_ε \; ≝ \; t\\ t ↓_{[x ← u] :: E} \; ≝ \; t \lbrace x ← u \rbrace ↓_E\]

Ex: You may need two $\leadsto_β$ in a row:

\[(λy.(λx.x)y)I, ε\\ \leadsto_β (λx.x)y, [y ← I]\\ \leadsto_β x, [x ← y][y ← I]\]

Number of consecutive $\leadsto_{SUB}$ (in the example above, two of them follow the two $\leadsto_β$ before reaching $I$): bounded by the “size” of the environment, provided it is not “weird”. We must ensure that we have:

Lemma: Let $s = (t, E)$ be a Micro-AM reachable state.

Abs: if $λx. \overline u$ is a subterm of $\overline t$ or $E$, then $x$ occurs only in $\overline u$

Env scope: if $E=E'::[x ← u]::E''$, then $x$ is fresh wrt $u$ and $E''$

\[s \leadsto_{SUB}^k s' ⟹ k ≤ \vert E \vert\]

Ex: Pay attention to renaming. Example of a mistake (we don’t rename in the second step):

\[(λz.zzIδ)λxy.xy\\ \leadsto_β zzIδ, [z ← λxy.xy]\\ \leadsto_{SUB} (λxy.xy) z I δ, [z ← λxy.xy]\\ \leadsto_β (λy.xy) I δ, [x ← z][z ← λxy.xy]\\ \leadsto_β xy δ, [y ← I][x ← z][z ← λxy.xy]\\ \leadsto_{SUB}^2 (λxy.xy) y δ, [y ← I][x ← z][z ← λxy.xy]\\ \leadsto_β (λy.xy) δ, [x ← y][y ← I][x ← z][z ← λxy.xy] \qquad \text{ Property already violated}\\ \leadsto_β xy, [y ← δ][x ← y][y ← I][x ← z][z ← λxy.xy] \\ \leadsto_{SUB} yy, [y ← δ][x ← y][y ← I][x ← z][z ← λxy.xy]\]

So the machine reaches $δδ$, which diverges, whereas the term should in fact reduce to $δ$, which is normal: in $(λy.(λxy.xy)y)Iδ$, the two binders named $y$ are distinct (the outer one receives $I$, the inner one $δ$), so not both $y$’s are bound to $δ$.
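The effect of renaming on this example can be replayed with a small Python sketch of the Micro AM. Everything here is an assumption made for illustration: the tuple encoding of terms, the names `rename`, `spine`, `micro_am`, the `alpha` switch and the `fuel` bound. The environment is a dictionary, so without renaming a new binding for a name overwrites the old one, which mimics a lookup that finds the leftmost entry of the environment list.

```python
import itertools

_ids = itertools.count()  # fresh-name supply (illustrative choice)

def var(x): return ("var", x)
def lam(x, t): return ("lam", x, t)
def app(t, *us):
    for u in us:
        t = ("app", t, u)
    return t

def rename(t, ren=None):
    """Copy t, giving every bound variable a fresh name (the u^alpha of SUB)."""
    ren = ren or {}
    if t[0] == "var":
        return var(ren.get(t[1], t[1]))
    if t[0] == "app":
        return ("app", rename(t[1], ren), rename(t[2], ren))
    fresh = f"{t[1].split('#')[0]}#{next(_ids)}"
    return lam(fresh, rename(t[2], {**ren, t[1]: fresh}))

def spine(t):
    """Split h r1 ... rn into (h, [r1, ..., rn])."""
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, args[::-1]

def micro_am(t, alpha=True, fuel=1000):
    """Run the Micro AM; return (final code, #beta, #SUB), or None if fuel runs out."""
    code, env, n_beta, n_sub = t, {}, 0, 0
    for _ in range(fuel):
        head, args = spine(code)
        if head[0] == "lam" and args:                  # beta
            env[head[1]] = args[0]                     # newest binding shadows older ones
            code, n_beta = app(head[2], *args[1:]), n_beta + 1
        elif head[0] == "var" and head[1] in env:      # SUB
            u = env[head[1]]
            code, n_sub = app(rename(u) if alpha else u, *args), n_sub + 1
        else:
            return code, n_beta, n_sub                 # final state
    return None

I = lam("i", var("i"))
delta = lam("d", app(var("d"), var("d")))
w = lam("x", lam("y", app(var("x"), var("y"))))
t = app(lam("z", app(var("z"), var("z"), I, delta)), w)   # (λz.zzIδ)(λxy.xy)

print(micro_am(t))               # with renaming: stops on a copy of δ
print(micro_am(t, alpha=False))  # without renaming: fuel runs out
```

With renaming the run stops on an $α$-variant of $δ$; without it the machine loops until the fuel bound is hit, matching the faulty run above.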

Milner Abstract Machines (MAM)

Simplified version of Krivine AM.

Code	Stack	Environment		Code	Stack	Environment
$\overline t \overline u$	$π$	$E$	$\leadsto_{SEA}$	$\overline t$	$\overline u :: π$	$E$
$λx. \overline t$	$\overline u :: π$	$E$	$\leadsto_{β}$	$\overline t$	$π$	$[x ← \overline u] :: E$
$x$	$π$	$E_1 :: [x ← u] :: E_2$	$\leadsto_{SUB}$	$\overline u^α$	$π$	$E_1 :: [x ← u] :: E_2$

\[\underline{(t, u::π, E)} \; ≝ \; \underline{(tu, π, E)}\\ \underline{(t, ε, [x ← u] :: E)} \; ≝ \; \underline{(t \lbrace x ← u \rbrace, ε, E)}\\ \underline{(t, ε, ε)} \; ≝ \; t\] \[\underline ε \; ≝ \; ⟨⟩\\ \underline {u::π} \; ≝ \; π(⟨⟩u)\\\]

TODO: write $↓$
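The three MAM transitions and the decoding above can be sketched as follows. This is only an illustration under assumptions: terms are encoded as tuples, the stack is a Python list whose top is its last element, the environment is a dictionary (insertion order records which binding is most recent), and `rename`, `subst`, `mam`, `decode` are names chosen for this example. The naive `subst` used for decoding is safe only because the initial term is assumed well-named and SUB renames every copy.

```python
import itertools

_ids = itertools.count()  # fresh-name supply (illustrative choice)

def rename(t, ren=None):
    """Copy t, giving every bound variable a fresh name (the u^alpha of SUB)."""
    ren = ren or {}
    if t[0] == "var":
        return ("var", ren.get(t[1], t[1]))
    if t[0] == "app":
        return ("app", rename(t[1], ren), rename(t[2], ren))
    fresh = f"{t[1].split('#')[0]}#{next(_ids)}"
    return ("lam", fresh, rename(t[2], {**ren, t[1]: fresh}))

def subst(t, x, u):
    """Naive t{x <- u}; relies on all names in a machine state being distinct."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def mam(t, fuel=10_000):
    """Run the MAM from (t, empty stack, empty env); return final state and counters."""
    code, stack, env = t, [], {}
    counts = {"SEA": 0, "beta": 0, "SUB": 0}
    for _ in range(fuel):
        if code[0] == "app":                          # SEA: push the argument
            stack.append(code[2])
            code = code[1]
            counts["SEA"] += 1
        elif code[0] == "lam" and stack:              # beta: move it to the environment
            env[code[1]] = stack.pop()
            code = code[2]
            counts["beta"] += 1
        elif code[0] == "var" and code[1] in env:     # SUB: fetch a renamed copy
            code = rename(env[code[1]])
            counts["SUB"] += 1
        else:
            return code, stack, env, counts
    raise RuntimeError("out of fuel")

def decode(code, stack, env):
    """Apply the stack first, then unfold the environment, most recent entry first."""
    for u in reversed(stack):
        code = ("app", code, u)
    for x, u in reversed(list(env.items())):
        code = subst(code, x, u)
    return code

I = ("lam", "i", ("var", "i"))
delta = ("lam", "d", ("app", ("var", "d"), ("var", "d")))
code, stack, env, counts = mam(("app", delta, I))
print(decode(code, stack, env), counts)
```

On $δI$ the run ends on a renamed copy of $I$ after 2 SEA, 2 $β$ and 3 SUB transitions.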

Complexity analysis

\[λ_{CBN} ⟶_{POL} MAM ⟶_{POL} RAM\]

Let $d: t_0 ⟶_{β_{cbn}}^k s$ be a derivation

Input: the size $\vert t_0 \vert$ of the initial term
Length: $\vert d \vert = k$

Number of machine transitions
Cost of a single transition
Combine the two

Code	Environment		Code	Environment
$(λx.t)u r_1 ⋯ r_k$	$E$	$\leadsto_β$	$t r_1 ⋯ r_k$	$[x ← u] :: E$
$\underline{x} r_1 ⋯ r_k$	$E_1 :: [x ← u] :: E_2$	$\leadsto_{SUB}$	$u^α r_1 ⋯ r_k$	$E_1 :: [x ← u] :: E_2$

Let $ρ: s \leadsto^\ast s'$

How do $\vert ρ \vert_{SUB}$ and $\vert ρ \vert_β$ compare?

The size of the environment is the number of $β$ transitions. But if

\[s_0 \leadsto_β^n \leadsto_{SUB}^k s_1\]

then $k ≤ n + {}$ the size of the environment at $s_0$.

Therefore, if

\[(t_0, ε) \leadsto_β^{a_1} \leadsto_{SUB}^{b_1} ⋯ \leadsto_β^{a_k} \leadsto_{SUB}^{b_k} s'\]

then

\[\vert ρ \vert_{SUB} = \sum\limits_{ i=1 }^k b_i ≤ \sum\limits_{ i=1 }^k \underbrace{\sum\limits_{ j=1 }^i a_j}_{≤ \vert ρ \vert_β} ≤ \sum\limits_{ i=1 }^k \vert ρ \vert_β = k \vert ρ \vert_β ≤ \vert ρ \vert_β^2\]

where

\[k ≤ \vert ρ \vert_β\\ b_i ≤ \sum\limits_{ j=1 }^i a_j\]

So $\vert ρ \vert_{SUB} = O(\vert ρ \vert_{β}^2)$

Is this bound reached? Yes:

\[(λx_0. x_0 x_0)δ\\ \leadsto_β x_0 x_0, [x_0 ← δ]\\ \leadsto_{SUB} (λx_1. x_1 x_1) x_0, [x_0 ← δ]\\ \leadsto_β x_1 x_1, [x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} x_0 x_1, [x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} (λx_2.x_2 x_2) x_1, [x_1 ← x_0][x_0 ← δ]\\ \leadsto_β x_2 x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} x_1 x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} x_0 x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_{SUB} (λx_3.x_3 x_3) x_2, [x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ \leadsto_β x_3 x_3, [x_3 ← x_2][x_2 ← x_1][x_1 ← x_0][x_0 ← δ]\\ ⋯\]
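To see that this run is indeed quadratic, one can count its transitions directly. Writing $n$ for the number of $\leadsto_β$ fired so far (a name introduced here for the count), the machine follows the chain $x_{i-1} ↦ x_{i-2} ↦ ⋯ ↦ x_0 ↦ δ$ between the $i$-th and the $(i+1)$-th $\leadsto_β$, that is, exactly $i$ transitions $\leadsto_{SUB}$. Hence, right after the $n$-th $\leadsto_β$:

\[\vert ρ \vert_{SUB} = \sum\limits_{ i=1 }^{n-1} i = \frac{n(n-1)}{2} = Θ(\vert ρ \vert_β^2)\]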

Subterm Invariant

The “equivalent” of the Hauptsatz in sequent calculus, or the subformula property.

Lemma (Subterm Invariant): Let $ρ: (\overline{t_0}, ε, ε) \leadsto^\ast (\overline u, π, E)$ be an execution. Then $u$ and any code in $E$ and $π$ are subterms of $t_0$ (up to $α$).

Proof: The only subtle proof step: $\leadsto_{SUB}$ (the only step where the machine duplicates): $u$ is duplicated.

This gives us a bound on the size of duplicated terms.

Recall

\[t_n ⟶_β^n r_n\\ \text{ where } \vert r_n \vert = Ω(2^n)\]

This lemma tells us: whenever

\[s \leadsto^n s'\]

as in each step you can only duplicate subterms, then the size of $s'$ is bounded by

\[\vert s' \vert ≤ (n+1) \vert t_0 \vert\]

⟹ there’s no size explosion wrt the number of steps.
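The bound can be checked by a short induction, assuming that $s$ is the initial state $s_0 = (t_0, ε, ε)$ and writing $s_0 \leadsto s_1 \leadsto ⋯ \leadsto s_n = s'$ (the names $s_i$ are introduced here). Only $\leadsto_{SUB}$ makes the state grow, and it adds a copy of a subterm of $t_0$, so each transition adds at most $\vert t_0 \vert$:

\[\vert s_0 \vert = \vert t_0 \vert\\ \vert s_{i+1} \vert ≤ \vert s_i \vert + \vert t_0 \vert\\ \vert s_n \vert ≤ \vert t_0 \vert + n \vert t_0 \vert = (n+1) \vert t_0 \vert\]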

Warning: it could happen that the number of steps is itself exponential ⟹ we have to make sure that the number of transitions is reasonable (from $λ$ to $M$, before even going from $M$ to $RAM$)

\[(u, E) \leadsto_{SEA, β}^k s ⟹ k ≤ \vert u \vert ≤ \vert t_0 \vert\]

by the subterm invariant.

But $\leadsto_{SUB}$ increases the size of the term: replaces a variable (size $1$) by a term (size $≥ 1$). But by the subterm invariant, this term is a subterm of $t_0$, so:

\[\vert ρ \vert_{SEA} ≤ \vert t_0 \vert + \vert ρ \vert_{SUB} \vert t_0 \vert = \vert t_0 \vert (\vert ρ \vert_{SUB} + 1) ≤ \vert t_0 \vert (\vert ρ \vert_β^2 + 1)\]

Recall that AMs take care of SEA(rch), SUB(stitution), and NAMES. But SEA is bounded in terms of SUB (by the inequality above, hence quadratic in $\vert ρ \vert_β$) ⟹ we can afford not to take it into account, it doesn’t impact the complexity much. NAMES impacts the complexity even less.

	Number of transitions
SEA	$\vert t_0 \vert (\vert ρ \vert_β^2 + 1)$
$β$	$\vert ρ \vert_β$
SUB	$\vert ρ \vert_β^2$

With pointers, the SEA and the $β$ transitions take constant time.

As for SUB: if we implement environments as lists, we don’t have constant-time access. But if variables are pointers, we can access the term substituted for $x$ in the environment in constant time ⟹ the cost of a SUB transition is that of copying (and renaming) $u$, which is bounded by $\vert t_0 \vert$ (as $u$ is a subterm thereof).

Therefore:

	Number of transitions	Cost of single transition	Global cost
SEA	$\vert t_0 \vert (\vert ρ \vert_β^2 + 1)$	$O(1)$	$O(\vert t_0 \vert (\vert ρ \vert_β^2 + 1))$
$β$	$\vert ρ \vert_β$	$O(1)$	$O(\vert ρ \vert_β)$
SUB	$\vert ρ \vert_β^2$	$O(\vert t_0 \vert)$	$O(\vert t_0 \vert \vert ρ \vert_β^2)$
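Summing the three rows gives the overall overhead of the machine. As a quick check, assuming $\vert ρ \vert_β ≥ 1$ so that the lower-order terms are absorbed:

\[O(\vert t_0 \vert \vert ρ \vert_β^2) + O(\vert ρ \vert_β) + O(\vert t_0 \vert \vert ρ \vert_β^2) = O(\vert t_0 \vert \vert ρ \vert_β^2)\]

This is polynomial in both the size of the input and the number of $β$ steps.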

  digraph {
    rankdir=LR;
    λ -> MAM[label="|t₀||ρ|_β²"];
    MAM -> RAM -> TM;
    TM -> λ[label="linear"];
  }

Call-by-Value evaluation

We saw that there’s an efficient abstract machine to implement call-by-name evaluation. But reasonable cost models are not about finding such efficient machines.

\[TM \overset{\text{linear}}{⟶} λ_{det}\]

But $λ$-calculus is bigger, and there are terms where the evaluation strategy matters.

Ex: Duplicator:

\[\text{ CBV: } δ(Ix) ⟶ δ x ⟶ xx\\ \text{ CBN: } δ(Ix) ⟶ (Ix)(Ix) ⟶ x (Ix) ⟶ xx\]

⟹ CBN seems to be silly (we duplicate work), but we just showed that it is reasonable.

Ex: Eraser:

\[\text{ CBV: } (λz.y)(δδ) ⟶ (λz.y)(δδ) ⟶ ⋯\\ \text{ CBN: } (λz.y)(δδ) ⟶ y\]

⟹ CBV seems to be silly, but we can show that it is reasonable too.

Being reasonable has nothing to do with finding an efficient strategy. It means that the overhead is not too complex (polynomial) ⟹ relative efficiency.

But still, are there non reasonable strategies? Yes (there’s one example in the literature: Jean-Jacques Lévy’s one).

Is there an optimal strategy (that takes the least number of steps)? Not an effective one: the optimal strategy is not recursive.

But even though it is not recursive, we can have a notion of parallel optimal strategy, which is recursive (shown by Lévy). But Lévy didn’t know how to implement it. It was done a few years later by someone else. Question that arose: can we take $k$ (the minimal number of steps for this optimal parallel strategy) as the complexity of $t$? It was proven that no, it’s not reasonable (this is an example of an unreasonable strategy).

But that doesn’t make it useless: just because it’s not reasonable doesn’t mean that it’s not efficient (hidden but wrong assumption: steps count as 1, i.e. they are reasonable). Nowadays, we still don’t know if it’s efficient or not.

Comparing the number of steps of strategies that are not reasonable doesn’t make sense.

Weak Call-by-Value (CBV) $λ$-calculus

\[v \; ≝ \; x \; \mid \; λx. t\] \[\cfrac{}{(λx.t)\underline v ⟶_{wβ_v} t \lbrace x ← v\rbrace}\] \[\cfrac{t ⟶_{wβ_v} t'}{tu ⟶_{wβ_v} t'u} \qquad \cfrac{t ⟶_{wβ_v} t'}{ut ⟶_{wβ_v} ut'}\]

NB: we consider only the weak version (we don’t reduce under $λ$’s) because generally it’s used with a programming application in mind

Harmony property: if $t$ is closed, then

either $t ⟶_{wβ_v} t'$

or $t$ is a value (an abstraction)

Proof: by induction on $t$.

So if $t$ is closed:

either $t$ evaluates to a value
or it diverges

NB: if we remove variables from values, nothing changes.

In theoretical papers, values are defined as

\[v_t \; ≝ \; x \; \mid \; λx. t\]

In papers about abstract machines (practical), values are defined as

\[v_p \; ≝ \; λx. t\]

⟹ Why does nothing change? Because, in a closed term, a variable can never be an argument: the only way to have a subterm of the form $tx$ is to bind the variable: $λx.tx$, and then we’re stuck: we don’t reduce under abstractions.

This is better than confluent, it has the diamond property:

\[(\underline{Iδ})(\underline{δI}) ⟶ δ(δI) ⟶ δ(II)\\ (\underline{Iδ})(\underline{δI}) ⟶ (Iδ)(II) ⟶ δ(II)\\\]

⟹ we can always close the diagram in one step.

It fails when we’re allowed to duplicate redexes, as in $δ(II)$. But in CBV, we can’t: the only terms we can duplicate are values, that are normal.

On top of that, all reduction sequences have the same length.
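Both claims can be checked on the example $(Iδ)(δI)$ with a small Python sketch that enumerates every $⟶_{wβ_v}$ step allowed by the three rules. The tuple encoding and the names `is_value`, `subst`, `reducts`, `lengths` are assumptions made for this illustration; the naive substitution is adequate here only because the substituted values are closed.

```python
def is_value(t):
    return t[0] in ("var", "lam")          # v ::= x | λx.t

def subst(t, x, v):
    """t{x <- v}; naive, which is safe when v is closed (assumed here)."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    return ("app", subst(t[1], x, v), subst(t[2], x, v))

def reducts(t):
    """All one-step weak beta_v reducts of t (no reduction under lambda)."""
    if t[0] != "app":
        return []
    f, a = t[1], t[2]
    out = []
    if f[0] == "lam" and is_value(a):                  # root redex
        out.append(subst(f[2], f[1], a))
    out += [("app", f2, a) for f2 in reducts(f)]       # reduce on the left
    out += [("app", f, a2) for a2 in reducts(a)]       # reduce on the right
    return out

def lengths(t):
    """Lengths of all maximal reduction sequences from t (t must terminate)."""
    nxt = reducts(t)
    if not nxt:
        return {0}
    return {1 + n for r in nxt for n in lengths(r)}

I = ("lam", "i", ("var", "i"))
delta = ("lam", "d", ("app", ("var", "d"), ("var", "d")))
t = ("app", ("app", I, delta), ("app", delta, I))      # (Iδ)(δI)

print(len(reducts(t)), lengths(t))
```

The term has two one-step reducts, and every maximal reduction sequence from it has length 5.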

Right-to-left strategy (the only rule that changes):

\[\cfrac{t ⟶_{wβ_v} t'}{tv ⟶_{wβ_v} t'v}\]

So contexts are given by:

\[R \; ≝ \; ⟨⟩ \; \mid \; u R \; \mid \; R v\]

AM:

\[R ⟨(λx.t)v⟩ \mid E \leadsto_{β_v} R ⟨t⟩ \mid [x ← v] :: E\\ R ⟨x⟩ \mid E_1 :: [x ← v] :: E_2 \leadsto_{SUB} R ⟨v^α⟩ \mid E_1 :: [x ← v] :: E_2\]
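The two call-by-value transitions above can be sketched in Python by searching, at each step, for the position selected by a right-to-left context $R$. This is a sketch under assumptions: values are abstractions only (the $v_p$ convention), the environment is a dictionary, the initial term is well-named, and `rename`, `step`, `run` are names made up for this example.

```python
import itertools

_ids = itertools.count()  # fresh-name supply (illustrative choice)

def rename(t, ren=None):
    """Copy t, giving every bound variable a fresh name (the v^alpha of SUB)."""
    ren = ren or {}
    if t[0] == "var":
        return ("var", ren.get(t[1], t[1]))
    if t[0] == "app":
        return ("app", rename(t[1], ren), rename(t[2], ren))
    fresh = f"{t[1].split('#')[0]}#{next(_ids)}"
    return ("lam", fresh, rename(t[2], {**ren, t[1]: fresh}))

def step(t, env):
    """One transition under a right-to-left context R; (label, term) or None."""
    if t[0] == "var":                          # R<x>: SUB if x is bound in env
        return ("SUB", rename(env[t[1]])) if t[1] in env else None
    if t[0] == "lam":                          # a value: nothing to do here
        return None
    f, a = t[1], t[2]
    if a[0] != "lam":                          # R = u R': work on the argument first
        r = step(a, env)
        return None if r is None else (r[0], ("app", f, r[1]))
    if f[0] == "lam":                          # R<(λx.t) v>: beta_v
        env[f[1]] = a
        return "beta_v", f[2]
    r = step(f, env)                           # R = R' v: argument is already a value
    return None if r is None else (r[0], ("app", r[1], a))

def run(t, fuel=10_000):
    env, counts = {}, {"beta_v": 0, "SUB": 0}
    for _ in range(fuel):
        r = step(t, env)
        if r is None:
            return t, env, counts
        counts[r[0]] += 1
        t = r[1]
    raise RuntimeError("out of fuel")

I1, I2 = ("lam", "a", ("var", "a")), ("lam", "e", ("var", "e"))
d1 = ("lam", "b", ("app", ("var", "b"), ("var", "b")))
d2 = ("lam", "c", ("app", ("var", "c"), ("var", "c")))
final, env, counts = run(("app", ("app", I1, d1), ("app", d2, I2)))   # (Iδ)(δI)
print(final, counts)
```

On $(Iδ)(δI)$ the machine fires 5 $β_v$ transitions, the same number as the $⟶_{wβ_v}$ steps of the calculus, and stops on a renamed copy of $I$.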

We can rewrite the earlier call-by-name rules in the same fashion, where $A$ is an applicative context:

Code	Environment		Code	Environment
$A⟨(λx.t)u⟩$	$E$	$\leadsto_β$	$A⟨t⟩$	$[x ← u] :: E$
$A⟨x⟩$	$E_1 :: [x ← u] :: E_2$	$\leadsto_{SUB}$	$A⟨u^α⟩$	$E_1 :: [x ← u] :: E_2$
