Lecture 9: Cost models

Teacher: Beniamino Accattoli

Cost models

Turing Machines (TM) cost models:

  • Time: number of transitions
  • Space: max number of cells on the tape used during the computation
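
To make the two measures concrete, here is a minimal Python sketch of a one-tape TM simulator that reports both. The transition-table format, the `run_tm` function and the `FLIP` machine are assumptions made for this illustration; they do not come from the lecture.

```python
# Head moves: left, right, or stay (the "S" move is a convenience of this sketch).
MOVES = {"L": -1, "R": 1, "S": 0}

def run_tm(delta, tape, state="q0", halt="halt", blank="_"):
    """Run a one-tape TM and return (output, time, space).

    time  = number of transitions performed
    space = number of distinct tape cells the head has been on
    """
    cells = dict(enumerate(tape))
    head, steps, visited = 0, 0, {0}
    while state != halt:
        symbol = cells.get(head, blank)
        state, write, move = delta[(state, symbol)]
        cells[head] = write
        head += MOVES[move]
        visited.add(head)
        steps += 1
    output = "".join(cells[i] for i in sorted(cells)).strip(blank)
    return output, steps, len(visited)

# Hypothetical machine: flip every bit, halt on the first blank.
FLIP = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
    ("q0", "_"): ("halt", "_", "S"),
}

output, time, space = run_tm(FLIP, "0110")
print(output, time, space)
```

On an input of length 4 this machine performs 5 transitions and touches 5 cells, so both measures are linear in the input size.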

Way trickier for $λ$-calculus! Possible attempt:

  • Time: number of $β$-steps to normal form
  • Space: max size of a term during evaluation

But not sufficient.

For time: most computational models simulate each other with a polynomial overhead ⟶ the class $P$ is robust, which is why it is so popular (we can talk about $P$ without specifying the model). And $P$ is the smallest such class (e.g. TM vs Random Access Machine: a problem that is linear-time in one is not necessarily linear-time in the other).

$λ$-calculus: polynomially related to TM with respect to time, but what about space? ⟶ we won’t focus on space for the moment. Even for time, there are quite a few problems:

  • Which reduction strategy gives a reasonable cost model?
  • Atomicity of $β$: is $β$-reduction an atomic operation (as for TM, which are close to real-world machines)? Not clear ⟶ $β$ makes a lot of copies. According to certain strategies, the answer can be “yes”, but it’s subtle.

First-order vs Higher-order computations

$λ$ vs TM:

  • TM → first-order: things that manipulate concrete data (take data as input and yield an output)
  • $λ$ → higher-order: $λ$ can take programs as input ⟹ higher-order

    • TM can execute other TM, but modulo contrived encodings

  • From TM to $λ$: easy
  • From $λ$ to TM: hard (atomicity of $β$)

Equivalence $λ$ and TM

From TM to $λ$

Equivalence with TM: we don’t need the full $λ$-calculus to represent TM. It’s enough to restrict ourselves to a fragment: the deterministic $λ$-calculus.

Deterministic $λ$-calculus: $Λ_{det}$

Terms:
t \; ≝ \; v \; \mid \; t v\\ v \; ≝ \; x \; \mid \; λx.t
Rewriting rules:
\cfrac{}{(λx.t)v ⟶_β t \lbrace x ← v\rbrace} β
\cfrac{t ⟶_β t'}{t v ⟶_β t' v} @_L

Contexts:

(λx. t)v v_1 ⋯ v_k ⟶ t \lbrace x ← v \rbrace v_1 ⋯ v_k

or, to state it another way, with evaluation contexts:

E \; ≝ \; ⟨\cdot⟩ \; \mid \; E v
⟨\cdot⟩ ⟨t⟩ \; ≝ \; t\\ (Ev) ⟨t⟩ \; ≝ \; E⟨t⟩ v

so that

\cfrac{t ⟶_β t'}{E⟨t⟩ ⟶_β E⟨t'⟩}
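
The two rules (or, equivalently, the closure under evaluation contexts) can be sketched as a one-step function in Python. The tuple encoding of terms and the names `subst`, `step` and `evaluate` are assumptions of this sketch. Substitution is deliberately naive: it is only safe when no variable capture can occur, for instance when the substituted values are closed.

```python
# Terms as tuples: ("var", x), ("lam", x, body), ("app", t, v).

def subst(t, x, v):
    """Naive t{x <- v}; assumes no variable capture can occur."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "lam":
        # A binder with the same name shadows x: stop here.
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    return ("app", subst(t[1], x, v), subst(t[2], x, v))

def step(t):
    """One deterministic step, or None if no rule applies."""
    if t[0] != "app":
        return None                     # values do not reduce (weak)
    f, v = t[1], t[2]
    if f[0] == "lam":                   # rule beta: (lam x. b) v
        return subst(f[2], f[1], v)
    f2 = step(f)                        # rule @_L: reduce on the left
    return None if f2 is None else ("app", f2, v)

def evaluate(t):
    """Iterate step; return the final term and the number of beta-steps."""
    n = 0
    while (u := step(t)) is not None:
        t, n = u, n + 1
    return t, n

K = ("lam", "x", ("lam", "y", ("var", "x")))
I = ("lam", "z", ("var", "z"))
J = ("lam", "w", ("var", "w"))
print(evaluate(("app", ("app", K, I), J)))
```

Since `step` never has a choice between two redexes, there is at most one way to proceed, which is the determinism property of the fragment.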

NB: Here, we have a CPS ($β$-reduction when the argument is a value) and weak $λ$-calculus

Weak $λ$-calculus:

we don’t reduce under $λ$’s

Strong:

we can reduce everywhere

NB:

  • initially, $λ$-calculus was introduced as strong, but in practice, people use the weak one.
  • deterministic: at most one redex in a term

Thm: Let $Σ$ be an alphabet, $f: Σ^\ast ⟶ Σ^\ast$ a function computed by a TM $M$ in time $g$. Then there exists an encoding $\overline{(-)}$ from

  • $Σ$
  • strings over $Σ$
  • TM over $Σ$

into $Λ_{det}$ such that, for every $s ∈ Σ^\ast$,

\overline{M} \overline{s} ⟶^n_{β_{det}} \overline{f(s)} \quad \text{ where } n = Θ(g(\vert s \vert) + \vert s \vert)

NB: from TM to $λ$ ⟶ there’s only a linear overhead

Now, the other direction.

From $λ$ to TM

Notation: $δ \; ≝ \; λx. xx$ (the duplicator), $Ω \; ≝ \; δδ$

Let

  • $t_0 \; ≝ \; y$
  • $t_{n+1} \; ≝ \; δ t_n$

(we’ll use the rightmost reduction strategy)

t_1 = δ y ⟶ yy\\ t_2 = δ(δ y) ⟶^\ast (yy)(yy)\\ t_3 ⟶^\ast ((yy)(yy))((yy)(yy))\\ \vdots

So the normal forms:

t_n ⟶^n_{rβ} r_n
  • $r_0 = y$
  • $r_{n+1} = r_n r_n$
\vert r_n \vert = Ω(2^n)

but

\vert t_n \vert = O(n)

So in a linear number of steps, we get a term of exponential size ⟹ a TM cannot simulate this in time polynomial in $n$. The cost $n$ (number of steps of rightmost $β$-reduction ($rβ$)) doesn’t account for the time needed to write down the result (at least one of the $β$-steps does an exponential amount of work ⇒ it can’t be atomic).
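
The explosion can be observed directly. The following Python sketch builds $t_n$, normalizes it with a rightmost strategy, and reports the number of steps together with the sizes of $t_n$ and of its normal form. The tuple encoding, the node-counting `size` measure and all function names are assumptions of this sketch; substitution is naive, which is safe here because the only binder is the $x$ of $δ$ and the arguments only contain $y$.

```python
# Terms as tuples: ("var", x), ("lam", x, body), ("app", t, s).
DELTA = ("lam", "x", ("app", ("var", "x"), ("var", "x")))

def size(t):
    """Number of nodes (variables, abstractions, applications)."""
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])

def subst(t, x, s):
    """Naive t{x <- s}; assumes no variable capture can occur."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def step_rightmost(t):
    """Reduce the rightmost redex, or return None on normal forms."""
    if t[0] == "var":
        return None
    if t[0] == "lam":
        body = step_rightmost(t[2])
        return None if body is None else ("lam", t[1], body)
    f, a = t[1], t[2]
    a2 = step_rightmost(a)              # argument first
    if a2 is not None:
        return ("app", f, a2)
    f2 = step_rightmost(f)
    if f2 is not None:
        return ("app", f2, a)
    return subst(f[2], f[1], a) if f[0] == "lam" else None

def t(n):
    """t_0 = y, t_{n+1} = delta t_n."""
    term = ("var", "y")
    for _ in range(n):
        term = ("app", DELTA, term)
    return term

def explode(n):
    """Return (number of steps, size of t_n, size of its normal form)."""
    term, steps = t(n), 0
    start = size(term)
    while (nxt := step_rightmost(term)) is not None:
        term, steps = nxt, steps + 1
    return steps, start, size(term)

print(explode(10))
```

With this size measure, $\vert t_n \vert = 5n + 1$ and $\vert r_n \vert = 2^{n+1} - 1$, so 10 steps already turn a term of size 51 into one of size 2047.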

Seems to show that the number of $β$ steps in $λ$ cannot be taken as a good measure of complexity.

It doesn’t work with closed terms either. Consider:

  • $u_0 \; ≝ \; I$
  • $u_{n+1} \; ≝ \; (λx. λy. yxx) u_n$

(this example works for strong evaluation too! If we only cared about weak evaluation, we could also use $(λx. λy. xx)$)

For leftmost evaluation:

t_n ⟶_{lβ}^{2^n - 1} r_n

(by induction)
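
The induction can be spelled out as follows, writing $L(n)$ (a name introduced here for convenience) for the number of leftmost steps from $t_n$ to $r_n$. After the first step, the two copies of $t_n$ are normalized one after the other, because $t_n$ never reduces to an abstraction and so no redex appears at the root.

```latex
L(0) = 0 \quad \text{since } t_0 = y = r_0 \\
t_{n+1} = δ t_n ⟶_{lβ} t_n t_n ⟶_{lβ}^{L(n)} r_n t_n ⟶_{lβ}^{L(n)} r_n r_n = r_{n+1} \\
L(n+1) = 1 + 2 L(n) \quad ⟹ \quad L(n) = 2^n - 1
```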

Now, try to come up with a sequence (possibly open) that explodes under leftmost evaluation too, i.e. that reaches an exponential size in a linear number of leftmost steps.

  • $t'_1 \; ≝ \; δ$
  • $t'_{n+1} \; ≝ \; λx. (t'_n (xx))$

⟶ $t'_n y$ explodes for leftmost $β$-reduction

t'_{n+1}y = (λx. t'_n (xx))y ⟶ t'_n (yy) \\ = (λx. t'_{n-1} (xx))(yy) ⟶ ⋯

More generally, by induction on $n$:

t'_n r_m ⟶^n r_{n+m}

so

t'_n y = t'_n r_0 ⟶^n r_n
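
The claim that $t'_n y$ reaches $r_n$ in exactly $n$ leftmost steps can be checked mechanically. The Python sketch below uses an assumed tuple encoding of terms and hypothetical helper names (`step_leftmost`, `t_prime`, `r`, `normalize`). Substitution is naive but stops at a binder that shadows the variable, which is enough here since every binder is named $x$ and the arguments only contain $y$.

```python
# Terms as tuples: ("var", x), ("lam", x, body), ("app", t, s).

def size(t):
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])

def subst(t, x, s):
    """Naive t{x <- s}; stops at a binder that shadows x."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def step_leftmost(t):
    """Reduce the leftmost-outermost redex, or return None."""
    if t[0] == "var":
        return None
    if t[0] == "lam":
        body = step_leftmost(t[2])
        return None if body is None else ("lam", t[1], body)
    f, a = t[1], t[2]
    if f[0] == "lam":                   # redex at the root
        return subst(f[2], f[1], a)
    f2 = step_leftmost(f)
    if f2 is not None:
        return ("app", f2, a)
    a2 = step_leftmost(a)
    return None if a2 is None else ("app", f, a2)

def t_prime(n):
    """t'_1 = delta, t'_{n+1} = lam x. t'_n (x x)."""
    x = ("var", "x")
    term = ("lam", "x", ("app", x, x))
    for _ in range(n - 1):
        term = ("lam", "x", ("app", term, ("app", x, x)))
    return term

def r(n):
    """r_0 = y, r_{n+1} = r_n r_n."""
    term = ("var", "y")
    for _ in range(n):
        term = ("app", term, term)
    return term

def normalize(t):
    steps = 0
    while (nxt := step_leftmost(t)) is not None:
        t, steps = nxt, steps + 1
    return t, steps

result, steps = normalize(("app", t_prime(10), ("var", "y")))
print(steps, size(result), result == r(10))
```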

Actually, it explodes with respect to any strategy! We could even come up with a closed sequence (that explodes with respect to any evaluation strategy) ⟹ it’s unavoidable.


  • From TM to $λ$: linear
  • From $λ$ to TM: Size explosion, exponential

⟶ To fix this: the concept of sharing: from $λ_x$ to $λ_{shx}$, with explicit substitutions (the size of terms then grows linearly with the number of $β$-steps)

Number of steps in $λ_x$ ⟶ measure of complexity (but there’s a catch: if you want to expand explicit substitutions, there’s an exponential blow-up)

Unfolding:

x↓ \; ≝ \; x\\ (λx. t) ↓ \; ≝ \; λx. t↓\\ (ts)↓ \; ≝ \; t↓ s↓\\ (t[x ← s])↓ \; ≝ \; t↓ \lbrace x ← s↓\rbrace

Size explosion when expanding substitutions: there are terms $t$ with explicit substitutions (ES) such that

$\vert t ↓ \vert = Ω(2^{\vert t \vert})$

  • $t_0 = x_0 x_0$
  • $t_{n+1} = t_n[x_n ← x_{n+1} x_{n+1}]$
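
A minimal Python sketch of the unfolding, applied to this family, shows the gap between the shared and the unfolded size. The tuple encoding, in particular the `("es", t, x, s)` node for $t[x ← s]$, the node-counting `size` and the function names are assumptions of this sketch. Substitution is naive, which is harmless for this family since it contains no binders.

```python
# Terms as tuples: ("var", x), ("lam", x, body), ("app", t, s),
# plus ("es", t, x, s) for the explicit substitution t[x <- s].

def size(t):
    """Number of nodes; an ES node counts 1 plus both of its subterms."""
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    if t[0] == "app":
        return 1 + size(t[1]) + size(t[2])
    return 1 + size(t[1]) + size(t[3])

def subst(t, x, s):
    """Naive meta-level substitution on ES-free terms (no capture check)."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def unfold(t):
    """Turn every explicit substitution into a meta-level substitution."""
    if t[0] == "var":
        return t
    if t[0] == "lam":
        return ("lam", t[1], unfold(t[2]))
    if t[0] == "app":
        return ("app", unfold(t[1]), unfold(t[2]))
    _, body, x, s = t
    return subst(unfold(body), x, unfold(s))

def family(n):
    """t_0 = x0 x0, t_{n+1} = t_n[x_n <- x_{n+1} x_{n+1}]."""
    x0 = ("var", "x0")
    term = ("app", x0, x0)
    for i in range(n):
        nxt = ("var", f"x{i + 1}")
        term = ("es", term, f"x{i}", ("app", nxt, nxt))
    return term

shared = family(10)
print(size(shared), size(unfold(shared)))
```

With this size measure the shared term grows as $4n + 3$ while its unfolding grows as $2^{n+2} - 1$.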

Still, why is it not cheating? Because you can test equality without expanding the substitutions:

t ↓ \; \overset{?}{ = } \; s ↓

can be tested in $O(\vert t \vert + \vert s \vert)$

Actually, there’s a hidden assumption in the size explosion issue: “space = size of the term” (this is what was problematic). But with ES and sharing, space is no longer the size of the term.

Abstract Machines (AM)

In the following, we will stick to:

Simplest reduction strategy ever: weak head evaluation

\cfrac{}{(λx.t)s ⟶_{wh} t \lbrace x ← s\rbrace} β
\cfrac{t ⟶_{wh} t'}{ts ⟶_{wh} t's} @L

Abstract machines: definition

What is an abstract machine? No absolute definition (“machine” because they operationally implement $λ$-calculus, “abstract” because we don’t care about garbage collection, etc…)

Abstract machines make explicit what happens at a meta-level in $λ$-calculus:

  • Search of the next redex to reduce
  • Substitution
  • Names: $α$-equivalence (to avoid capture, etc…)

    • For the sake of simplicity, we won’t focus too much on it

Search of an AM for weak head reduction

Let’s study an AM that makes explicit only Search.

code: $ts$ / stack: $π$ $\overset{\text{applied rule}}{⟹_{@L}}$ code: $t$ / stack: $s::π$

code: $λx. t$ / stack: $s::π$ $⟹_β$ code: $t \lbrace x ← s\rbrace$ / stack: $π$
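
The two transitions can be sketched in Python as follows. The tuple encoding of codes, the use of a Python list as the stack (top of the stack at the end of the list) and the `run` function are assumptions of this sketch. Substitution is still done at the meta-level and is naive: it assumes that no variable capture can occur, which is what the well-naming hypothesis on codes is meant to support.

```python
# Codes as tuples: ("var", x), ("lam", x, body), ("app", t, s).

def subst(t, x, s):
    """Naive t{x <- s}; assumes no variable capture can occur."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def run(term):
    """Run the search machine from the state term / empty stack.

    Returns (final code, final stack, #beta transitions, #@L transitions).
    """
    code, stack = term, []
    betas = overhead = 0
    while True:
        if code[0] == "app":
            # @L transition: push the argument, continue with the left part.
            stack.append(code[2])
            code = code[1]
            overhead += 1
        elif code[0] == "lam" and stack:
            # beta transition: pop the argument and substitute it.
            code = subst(code[2], code[1], stack.pop())
            betas += 1
        else:
            # Final state: abstraction with empty stack, or a variable.
            return code, stack, betas, overhead

K = ("lam", "x", ("lam", "y", ("var", "x")))
I = ("lam", "z", ("var", "z"))
J = ("lam", "w", ("var", "w"))
print(run(("app", ("app", K, I), J)))
```

On this example the machine performs two $@L$ transitions to find the redex and two $β$ transitions, matching the two weak head steps of the term. A diverging term such as $Ω$ makes `run` loop forever, as expected.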

An AM:

is given by a grammar for states $s$ and a transition function $⟹$, which is the union of $⟹_β$ ($β$-transitions) and $⟹_o$ (overhead transitions, here $⟹_{@L}$)

State:

some code plus some data-structure

Codes:

overlined (like $\overline{t}$) to stress that they are not considered up to $α$-equivalence

A code $\overline{t}$ is well-named:

if whenever $λx. \overline{s}$ occurs in $\overline{t}$, then $x$ occurs only in $\overline{s}$, if at all

Ex:

  • Not well-named: $x(λx. xy)$
    • Well-named variant: $x(λz. zy)$
  • Not well-named: $(λx.xx)(λx.xx)$
    • Well-named variant: $(λx.xx)(λy. yy)$
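
The well-naming condition can be checked mechanically. The Python sketch below assumes that binder positions also count as occurrences of a name, so that each name is bound at most once; this reading, the tuple encoding and the function names are assumptions of the sketch.

```python
from collections import Counter

# Codes as tuples: ("var", x), ("lam", x, body), ("app", t, s).

def names(t):
    """Count every occurrence of each name, as a variable or as a binder."""
    if t[0] == "var":
        return Counter([t[1]])
    if t[0] == "lam":
        return Counter([t[1]]) + names(t[2])
    return names(t[1]) + names(t[2])

def lambdas(t):
    """Yield every sub-code of the form lam x. s."""
    if t[0] == "lam":
        yield t
        yield from lambdas(t[2])
    elif t[0] == "app":
        yield from lambdas(t[1])
        yield from lambdas(t[2])

def well_named(t):
    """For each lam x. s in t, all occurrences of x in t are the binder
    itself plus the occurrences inside s."""
    total = names(t)
    return all(total[l[1]] == 1 + names(l[2])[l[1]] for l in lambdas(t))

def var(x): return ("var", x)
def lam(x, b): return ("lam", x, b)
def app(t, s): return ("app", t, s)

# The four examples above.
print(well_named(app(var("x"), lam("x", app(var("x"), var("y"))))))
print(well_named(app(var("x"), lam("z", app(var("z"), var("y"))))))
print(well_named(app(lam("x", app(var("x"), var("x"))),
                     lam("x", app(var("x"), var("x"))))))
print(well_named(app(lam("x", app(var("x"), var("x"))),
                     lam("y", app(var("y"), var("y"))))))
```
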
A state is initial:

if its code is well-named, and the data-structures are empty.

There is a bijection (up to $α$) $(-)^°$ between terms and initial states, called compilation, sending $t$ to the initial state $t^°$.

Execution:

t_0^° ⟹^\ast s

In that case, $s$ is said to be reachable.

Final state:

if no transition applies.

There’s a decoding function: \underline{(-)}: \text{States} ⟶ λ\text{-terms}

The AM $M$ implements the strategy:

if

  • ex to der: ∀ f: t^° ⟹^\ast s, ∃ d: t ⟶^\ast \underline{s}

  • der to ex: ∀ d: t ⟶^\ast u, ∃ f: t^° ⟹^\ast s \text{ with } \underline{s} = u

Moreover, in both cases we ask that

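\vert f \vert_β = \vert d \vert

where $\vert f \vert_β$ is the number of $β$-transitions in $f$.

For the search machine above, compilation and decoding are defined by:

t^° \; ≝ \; \overline{t}/ε
\underline{t/ε} \; ≝ \; t\\ \underline{t/s::π} \; ≝ \; \underline{ts/π}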

Contexts:

\underline{ε} \; ≝ \; ⟨\cdot⟩\\ \underline{t::π} \; ≝ \; \underline{π} ⟨⟨\cdot⟩ t⟩\\ \underline{t/π} \; ≝ \; \underline{π} ⟨t⟩
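
The decoding of a state of the search machine can be sketched in Python as follows. The tuple encoding and the convention that the stack is a Python list with its top at the end are assumptions of this sketch; `compile_term`, `decode` and `push` are hypothetical names.

```python
# Codes as tuples: ("var", x), ("lam", x, body), ("app", t, s).
# A state is a pair (code, stack); the top of the stack is stack[-1].

def compile_term(t):
    """Initial state: the code with an empty stack."""
    return t, []

def decode(code, stack):
    """Read back a state: t / (s :: pi) decodes like (t s) / pi."""
    for s in reversed(stack):           # apply the top of the stack first
        code = ("app", code, s)
    return code

def push(code, stack):
    """The @L transition: (t s) / pi  ==>  t / (s :: pi)."""
    assert code[0] == "app"
    return code[1], stack + [code[2]]

t = ("app", ("app", ("var", "f"), ("var", "a")), ("var", "b"))
state = compile_term(t)
after_one = push(*state)
after_two = push(*after_one)
print(decode(*state) == decode(*after_one) == decode(*after_two) == t)
```

The example illustrates overhead transparency on one term: the $@L$ transition moves the argument to the stack, and the decoding of the state is unchanged.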

Properties we have

1) $β$-transition:

\text{if } \quad s ⟹_β s', \quad \text{ then } \quad \underline{s} ⟶_{wh} \underline{s'}

2) Overhead transparency:

\text{if } \quad s ⟹_o s', \quad \text{ then } \quad \underline{s} = \underline{s'}

3) $⟹_o$ terminates

4) The machine is deterministic

5) Final states decode to normal forms

Proof of the last one: Final states are of the form

λx.t/ε \text{ which decodes into } λx.t \\ x / π \text{ which decodes into } x u_1 ⋯ u_n

From these, we get the property:

\text{If }\underline{s} ⟶_{wh} t \\ \text{ then } s ⟹^\ast_o ⟹_β s' \qquad \text{ where } \underline{s'} = t

which enables us to prove der to ex.
