# Lecture 9: Cost models

Teacher: Beniamino Accatoli

## Cost models

Turing Machines (TM) cost models:

• Time: number of transitions
• Space: max number of cells on the tape used during the computation

Way trickier for $λ$-calculus! Possible attempt:

• Time: number of $β$-steps to normal form
• Space: max size of a term during evaluation

But not sufficient.

For time: most computational models are related by a polynomial factor ⟶ the class $P$ is robust, which is why it is so popular (we can talk about $P$ without specifying the model). And $P$ is essentially the smallest such robust class (smaller classes are model-sensitive, e.g. TM vs Random Access Machines: a problem solvable in linear time on one need not be linear on the other).

$λ$-calculus: polynomially related to TM with respect to time, but what about space? ⟶ we won’t focus on space for the moment. Even for time, there are quite a bunch of problems:

• which reduction strategy gives a reasonable cost model?
• Atomicity of $β$: is $β$-reduction an atomic operation (as a transition is for TM, which are close to real-world machines)? Not clear ⟶ $β$ can make a lot of copies. For certain strategies the answer is “yes”, but it’s subtle.

### First-order vs Higher-order computations

$λ$ vs TM:

• TM → first-order: they manipulate concrete data (take data as input and yield an output)
• $λ$ → higher-order: $λ$-terms can take programs as input ⟹ higher-order

• TM can execute other TMs, but only via contrived encodings

From TM to $λ$: easy.
From $λ$ to TM: hard (atomicity of $β$).

# Equivalence of $λ$-calculus and TM

## From TM to $λ$

Equivalence with TM: we don’t need the full $λ$-calculus to represent TM. It’s enough to restrict ourselves to a fragment: the deterministic $λ$-calculus.

### Deterministic $λ$-calculus: $Λ_{det}$

Terms:
$t \; ≝ \; v \; \mid \; t v\\ v \; ≝ \; x \; \mid \; λx.t$
Rewriting rules:
$\cfrac{}{(λx.t)v ⟶_β t \lbrace x ← v\rbrace} β$ $\cfrac{t ⟶ t'}{t v ⟶ t' v} @_L$

Reduction in $Λ_{det}$ then takes the form:

$(λx. t)v v_1 ⋯ v_k ⟶ t \lbrace x ← v \rbrace v_1 ⋯ v_k$

or, stated another way, with evaluation contexts:

$E \; ≝ \; ⟨\cdot⟩ \; \mid \; E v$ $⟨\cdot⟩ ⟨t⟩ \; ≝ \; t\\ (Ev) ⟨t⟩ \; ≝ \; E⟨t⟩ v$

so that

$\cfrac{t ⟶_β t'}{E⟨t⟩ ⟶_β E⟨t'⟩}$

NB: Here, we have a call-by-value calculus (CBV: $β$-reduction fires only when the argument is a value) and a weak $λ$-calculus

Weak $λ$-calculus:

we don’t reduce under $λ$’s

Strong:

we can reduce everywhere

NB:

• initially, $λ$-calculus was introduced as strong, but in practice, people use the weak one.
• deterministic: at most one redex in a term
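To make the rules concrete, here is a minimal Python sketch of $Λ_{det}$ (the tuple encoding `('var', x)`, `('lam', x, body)`, `('app', t, v)` is an assumption of this sketch, not from the lecture): `step` applies $β$ when the left subterm is an abstraction and otherwise descends on the left, exactly as $@_L$ prescribes.

```python
# Hypothetical encoding: ('var', x) | ('lam', x, body) | ('app', t, v).
# In Λ_det, arguments are always values (variables or abstractions).

def is_value(t):
    return t[0] in ('var', 'lam')

def subst(t, x, v):
    """t{x <- v}, stopping at shadowing binders (enough for these examples)."""
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, v))
    return ('app', subst(t[1], x, v), subst(t[2], x, v))

def step(t):
    """One Λ_det step: β at the head, else @_L on the left subterm; None if normal."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':                      # β: (λx.t) v -> t{x <- v}
            return subst(f[2], f[1], a)
        r = step(f)                            # @_L: reduce on the left
        return None if r is None else ('app', r, a)
    return None                                # values do not reduce

def evaluate(t):
    """Iterate step to the normal form, counting the number of β-steps."""
    n = 0
    while (r := step(t)) is not None:
        t, n = r, n + 1
    return t, n
```

For instance, with $I = λz.z$, `evaluate` sends $δ\,I$ to $I$ in two $β$-steps, matching $δI ⟶ II ⟶ I$.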

Thm: Let $Σ$ be an alphabet, and $f: Σ^\ast ⟶ Σ^\ast$ a function computed by a TM $M$ in time $g$. Then there exists an encoding $\overline{(-)}$ from

• $Σ$
• strings over $Σ$
• TM over $Σ$

into $Λ_{det}$ such that $∀ s ∈ Σ^\ast, \quad \overline{M} \, \overline{s} ⟶^n_{det} \overline{f(s)} \quad \text{ where } n = Θ(g(\vert s \vert) + \vert s \vert)$

NB: from TM to $λ$ ⟶ there’s only a linear overhead

Now, the other direction.

## From $λ$ to TM

Notation: $δ \; ≝ \; λx. xx$ (the duplicator), $Ω \; ≝ \; δδ$

Let

• $t_0 \; ≝ \; y$
• $t_{n+1} \; ≝ \; δ t_n$

(we’ll use the rightmost reduction strategy)

$t_1 = δ y ⟶ yy\\ t_2 = δ(δ y) ⟶^\ast (yy)(yy)\\ t_3 ⟶^\ast ((yy)(yy))((yy)(yy))\\ \vdots$

So the normal forms satisfy:

$t_n ⟶^n_{rβ} r_n$

• $r_0 = y$
• $r_{n+1} = r_n r_n$

$\vert r_n \vert = Ω(2^n)$

but

$\vert t_n \vert = O(n)$

So in a linear number of steps, we get a term of exponential size ⟹ impossible to simulate in time $O(n)$ on a TM. The complexity measure $n$ (the number of steps of rightmost $β$-reduction ($rβ$)) doesn’t account for the time needed to write down the result: at least one of the $β$-steps does an exponential amount of work ⇒ $β$ can’t be atomic.
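A quick experiment confirms the gap (Python sketch; the tuple encoding of terms is an assumption of the sketch): rightmost reduction of $t_n$ takes exactly $n$ steps, while the normal form has $2^n$ occurrences of $y$.

```python
# Hypothetical encoding: ('var', x) | ('lam', x, body) | ('app', t, s).

def subst(t, x, s):
    """Naive t{x <- s}, stopping at shadowing binders (safe for these terms)."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def step_rightmost(t):
    """One rightmost β-step (arguments first), or None if no redex."""
    if t[0] == 'app':
        r = step_rightmost(t[2])               # rightmost: try the argument first
        if r is not None:
            return ('app', t[1], r)
        if t[1][0] == 'lam':                   # then β at this application
            return subst(t[1][2], t[1][1], t[2])
        r = step_rightmost(t[1])               # finally, inside the left subterm
        return None if r is None else ('app', r, t[2])
    return None

def leaves(t):
    """Number of variable occurrences: the size of the term, up to a constant."""
    if t[0] == 'var':
        return 1
    return leaves(t[2]) if t[0] == 'lam' else leaves(t[1]) + leaves(t[2])

delta = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))

def t_(n):                                     # t_0 = y, t_{n+1} = δ t_n
    t = ('var', 'y')
    for _ in range(n):
        t = ('app', delta, t)
    return t

def reduce_count(t, stepper):
    """Reduce to normal form with the given strategy, counting the steps."""
    n = 0
    while (r := stepper(t)) is not None:
        t, n = r, n + 1
    return t, n
```

E.g. `reduce_count(t_(10), step_rightmost)` stops after only 10 steps on a term with $2^{10}$ occurrences of $y$.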

This seems to show that the number of $β$-steps cannot be taken as a good measure of complexity for the $λ$-calculus.

It doesn’t work with closed terms either. Consider:

• $u_0 \; ≝ \; I$
• $u_{n+1} \; ≝ \; (λx. λy. yxx) u_n$

(this resists strong evaluation too! For weak evaluation alone, $(λx. λy. xx)$ would suffice)

For leftmost evaluation:

$t_n ⟶_{lβ}^{2^n - 1} r_n$

(by induction)
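The same experiment for leftmost evaluation (Python sketch, same assumed tuple encoding as before): now the number of steps is itself exponential, $2^n - 1$.

```python
# Hypothetical encoding: ('var', x) | ('lam', x, body) | ('app', t, s).

def subst(t, x, s):
    """Naive t{x <- s}, stopping at shadowing binders (safe for these terms)."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def step_leftmost(t):
    """One leftmost-outermost β-step, or None (weak: we never need to go under λ here)."""
    if t[0] == 'app':
        if t[1][0] == 'lam':                   # outermost redex first
            return subst(t[1][2], t[1][1], t[2])
        r = step_leftmost(t[1])                # then leftmost inside the function...
        if r is not None:
            return ('app', r, t[2])
        r = step_leftmost(t[2])                # ...then inside the argument
        return None if r is None else ('app', t[1], r)
    return None

def leaves(t):
    """Number of variable occurrences in t."""
    if t[0] == 'var':
        return 1
    return leaves(t[2]) if t[0] == 'lam' else leaves(t[1]) + leaves(t[2])

delta = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))

def t_(n):                                     # t_0 = y, t_{n+1} = δ t_n
    t = ('var', 'y')
    for _ in range(n):
        t = ('app', delta, t)
    return t

def reduce_count(t, stepper):
    n = 0
    while (r := stepper(t)) is not None:
        t, n = r, n + 1
    return t, n
```

For $n = 1, \dots, 6$ this reaches the same $r_n$ of size $2^n$, but in $2^n - 1$ steps instead of $n$.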

Now, let’s come up with a sequence (possibly of open terms) that explodes under leftmost evaluation as well:

• $t’_1 \; ≝ \; δ$
• $t’_{n+1} \; ≝ \; λx. (t’_n (xx))$

⟶ $t’_n y$ explodes for leftmost $β$-reduction

$t'_{n+1}y = (λx. t'_n (xx))y ⟶ t'_n (yy) ⟶ (λx. t'_{n-1} (xx))(yy) ⟶ ⋯$

and more generally

$t'_n r_m ⟶^n r_{n+m}$

so

$t'_n y = t'_n r_0 ⟶^n r_n$

Actually, it explodes with respect to any strategy! One can even come up with a closed sequence (that explodes wrt any evaluation) ⟹ size explosion is unavoidable.
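A sketch checking the claim on the $t'_n$ (same assumed tuple encoding): leftmost reduction of $t'_n y$ takes only $n$ steps, each on a term of modest size, yet ends on $r_n$, of size $2^n$.

```python
# Hypothetical encoding: ('var', x) | ('lam', x, body) | ('app', t, s).

def subst(t, x, s):
    """t{x <- s}, stopping at shadowing binders (this handles the nested x's of t'_n)."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def step_leftmost(t):
    """One leftmost-outermost β-step, or None (the redex is always at top level here)."""
    if t[0] == 'app':
        if t[1][0] == 'lam':
            return subst(t[1][2], t[1][1], t[2])
        r = step_leftmost(t[1])
        if r is not None:
            return ('app', r, t[2])
        r = step_leftmost(t[2])
        return None if r is None else ('app', t[1], r)
    return None

def leaves(t):
    if t[0] == 'var':
        return 1
    return leaves(t[2]) if t[0] == 'lam' else leaves(t[1]) + leaves(t[2])

delta = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))

def tp(n):                 # t'_1 = δ, t'_{n+1} = λx. t'_n (x x)
    t = delta
    for _ in range(n - 1):
        t = ('lam', 'x', ('app', t, ('app', ('var', 'x'), ('var', 'x'))))
    return t

def reduce_count(t, stepper):
    k = 0
    while (r := stepper(t)) is not None:
        t, k = r, k + 1
    return t, k
```

E.g. `reduce_count(('app', tp(8), ('var', 'y')), step_leftmost)` finishes in 8 steps on a term with $2^8$ occurrences of $y$.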

• From TM to $λ$: linear
• From $λ$ to TM: Size explosion, exponential

⟶ To fix this: the concept of sharing, i.e. explicit substitutions (the term then grows only linearly with the number of $β$-steps)

The number of steps in $λ_x$ ⟶ a measure of complexity (but there’s a catch: if you want to expand the explicit substitutions, there’s an exponential blow-up)

Unfolding:

$x↓ \; ≝ \; x\\ (λx. t) ↓ \; ≝ \; λx. t↓\\ (ts)↓ \; ≝ \; t↓ s↓\\ (t[x ← s])↓ \; ≝ \; t↓ \lbrace x ← s↓\rbrace$
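The unfolding can be sketched directly in Python (the constructor `('sub', t, x, s)` for $t[x ← s]$, like the rest of the encoding, is an assumption of this sketch). On the family $t_0 = x_0 x_0$, $t_{n+1} = t_n[x_n ← x_{n+1} x_{n+1}]$, a term of linear size unfolds to one of exponential size.

```python
# Hypothetical encoding:
# ('var', x) | ('lam', x, b) | ('app', t, s) | ('sub', t, x, s)  for t[x <- s].

def subst(t, x, s):
    """Meta-level substitution t{x <- s} on sub-free terms (unfold only needs these)."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def unfold(t):
    """t↓ : turn explicit substitutions into meta-level ones."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], unfold(t[2]))
    if tag == 'app':
        return ('app', unfold(t[1]), unfold(t[2]))
    return subst(unfold(t[1]), t[2], unfold(t[3]))   # (t[x <- s])↓ = t↓{x <- s↓}

def size(t):
    """Number of variable occurrences (a lower bound on any reasonable size)."""
    if t[0] == 'var':
        return 1
    if t[0] == 'lam':
        return size(t[2])
    if t[0] == 'app':
        return size(t[1]) + size(t[2])
    return size(t[1]) + size(t[3])

def t_es(n):            # t_0 = x0 x0, t_{n+1} = t_n[x_n <- x_{n+1} x_{n+1}]
    t = ('app', ('var', 'x0'), ('var', 'x0'))
    for i in range(n):
        pair = ('app', ('var', f'x{i+1}'), ('var', f'x{i+1}'))
        t = ('sub', t, f'x{i}', pair)
    return t
```

Here `size(t_es(n))` is $2 + 2n$, while `size(unfold(t_es(n)))` is $2^{n+1}$.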

Size explosion when expanding substitutions: take the terms $t_n$ with explicit substitutions (ES):

• $t_0 = x_0 x_0$
• $t_{n+1} = t_n[x_n ← x_{n+1} x_{n+1}]$

Then $\vert t_n ↓ \vert = Ω(2^{\vert t_n \vert})$.

Still, why is it not cheating? Because equality can be tested without expanding the substitutions: $t↓ \; \overset{?}{=} \; s↓$ can be tested in $O(\vert t \vert + \vert s \vert)$.

Actually, there’s a hidden assumption in the size explosion issue: “space = size of the term” (this is what was problematic). But with ES and sharing, space is no longer the size of the term.

# Abstract Machines (AM)

In the following, we will stick to:

### Simplest reduction strategy ever: weak head evaluation

$\cfrac{}{(λx.t)s ⟶_{wh} t \lbrace x ← s\rbrace} β$ $\cfrac{t ⟶_{wh} t'}{ts ⟶_{wh} t's} @L$

### Abstract machines: definition

What is an abstract machine? There is no absolute definition (“machine” because they operationally implement the $λ$-calculus, “abstract” because we don’t care about garbage collection, etc…)

Abstract machines make explicit what happens at the meta-level in the $λ$-calculus:

• Search of the next redex to reduce
• Substitution
• Names: $α$-equivalence (to avoid capture, etc…)
  • For the sake of simplicity, we won’t focus too much on this one

## Search: an AM for weak head reduction

Let’s study an AM that makes explicit only Search.

code: $ts$ / stack: $π \quad ⟹_{@L} \quad$ code: $t$ / stack: $s::π$

code: $λx. t$ / stack: $s::π \quad ⟹_β \quad$ code: $t \lbrace x ← s\rbrace$ / stack: $π$

An AM is given by a grammar for states $s$ and a transition function $⟹$ ($⟹ \; = \; ⟹_β ∪ ⟹_o$, where $⟹_o$ stands for the overhead transitions, here $⟹_{@L}$)

State: some code plus some data-structures

Codes: overlined (like $\overline{t}$) to stress that they are not considered up to $α$-equivalence

A code $\overline{t}$ is well-named if whenever $λx. \overline{s}$ occurs in $\overline{t}$, then $x$ occurs only in $\overline{s}$, if at all.

Ex:

• Not well-named: $x(λx. xy)$
• Well-named variant: $x(λz. zy)$
• Not well-named: $(λx.xx)(λx.xx)$
• Well-named variant: $(λx.xx)(λy. yy)$

A state is initial if its code is well-named and the data-structures are empty.
$\vert t ↓ \vert = Ω(2^{\vert t \vert})$$•$t_0 = x_0 x_0$•$t_{n+1} = t_n[x_n ← x_{n+1} x_{n+1}]$Still, why is it not cheating? Because you can test equality without expanding the substitutions: $t ↓ \; \overset{?}{ = } \; ↓s$ can be tested in$O(\vert t \vert + \vert s \vert)$Actually, there’s a hidden assumption in the size explosion issue: “space = size of the term” (this what was problematic). But with ES and sharing, space is no longer the size of the term. # Abstract Machines (AM) In the following, we will stick to: ### Simplest reduction strategy ever: weak head evaluation $\cfrac{}{(λx.t)s ⟶_{wh} t \lbrace x ← s\rbrace} β$ $\cfrac{t ⟶_{wh} t'}{ts ⟶ t's} @L$ ### Abstract machines: definition What is an abstract machine? No absolute definition (“machine” because they operationally implement$λ$-calculus, “abstract” because we don’t care about garbage collection, etc…) Abstract machines make explicit what happens at a meta-level in$λ$-calculus: • Search of the next redex to reduce • Substitution • Names:$α$-equivalence (to avoid capture, etc…) • For the sake of simplicity, we won’t care/focus too much about it ## Search of an AM for weak head reduction Let’s study an AM that makes explicit only Search. code:$ts$/ stack$π\overset{\text{applied rule}}{⟹_{@L}}$code:$t$/ stack:$s::π$code:$λx. t$/ stack$s::π⟹_β$code:$t \lbrace x ← s\rbrace$/ stack:$π$An AM: is given by a grammar for states$s$and a transition function$⟹$($⟹ \; = \; ⟹_β$or$⟹_o$(for$⟹_@$)) State: some code plus some data-structure Codes: overlined (like$\overline{t}$) to stress that they are not considered up to$α$-equivalence A code$\overline{t}$is well-named: if whenever$λx. \overline{s}$occurs in$t$, then$x$occurs only in$\overline{s}$, if at all Ex: • Not well-named:$x(λx. xy)$• Well-named variant:$x(λz. zy)$• Not well-named:$(λx.xx)(λx.xx)$• Well-named variant:$(λx.xx)(λy. yy)$A state is initial: if its code is well-named, and the data-structures are empty. 
There is a bijection (up to $α$) $(-)^°$ between terms and initial states, called compilation, sending $t$ to the initial state $t^°$. Execution:

$t^° ⟹^\ast s$

As it happens, $s$ is said to be reachable.

Final state: a state to which no transition applies.

There’s a decoding function:

$\underline{(-)}: \text{States} ⟶ λ\text{-terms}$

The AM $M$ implements the strategy if:

• ex to der: $∀ f: t^° ⟹^\ast s, \; ∃ d: t ⟶^\ast \underline{s}$
• der to ex: $∀ d: t ⟶^\ast u, \; ∃ f: t^° ⟹^\ast s$ such that $\underline{s} = u$

Moreover, in both cases we ask that $\vert f \vert_β = \vert d \vert$.

Compilation and decoding:

$t^° \; ≝ \; \overline{t}/ε$

$\underline{t/ε} \; ≝ \; t\\ \underline{t/s::π} \; ≝ \; \underline{ts/π}$

Contexts:

$\underline{ε} \; ≝ \; ⟨\cdot⟩\\ \underline{t::π} \; ≝ \; \underline{π} ⟨⟨\cdot⟩ t⟩\\ \underline{t/π} \; ≝ \; \underline{π} ⟨t⟩$

### Properties we have

1. $β$-transition: if $s ⟹_β s'$, then $\underline{s} ⟶_{wh} \underline{s'}$
2. Overhead transparency: if $s ⟹_o s'$, then $\underline{s} = \underline{s'}$
3. $⟹_o$ terminates
4. the machine is deterministic
5. final states decode to normal forms
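A minimal sketch of the search machine in Python (assumed tuple encoding; the stack is a list whose top is its last element; substitution stays meta-level, since this machine only makes Search explicit):

```python
# Hypothetical encoding: ('var', x) | ('lam', x, body) | ('app', t, s).

def subst(t, x, s):
    """Meta-level t{x <- s}, stopping at shadowing binders."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def run(t):
    """Run the machine from the initial state t° = t / ε; return the final state
    (code, stack) together with the number of β-transitions."""
    code, stack, betas = t, [], 0
    while True:
        if code[0] == 'app':                  # ==>_@L :  t s / π  ->  t / s::π
            stack.append(code[2])
            code = code[1]
        elif code[0] == 'lam' and stack:      # ==>_β :  λx.t / s::π  ->  t{x<-s} / π
            arg = stack.pop()
            code = subst(code[2], code[1], arg)
            betas += 1
        else:                                 # final state: no transition applies
            return code, stack, betas

def decode(code, stack):
    """Decoding of a state:  t / s::π  =  (t s) / π."""
    for s in reversed(stack):
        code = ('app', code, s)
    return code
```

On $(λx.xx)\,I$ (with $I = λz.z$) the machine halts on code $I$ with an empty stack after 2 $β$-transitions, matching $(λx.xx)I ⟶_{wh} II ⟶_{wh} I$ (so $\vert f \vert_β = \vert d \vert$ here); on the open term $x\,I$ it halts with code $x$ and the argument on the stack, decoding to $x\,I$, as in the shape of final states below.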

Proof of the last one: Final states are of the form

$λx.t/ε \text{ which decodes into } λx.t \\ x / π \text{ which decodes into } x u_1 ⋯ u_n$

From these, we get the property:

$\text{If }\underline{s} ⟶_{wh} t \\ \text{ then } s ⟹^\ast_o ⟹_β s' \qquad \text{ where } \underline{s'} = t$

which enables us to prove der to ex.
