Lecture 9: Cost models
Teacher: Beniamino Accatoli
Cost models
Turing Machine (TM) cost models:
 Time: number of transitions
 Space: max number of cells on the tape used during the computation
Way trickier for the $λ$-calculus! A possible attempt:
 Time: number of $β$-steps to normal form
 Space: max size of a term during evaluation
But these are not sufficient.
For time: most computational models are related by a polynomial factor ⟶ the class $P$ is robust, which is why it is so popular (we can talk about $P$ without specifying the model). Moreover, $P$ is the smallest such class (ex: TM vs Random Access Machine: a problem solvable in linear time on one need not be linear on the other).
$λ$-calculus: polynomially related to TM with respect to time, but what about space? ⟶ we won’t focus on space for the moment. Even for time, there are quite a few problems:
 which reduction strategy gives a reasonable cost model?
 Atomicity of $β$: is $β$-reduction an atomic operation (as transitions are for TM, which are close to real-world machines)? Not clear ⟶ $β$ makes a lot of copies. For certain strategies the answer can be “yes”, but it’s subtle.
First-order vs higher-order computations
$λ$ vs TM:
 TM → first-order: they manipulate concrete data (take data as input, yield an output)
 $λ$ → higher-order: $λ$-terms can take programs as input ⟹ higher-order
 TM can execute other TMs, but only modulo contrived encodings
From TM to $λ$: easy.
From $λ$ to TM: hard (atomicity of $β$).
Equivalence between $λ$ and TM
From TM to $λ$
Equivalence with TM: we don’t need the full $λ$calculus to represent TM. It’s enough to restrict ourselves to a fragment: the deterministic $λ$calculus.
Deterministic $λ$-calculus: $Λ_{det}$
 Terms:

$$t \;≝\; v \;\mid\; t\,v \qquad\qquad v \;≝\; x \;\mid\; λx.t$$
 Rewriting rules:

$$\cfrac{}{(λx.t)\,v ⟶_β t \lbrace x ← v\rbrace}\;β \qquad\qquad \cfrac{t ⟶ t'}{t\,v ⟶ t'\,v}\;@_L$$
Contexts, or another way to state it, evaluation contexts:
$$E \;≝\; ⟨·⟩ \;\mid\; E\,v$$
so that $t ⟶ t'$ iff $t = E⟨(λx.u)\,v⟩$ and $t' = E⟨u\lbrace x ← v\rbrace⟩$
NB: here we have a CPS-style calculus ($β$-reduction fires only when the argument is a value) and a weak $λ$-calculus
 Weak $λ$calculus:

we don’t reduce under $λ$’s
 Strong:

we can reduce everywhere
NB:
 initially, the $λ$-calculus was introduced as strong, but in practice people use the weak one.
 deterministic: every term contains at most one redex
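The two rules of $Λ_{det}$ can be sketched as a small-step interpreter. The tuple encoding and the function names below are assumptions of this note, not part of the lecture:

```python
# A sketch of one-step reduction in the deterministic λ-calculus.
# Encoding (an assumption of this note): ("var", x) and ("lam", x, body)
# are values; ("app", t, v) applies a term to a value.

def subst(t, x, v):
    """t{x <- v}, assuming the usual freshness conventions hold."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    return ("app", subst(t[1], x, v), subst(t[2], x, v))

def step(t):
    """One β_det step, or None if t is normal (i.e. a value)."""
    if t[0] == "app":
        f, v = t[1], t[2]
        if f[0] == "lam":                      # rule β: (λx.t) v -> t{x <- v}
            return subst(f[2], f[1], v)
        r = step(f)                            # rule @_L: descend on the left
        return None if r is None else ("app", r, v)
    return None                                # values don't reduce

# Determinism in action: δ I -> I I -> I, then no redex is left.
delta = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
I = ("lam", "y", ("var", "y"))
t = ("app", delta, I)
```

Since arguments are always values, at each point there is at most one applicable rule: the calculus is deterministic by construction.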
Thm: Let $Σ$ be an alphabet, $f: Σ^\ast ⟶ Σ^\ast$ a function computed by a TM $M$ in time $g$. Then there exists an encoding $\overline{(\cdot)}$ from
 $Σ$
 strings over $Σ$
 TM over $Σ$
into $Λ_{det}$ such that
$$∀ s ∈ Σ^\ast, \quad \overline{M}\; \overline{s} ⟶^n_{β_{det}} \overline{f(s)} \quad \text{ where } n = Θ(g(\vert s \vert) + \vert s \vert)$$
NB: from TM to $λ$ ⟶ there’s only a linear overhead
Now, the other direction.
From $λ$ to TM
Notation: $δ \; ≝ \; λx. xx$ (the duplicator), $Ω \; ≝ \; δδ$
Let
 $t_0 \; ≝ \; y$
 $t_{n+1} \; ≝ \; δ t_n$
(we’ll use the rightmost reduction strategy)
So the normal forms:
 $r_0 = y$
 $r_{n+1} = r_n r_n$
but $t_n ⟶^n_{rβ} r_n$ and $\vert r_n \vert = Ω(2^n)$.
So in a linear number of steps we get a term of exponential size ⟹ impossible to execute on a TM in that many atomic steps. The complexity $n$ (the number of steps of rightmost $β$-reduction ($rβ$)) doesn’t account for the time needed to write down the result (at least one of the $β$-steps does an exponential amount of work ⇒ it can’t be atomic).
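The mismatch can be checked numerically. `size_r` below is a hypothetical helper computing $\vert r_n \vert$ from the recurrence $\vert r_0 \vert = 1$, $\vert r_{n+1} \vert = 2\vert r_n \vert + 1$ (two copies plus one application node):

```python
# t_n reaches its rightmost normal form r_n in n β-steps, but the size of
# r_n follows |r_0| = 1, |r_{n+1}| = 2|r_n| + 1: exponential in n.

def size_r(n):
    s = 1                # |r_0| = |y| = 1
    for _ in range(n):
        s = 2 * s + 1    # r_{n+1} = r_n r_n adds one application node on top
    return s

for n in (5, 10, 20):
    print(n, "steps ->", size_r(n))   # 20 steps already yield size 2097151
```

Solving the recurrence gives $\vert r_n \vert = 2^{n+1} - 1$: linear steps, exponential output.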
This seems to show that the number of $β$-steps in the $λ$-calculus cannot be taken as a good measure of complexity.
Restricting to closed terms doesn’t help either. Consider:
 $u_0 \; ≝ \; I$
 $u_{n+1} \; ≝ \; (λx. λy. yxx)\, u_n$
(under rightmost evaluation, $u_n$ again reaches a normal form of exponential size in a linear number of steps; thanks to the head variable $y$ in $yxx$, the example resists strong evaluation too! For weak evaluation alone, $(λx. λy. xx)$ would work as well)
For leftmost evaluation, however, $u_n$ takes an exponential number of steps to reach its normal form (by induction), so for this family the number of leftmost steps does match the size of the result.
Now, try to come up with a sequence (possibly open) for which even leftmost evaluation suffers size explosion.
 $t’_1 \; ≝ \; δ$
 $t’_{n+1} \; ≝ \; λx. (t’_n (xx))$
⟶ $t’_n y$ explodes under leftmost $β$-reduction:
so $t’_n y$ reduces in $n$ leftmost steps to a term of size $Ω(2^n)$
Actually, it explodes with respect to any strategy! We could even come up with a closed sequence (that explodes wrt any evaluation) ⟹ it’s unavoidable.
 From TM to $λ$: linear
 From $λ$ to TM: Size explosion, exponential
⟶ To fix this: the concept of sharing: move from the $λ$-calculus to $λ_x$, a calculus with explicit substitutions (in which the size of the term grows only linearly with the number of $β$-steps)
Number of steps in $λ_x$ ⟶ measure of complexity (but there’s a catch: if you want to expand the explicit substitutions, there’s an exponential blowup)
Unfolding $t↓$: the $λ$-term obtained from $t$ by executing all its explicit substitutions.
Size explosion when expanding substitutions: there are terms $t$ with explicit substitutions (ES) such that
$$\vert t ↓ \vert = Ω(2^{\vert t \vert})$$
 $t_0 = x_0 x_0$
 $t_{n+1} = t_n[x_n ← x_{n+1} x_{n+1}]$
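A quick way to watch this unfolding explode. The string-based representation is purely illustrative, an assumption of this note:

```python
# t_n = (x0 x0)[x0 <- x1 x1]...[x_{n-1} <- x_n x_n] has size O(n),
# but its unfolding t_n↓ contains 2^{n+1} occurrences of x_n.

def unfold(n):
    t = "x0 x0"
    for i in range(n):  # execute the explicit substitution [x_i <- x_{i+1} x_{i+1}]
        t = t.replace(f"x{i}", f"(x{i+1} x{i+1})")
    return t

print(unfold(1))              # (x1 x1) (x1 x1)
print(unfold(8).count("x8"))  # 512 occurrences, from a linear-size term
```

Each substitution doubles the number of variable occurrences, so the unfolded term has $2^{n+1}$ leaves while $t_n$ itself has size $O(n)$.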
Still, why is it not cheating? Because equality of unfoldings can be tested without expanding the substitutions:
$$t↓ = s↓ \quad \text{can be tested in time } O(\vert t \vert + \vert s \vert)$$
Actually, there’s a hidden assumption in the size explosion issue: “space = size of the term” (this is what was problematic). But with ES and sharing, space is no longer the size of the term.
Abstract Machines (AM)
In the following, we will stick to the simplest reduction strategy ever: weak head evaluation.
Abstract machines: definition
What is an abstract machine? No absolute definition (“machine” because they operationally implement $λ$calculus, “abstract” because we don’t care about garbage collection, etc…)
Abstract machines make explicit what happens at the meta-level in the $λ$-calculus:
 Search for the next redex to reduce
 Substitution
 Names: $α$-equivalence (to avoid capture, etc…)
For the sake of simplicity, we won’t focus too much on names.
An AM making Search explicit, for weak head reduction
Let’s study an AM that makes explicit only Search.
code: $\overline{ts}$ / stack: $π$ $\quad⟹_{@_L}\quad$ code: $\overline{t}$ / stack: $\overline{s}::π$
code: $\overline{λx. t}$ / stack: $\overline{s}::π$ $\quad⟹_β\quad$ code: $\overline{t \lbrace x ← s\rbrace}$ / stack: $π$
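The two transitions can be sketched as a loop. The encoding and the helper names (`run`, `decode`) are assumptions of this note; substitution stays meta-level, since this machine only makes Search explicit:

```python
# States are (code, stack); the machine makes only Search explicit,
# substitution is still performed in one meta-level shot.
# Encoding (an assumption): ("var", x) | ("lam", x, body) | ("app", t, s).

def subst(t, x, s):
    """t{x <- s}; codes are assumed well-named, so no renaming is needed."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def run(t):
    """Iterate the two transitions until a final state is reached."""
    stack = []
    while True:
        if t[0] == "app":                 # transition @_L: push the argument
            stack.append(t[2])
            t = t[1]
        elif t[0] == "lam" and stack:     # transition β: pop and substitute
            t = subst(t[2], t[1], stack.pop())
        else:                             # final: abstraction with empty
            return t, stack               # stack, or a (free) variable

def decode(t, stack):
    """Plug the code back under the stacked arguments (top of stack first)."""
    for s in reversed(stack):
        t = ("app", t, s)
    return t

I = ("lam", "z", ("var", "z"))
code, stack = run(("app", I, I))          # (λz.z) (λz.z) evaluates to λz.z
```

The stack records the arguments crossed while searching for the head redex; `decode` undoes the Search, which is exactly the decoding function discussed below.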
 An AM:

is given by a grammar for states $s$ and a transition function $⟹$, the union of $⟹_β$ and the overhead transitions $⟹_o$ (here $⟹_o$ is $⟹_{@_L}$)
 State:

some code plus some data structures
 Codes:

overlined (like $\overline{t}$) to stress that they are not considered up to $α$-equivalence
 A code $\overline{t}$ is well-named:

if whenever $λx. \overline{s}$ occurs in $\overline{t}$, then $x$ occurs only in $\overline{s}$, if at all
Ex:
 Not well-named: $x(λx. xy)$
 Well-named variant: $x(λz. zy)$
 Not well-named: $(λx.xx)(λx.xx)$
 Well-named variant: $(λx.xx)(λy. yy)$
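The examples can be checked mechanically under the Barendregt-style reading of the definition (binders pairwise distinct and disjoint from the free variables); both this reading and the tuple encoding are assumptions of this note:

```python
# Well-named codes: no two abstractions bind the same name, and no bound
# name also occurs free. Encoding: ("var", x) | ("lam", x, t) | ("app", t, s).

def binders(t):
    """All names bound by some λ in t, with multiplicity."""
    if t[0] == "var":
        return []
    if t[0] == "lam":
        return [t[1]] + binders(t[2])
    return binders(t[1]) + binders(t[2])

def free_vars(t):
    """The set of free variables of t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def well_named(t):
    bs = binders(t)
    return len(bs) == len(set(bs)) and not (set(bs) & free_vars(t))

# The four examples above:
bad1  = ("app", ("var", "x"), ("lam", "x", ("app", ("var", "x"), ("var", "y"))))
good1 = ("app", ("var", "x"), ("lam", "z", ("app", ("var", "z"), ("var", "y"))))
dx = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
dy = ("lam", "y", ("app", ("var", "y"), ("var", "y")))
bad2, good2 = ("app", dx, dx), ("app", dx, dy)
```

Well-namedness is what lets the machine substitute without any on-the-fly renaming.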
 A state is initial:

if its code is well-named, and the data structures are empty.
There is a bijection (up to $α$) $()^°$ between terms and initial states, called compilation, sending $t$ to the initial state $t^°$.
Execution: a sequence of transitions $t^° ⟹^\ast s$ from an initial state.
Such a state $s$ is said to be reachable.
 Final state:

if no transition applies.
There’s a decoding function: $\underline{(\cdot)}: \text{States} ⟶ λ\text{-terms}$
 The AM $M$ implements the strategy $⟶_{wh}$:

if

 ex to der: $∀ f: t^° ⟹^\ast s$, $∃ d: t ⟶_{wh}^\ast \underline{s}$

 der to ex: $∀ d: t ⟶_{wh}^\ast u$, $∃ f: t^° ⟹^\ast s$ with $\underline{s} = u$

Moreover, in both cases we ask that $\vert f \vert_β = \vert d \vert$ (each $β$-transition of the execution corresponds to exactly one step of the derivation).
Decoding with contexts: the stack decodes to an evaluation context, $\underline{ε} = ⟨·⟩$ and $\underline{\overline{s}::π} = \underline{π}⟨⟨·⟩\, \underline{\overline{s}}⟩$, so that a state with code $\overline{t}$ and stack $π$ decodes to $\underline{π}⟨\,\underline{\overline{t}}\,⟩$.
Properties we have:
1) $β$-transition:
$$\text{if } \quad s ⟹_β s', \quad \text{ then } \quad \underline{s} ⟶_{wh} \underline{s'}$$
2) Overhead transparency:
$$\text{if } \quad s ⟹_o s', \quad \text{ then } \quad \underline{s} = \underline{s'}$$
 $⟹_o$ terminates
 the machine is deterministic
 final states decode to normal forms
Proof of the last one: final states are of the form “code $\overline{λx.t}$ with an empty stack” or “code $\overline{x}$ with an arbitrary stack”, and both decode to weak head normal forms.
From these, we get the property: if $\underline{s} ⟶_{wh} u$ for a reachable state $s$, then $s ⟹_o^\ast ⟹_β s'$ with $\underline{s'} = u$, which enables us to prove der to ex.
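The two properties can be observed on a concrete run. Below, a sketch (same assumed tuple encoding as before, with a hypothetical `trace` helper) logging, at every transition, its label and the decoding of the new state:

```python
# Overhead transparency and β-projection, observed on a trace: @-transitions
# leave the decoding unchanged, β-transitions change it by one wh-step.

def subst(t, x, s):
    """Meta-level substitution t{x <- s} on well-named codes."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def decode(t, stack):
    """Plug the code back under the stacked arguments."""
    for s in reversed(stack):
        t = ("app", t, s)
    return t

def trace(t):
    """Run the machine, logging (transition label, decoded state)."""
    stack, log = [], [("init", t)]
    while True:
        if t[0] == "app":
            stack.append(t[2])
            t = t[1]
            log.append(("@", decode(t, stack)))
        elif t[0] == "lam" and stack:
            t = subst(t[2], t[1], stack.pop())
            log.append(("beta", decode(t, stack)))
        else:
            return log

I = ("lam", "y", ("var", "y"))
log = trace(("app", ("app", ("lam", "x", ("var", "x")), I), ("var", "z")))
for i in range(1, len(log)):
    if log[i][0] == "@":                       # overhead transparency
        assert log[i][1] == log[i - 1][1]
```

On $((λx.x)\,I)\,z$ the machine performs two overhead transitions (decoding unchanged) and two $β$-transitions, matching the two weak head steps of the decoded terms.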