Lecture 1: Operational semantics and reduction strategies

Teacher: François Pottier

Functional Programming (FP): Introduction

G. Béry: « Machines are wonderful tools to amplify your mistakes. »

What is FP? A certain family of programming languages. In the '80s, they were set apart from other languages and seen as “academic”. Nowadays the distinction is less clear: they are extensively used in industry as well.

Features:

• Mutable state is discouraged: as in mathematics, the value of a variable does not change once it is defined
• Higher-order functions
• A type discipline (type system) that rules out basic mistakes that would make the program crash
• Some unsafe operations are taken away from the programmer (e.g., memory deallocation is taken care of by the garbage collector)
• Closeness to mathematics (Haskell enthusiasts like to think of programs as mathematical entities, rather than as things that perform actions)

A C programmer thinks of programs in terms of memory blocks and pointers, whereas a functional programmer thinks in terms of algebraic data structures.

Loops are replaced by tail-recursive functions. Tail calls are cheap: they cost the same as a GOTO.
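As an illustration (a sketch, not from the lecture; the name `sumTo` is made up), here is the loop `acc := 0; for i = 1 to n do acc := acc + i` written in Haskell as a tail-recursive function. Since Haskell is lazy, the accumulator is forced with `seq` at each iteration; otherwise a chain of pending additions would build up.

```haskell
-- The loop "acc := 0; for i = 1 to n do acc := acc + i" as a tail-recursive function.
sumTo :: Int -> Int
sumTo n = go 1 0
  where
    -- Each iteration is a tail call: nothing remains to be done after it returns,
    -- so it can be compiled as a jump rather than as a new stack frame.
    go :: Int -> Int -> Int
    go i acc
      | i > n     = acc
      | otherwise = let acc' = acc + i in acc' `seq` go (i + 1) acc'  -- force acc' (Haskell is lazy)
```

The loop variables `i` and `acc` become the parameters of `go`, and "jumping back to the top of the loop" becomes the recursive call.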

Today, functional programming is not that different from “mainstream” programming, though it remains a culture in and of itself.

Operational semantics and reduction strategies

Two ways to regard programs:

• Denotational semantics: which mathematical function does a given $λ$-term denote? This requires defining a suitable notion of space (Scott domains, etc.)
• Operational semantics: a program is seen as a “machine” that computes step by step

Reduction strategies in $λ$-calculus

Call-by-value

Compute the value of the argument before calling the function.

Values: variables and $λ$-abstractions, $v ≝ x \,|\, λx.t$

The call-by-value reduction relation $t ⟶_{cbv} t'$ is inductively defined by the following rules (small-step operational semantics):

$\cfrac{}{(λx.t)v ⟶_{cbv} t[v/x]}\\ \, \\ \cfrac{t ⟶_{cbv} t'}{tu ⟶_{cbv} t'u}\\ \, \\ \cfrac{u ⟶_{cbv} u'}{vu ⟶_{cbv} vu'}$

Moreover, we only reduce closed terms.

Warning: we never reduce under a $λ$-abstraction (thus, values do not reduce).
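To make the rules concrete, here is a minimal Haskell sketch of one step of $⟶_{cbv}$. The `Term` datatype (with named variables) and the names `isValue`, `subst` and `stepCBV` are assumptions of this sketch. Since only closed terms are reduced, the value substituted for $x$ is closed, so a naive substitution cannot capture variables.

```haskell
-- Pure λ-terms with named variables (a representation chosen for this sketch).
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- Values are variables and λ-abstractions.
isValue :: Term -> Bool
isValue (App _ _) = False
isValue _         = True

-- t[v/x]: naive substitution, safe here because v is always closed.
subst :: String -> Term -> Term -> Term
subst x v (Var y)   = if x == y then v else Var y
subst x v (Lam y t) = if x == y then Lam y t else Lam y (subst x v t)  -- stop if x is rebound
subst x v (App t u) = App (subst x v t) (subst x v u)

-- One step t ⟶cbv t', or Nothing when t cannot reduce.
stepCBV :: Term -> Maybe Term
stepCBV (App (Lam x t) v)
  | isValue v = Just (subst x v t)                     -- (λx.t) v ⟶ t[v/x]
stepCBV (App t u)
  | not (isValue t) = (\t' -> App t' u) <$> stepCBV t  -- reduce t in t u
  | otherwise       = App t <$> stepCBV u              -- t is a value v: reduce u in v u
stepCBV _ = Nothing                                    -- values do not reduce (no reduction under λ)
```

Each clause mirrors one rule: β only fires when the argument is a value, the function position is reduced before the argument, and a `Lam` has no step at all.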

Examples:

• Coq and Agda do reduce under $λ$'s (which is what you want when doing mathematical reasoning), unlike C, Java, Python, OCaml, …
• Haskell uses a lazy evaluation strategy, unlike call-by-value

Evaluation is left-to-right (unlike C, where the evaluation order of function arguments is unspecified): in $tu$, $t$ must be reduced to a value before $u$ is reduced.

Reduction is deterministic: if $t ⟶_{cbv} u$ and $t ⟶_{cbv} u'$, then $u = u'$.

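Hence, every program is in exactly one of three mutually exclusive situations:

1. it terminates (reduces to a value);
2. it diverges (reduces forever);
    • ex: $Ω ≝ δ\,δ$, where $δ ≝ λx.x\,x$ (indeed $Ω ⟶_{cbv} Ω$)
3. it goes wrong: it reaches a term that is not a value but cannot be reduced any further (a runtime error).
    • this cannot happen for closed terms of the pure $λ$-calculus, which is too “small”/simple
    • it can happen once constants are added: $δ\,2 ⟶_{cbv} 2\,2 \not⟶_{cbv}$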
Strong type systems rule out runtime errors. Some type systems (such as those of Coq and Agda) rule out both errors and divergence: a divergent program could be given any type, including $⊥$, which would make the logic inconsistent.

Alternative style: evaluation contexts

First, define head reduction, which only applies at the root of the term:

$(λx.t)v ⟶_{cbv}^{head} t[v/x]$

Reduction is then head reduction under an evaluation context:

$\cfrac{t ⟶_{cbv}^{head} t'}{E[t] ⟶_{cbv} E[t']}$

where

$E ≝ [] \,|\, E u \,|\, vE$
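The same relation in the evaluation-context style, as a hypothetical Haskell sketch: `Ctx` represents $E$, `plug` computes $E[t]$, and `decompose` finds the unique way of writing a non-value term as $E[r]$ where $r$ is an application of two values. Names and representation are assumptions, not part of the lecture.

```haskell
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- Evaluation contexts E ::= [] | E u | v E
data Ctx = Hole | AppL Ctx Term | AppR Term Ctx
  deriving (Eq, Show)

-- E[t]: fill the hole of E with t.
plug :: Ctx -> Term -> Term
plug Hole       t = t
plug (AppL e u) t = App (plug e t) u
plug (AppR v e) t = App v (plug e t)

isValue :: Term -> Bool
isValue (App _ _) = False
isValue _         = True

subst :: String -> Term -> Term -> Term
subst x v (Var y)   = if x == y then v else Var y
subst x v (Lam y t) = if x == y then Lam y t else Lam y (subst x v t)
subst x v (App t u) = App (subst x v t) (subst x v u)

-- Head reduction: (λx.t) v ⟶ t[v/x], at the root only.
headStep :: Term -> Maybe Term
headStep (App (Lam x t) v) | isValue v = Just (subst x v t)
headStep _ = Nothing

-- Split t as E[r], where r is the only candidate redex for CBV.
decompose :: Term -> Maybe (Ctx, Term)
decompose t@(App u w)
  | not (isValue u) = (\(e, r) -> (AppL e w, r)) <$> decompose u
  | not (isValue w) = (\(e, r) -> (AppR u e, r)) <$> decompose w
  | otherwise       = Just (Hole, t)
decompose _ = Nothing   -- values contain no redex

-- t ⟶cbv t' iff t = E[r], r ⟶head r' and t' = E[r'].
step :: Term -> Maybe Term
step t = do
  (e, r) <- decompose t
  r'     <- headStep r
  pure (plug e r')
```

The shape of `decompose` encodes the grammar of $E$: it only descends into the argument once the function part is a value, which is exactly the $v\,E$ case.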

Call-by-name

$\cfrac{}{(λx.t)u ⟶_{cbn} t[u/x]}\\ \, \\ \cfrac{t ⟶_{cbn} t'}{tu ⟶_{cbn} t'u}$

The argument $u$ is passed unevaluated: it represents a computation that has not been carried out yet.

NB: there is no rule $\cfrac{u ⟶_{cbn} u'}{tu ⟶_{cbn} tu'}$, as it would defeat the very purpose of call-by-name (the argument of a function could then be evaluated right away), and it would make reduction non-deterministic.

The argument is reduced later, whenever the function demands its value.

Theorem: if $t$ terminates under CBV, then it also terminates under CBN.

The converse is false: $(λx.1)Ω$ converges under CBN but diverges under CBV.
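The counterexample can be checked mechanically. Below is a Haskell sketch of both strategies (datatype and function names are illustrative assumptions), with a bounded evaluator `evalWith` that returns `Nothing` when no value is reached within the step budget. Under CBN the possibly non-value argument is substituted as is; the naive substitution remains safe because all terms are closed.

```haskell
data Term = Var String | Lam String Term | App Term Term | Lit Int
  deriving (Eq, Show)

isValue :: Term -> Bool
isValue (App _ _) = False
isValue _         = True

subst :: String -> Term -> Term -> Term
subst x v (Var y)   = if x == y then v else Var y
subst x v (Lam y t) = if x == y then Lam y t else Lam y (subst x v t)
subst x v (App t u) = App (subst x v t) (subst x v u)
subst _ _ (Lit n)   = Lit n

-- Call-by-name: β fires whatever the argument; only the function position is reduced.
stepCBN :: Term -> Maybe Term
stepCBN (App (Lam x t) u) = Just (subst x u t)
stepCBN (App t u)         = (\t' -> App t' u) <$> stepCBN t
stepCBN _                 = Nothing

-- Call-by-value, for comparison.
stepCBV :: Term -> Maybe Term
stepCBV (App (Lam x t) v) | isValue v = Just (subst x v t)
stepCBV (App t u)
  | not (isValue t) = (\t' -> App t' u) <$> stepCBV t
  | otherwise       = App t <$> stepCBV u
stepCBV _ = Nothing

-- Apply a strategy for at most `fuel` steps; Nothing if no value is reached
-- (stuck or out of fuel).
evalWith :: (Term -> Maybe Term) -> Int -> Term -> Maybe Term
evalWith step fuel t
  | isValue t = Just t
  | fuel <= 0 = Nothing
  | otherwise = step t >>= evalWith step (fuel - 1)
```

With $Ω$ encoded as `App delta delta`, CBN discards $Ω$ in one step and returns `Lit 1`, while CBV keeps rewriting $Ω$ to itself until the fuel runs out.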

Not evaluating the argument

• may be a good thing, for ex if the function doesn’t use it
• but it may backfire if the argument is used several times in the function body: it is then recomputed each time, unless its value is memoized

Thunks

Thunks make it possible to embed CBN inside a CBV language.

A thunk is a function $λ\_.u$ that delays the evaluation of the argument $u$ until it is applied to $()$.

$⟦x⟧ = x()\\ ⟦λx.t⟧ = λx.⟦t⟧\\ ⟦tu⟧ = ⟦t⟧(λ\_.⟦u⟧)$

$⟦\bullet⟧$ is correct if:
$t ⟶^\ast_{cbn} v ⟹ ⟦t⟧ ⟶^\ast_{cbv} ⟦v⟧$

More visually:

$\begin{xy} \xymatrix{ t \ar[r]^{cbn\,\ast} \ar[d]_{⟦\bullet⟧} & v \ar@{.>}[d]^{⟦\bullet⟧} \\ ⟦t⟧ \ar@{.>}[r]_{cbv\,\ast} & ⟦v⟧ } \end{xy}$
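
A Haskell sketch of the thunk translation, checked on programs whose result is an integer, so that the CBN value of $t$ and the CBV result of $⟦t⟧$ can be compared directly without comparing functions. Assumptions of this sketch: integer literals and a `Unit` constant (written $()$ above) are added to the term language, and the name `_` never occurs in source programs.

```haskell
-- One datatype for source and target terms; Unit is only produced by the translation.
data Term = Var String | Lam String Term | App Term Term | Lit Int | Unit
  deriving (Eq, Show)

isValue :: Term -> Bool
isValue (App _ _) = False
isValue _         = True

subst :: String -> Term -> Term -> Term
subst x v (Var y)   = if x == y then v else Var y
subst x v (Lam y t) = if x == y then Lam y t else Lam y (subst x v t)
subst x v (App t u) = App (subst x v t) (subst x v u)
subst _ _ t         = t   -- Lit and Unit contain no variables

stepCBN :: Term -> Maybe Term
stepCBN (App (Lam x t) u) = Just (subst x u t)
stepCBN (App t u)         = (\t' -> App t' u) <$> stepCBN t
stepCBN _                 = Nothing

stepCBV :: Term -> Maybe Term
stepCBV (App (Lam x t) v) | isValue v = Just (subst x v t)
stepCBV (App t u)
  | not (isValue t) = (\t' -> App t' u) <$> stepCBV t
  | otherwise       = App t <$> stepCBV u
stepCBV _ = Nothing

evalWith :: (Term -> Maybe Term) -> Int -> Term -> Maybe Term
evalWith step fuel t
  | isValue t = Just t
  | fuel <= 0 = Nothing
  | otherwise = step t >>= evalWith step (fuel - 1)

-- The translation [[.]] from CBN terms to CBV terms.
translate :: Term -> Term
translate (Var x)   = App (Var x) Unit                          -- [[x]] = x ()
translate (Lam x t) = Lam x (translate t)                       -- [[λx.t]] = λx.[[t]]
translate (App t u) = App (translate t) (Lam "_" (translate u)) -- [[t u]] = [[t]] (λ_.[[u]])
translate t         = t                                         -- literals are unchanged
```

On $(λx.λy.x)\,1\,Ω$, for instance, CBN returns $1$ and so does CBV on the translation: $⟦Ω⟧$ sits inside a thunk that is never forced.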
Typing judgment:
$\underbrace{Γ}_{\text{assumption}} ⊢ \underbrace{t}_{\text{term}} : \underbrace{T}_{\text{type}}$

This transformation $⟦\bullet⟧$ is type-preserving:

$Γ ⊢ t:T ⟹ ⟦Γ⟧ ⊢ ⟦t⟧: ⟦T⟧$

Type transformation:

$T ≝ int \,|\, T → T$

and

$⟦int⟧ ≝ int\\ ⟦T_1 → T_2⟧ ≝ \underbrace{(unit → ⟦T_1⟧)}_{≝ \, \texttt{thunk } ⟦T_1⟧} → ⟦T_2⟧$

For a closed program of type int: $t: int ⟹ ⟦t⟧: int$
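The type translation is a simple recursive function; a Haskell sketch with hypothetical names (`Ty` for source types, `Ty'` for target types, which additionally need `unit`). The translation of contexts is not spelled out in the notes: the sketch assumes that $⟦Γ⟧$ binds each variable to a thunk, which is what $⟦x⟧ = x()$ requires in order to have type $⟦T⟧$.

```haskell
-- Source types T ::= int | T -> T
data Ty = TInt | TArrow Ty Ty
  deriving (Eq, Show)

-- Target types also need unit, to give thunks a type.
data Ty' = TInt' | TUnit' | TArrow' Ty' Ty'
  deriving (Eq, Show)

-- thunk T = unit -> T
thunk :: Ty' -> Ty'
thunk = TArrow' TUnit'

-- [[int]] = int,  [[T1 -> T2]] = thunk [[T1]] -> [[T2]]
transTy :: Ty -> Ty'
transTy TInt           = TInt'
transTy (TArrow t1 t2) = TArrow' (thunk (transTy t1)) (transTy t2)

-- Assumption: [[Γ]] maps x : T to x : thunk [[T]], since [[x]] = x ().
transCtx :: [(String, Ty)] -> [(String, Ty')]
transCtx gamma = [ (x, thunk (transTy t)) | (x, t) <- gamma ]
```

Since $⟦int⟧ = int$, a closed program of type int is mapped to a program of type int, as stated above.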

Call-by-need/Lazy evaluation (LE)

Call-by-need adds memoization to call-by-name, to avoid repeated computations: an argument is evaluated at most once, and its value is then reused.

Ex: this is the strategy used by Haskell.

This encourages a modular way of writing programs.

With lazy evaluation, you can write a producer that generates a large (even infinite) amount of data and a consumer that only uses part of it; when the two are combined, only the computations needed by the consumer are carried out.

Ex: Square root computation in Haskell with Newton-Raphson iterations:

```haskell
import Prelude hiding (repeat, sqrt)

next n x = (x + n / x) / 2
repeat f a = a : repeat f (f a)  -- producer of an infinite stream of approximations
within eps (a : b : rest)        -- consumer: decides how many elements to demand
  | abs (a - b) <= eps = b
  | otherwise          = within eps (b : rest)  -- b is shared: computed only once, thanks to memoization
sqrt a0 eps n =
  within eps (repeat (next n) a0)
```


No mutable state.

Other example: a function that determines whether some element of a list satisfies a given property:

```haskell
import Prelude hiding (any)

any :: (a -> Bool) -> [a] -> Bool
any p xs = foldr (||) False (map p xs)
```


This would be inefficient in call-by-value, where `p` would be applied to every element of the list; in call-by-need it is fine: as soon as an element satisfies `p`, the computation stops.

Memoization thunks

$⟦x⟧ = \texttt{force } x\\ ⟦λx.t⟧ = λx.⟦t⟧\\ ⟦tu⟧ = ⟦t⟧(\texttt{suspend }(λ\_.⟦u⟧))$

This can be implemented in OCaml (exercise).
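Without giving away the OCaml exercise, here is a Haskell sketch of `suspend` and `force` based on a mutable cell (`IORef`). Since Haskell is already lazy, the suspended computation is modelled as an `IO a` action, standing in for the $λ\_.⟦u⟧$ of the translation; this encoding, and the names `Thunk` and `ThunkState`, are assumptions of the sketch. Re-entrant forcing (a thunk whose computation forces itself) is not handled.

```haskell
import Data.IORef

-- A memoizing thunk: a mutable cell holding either the suspended computation
-- or, once it has been forced, its value.
data ThunkState a = Suspended (IO a) | Evaluated a
newtype Thunk a = Thunk (IORef (ThunkState a))

-- suspend: wrap a computation without running it.
suspend :: IO a -> IO (Thunk a)
suspend comp = Thunk <$> newIORef (Suspended comp)

-- force: run the computation the first time, then return the cached value.
force :: Thunk a -> IO a
force (Thunk cell) = do
  st <- readIORef cell
  case st of
    Evaluated v    -> pure v
    Suspended comp -> do
      v <- comp
      writeIORef cell (Evaluated v)
      pure v
```

Forcing the same thunk twice runs the suspended computation only once; this is the difference with the plain thunks $λ\_.u$ of call-by-name.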

Machine-checked proofs

Machine-checked proofs are needed to scale up to real-world languages. Moreover, proofs written in LaTeX are hard to maintain and prone to mistakes.

Mechanized theorem proving has handled major results:

• the 4-color theorem (Gonthier, Werner)
• Feit-Thompson theorem (Gonthier et al.)
• Kepler’s conjecture (Hales et al.)

Proof assistants are perfectly suited to programming languages:

• discrete objects (no reals, analysis, topology)
• many similar proof cases
• syntactic techniques

Today, 20% of the papers at POPL come with a machine-checked proof.

The proof checker should be as tiny and reliable as possible.

Coq

Pure functional programming language in the style of ML, with recursive functions and pattern-matching:

```coq
Fixpoint factorial (n : nat) :=
  match n with
  | 0 => 1
  | S p => n * factorial p
  end.
```


Inductive types:

```coq
Inductive nat : Type :=
| O : nat
| S : nat -> nat.
```


Inductive predicates: their inhabitants are generated by repeated application of the constructors

```coq
Inductive even : nat -> Prop :=
| even_zero : even 0
| even_plus_2 : forall n, even n -> even (S (S n)).
```


NB: Inhabitants of even n can be thought of as derivation trees whose conclusion is even n.

Abstract syntax with binders

Most programming languages have constructs that bind variables:

• in terms:
    • function abstraction: $λx.t$
    • local definitions: let x = t in u
• in types:
    • quantifiers: $∀α. α → α$

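$α$-equivalence $≡_α$ is defined in terms of swapping: $\binom{x}{y}t$ exchanges every occurrence of $x$ and $y$ in $t$, binding occurrences included: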

$\binom{x}{y} x = y\\ \binom{x}{y} y = x\\ \binom{x}{y} z = z \quad \text{ if } z ≠ x, y\\ \binom{x}{y} (λz.t) = λ \binom{x}{y} z. \binom{x}{y} t\\ \binom{x}{y}(tu) = (\binom{x}{y}t)(\binom{x}{y} u)$
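One standard way to complete the definition (an assumption here, since the notes stop at swapping): $λx.t ≡_α λy.u$ holds iff $\binom{z}{x}t ≡_α \binom{z}{y}u$ for some $z$ occurring in neither term. A Haskell sketch, with hypothetical names `swap`, `fresh` and `alphaEq`:

```haskell
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- (x y) applied to a single name.
swapName :: String -> String -> String -> String
swapName x y z
  | z == x    = y
  | z == y    = x
  | otherwise = z

-- (x y) t: exchange x and y everywhere in t, binding occurrences included.
swap :: String -> String -> Term -> Term
swap x y (Var z)   = Var (swapName x y z)
swap x y (Lam z t) = Lam (swapName x y z) (swap x y t)
swap x y (App t u) = App (swap x y t) (swap x y u)

-- Every name occurring in t, free or bound.
names :: Term -> [String]
names (Var x)   = [x]
names (Lam x t) = x : names t
names (App t u) = names t ++ names u

-- The first name among z0, z1, ... that is not in `used`.
fresh :: [String] -> String
fresh used = go (0 :: Int)
  where
    go i = let n = "z" ++ show i in if n `elem` used then go (i + 1) else n

-- λx.t ≡α λy.u iff (z x) t ≡α (z y) u, for z fresh.
alphaEq :: Term -> Term -> Bool
alphaEq (Var x) (Var y)         = x == y
alphaEq (App t1 u1) (App t2 u2) = alphaEq t1 t2 && alphaEq u1 u2
alphaEq (Lam x t) (Lam y u)     =
  let z = fresh (x : y : names t ++ names u)
  in alphaEq (swap z x t) (swap z y u)
alphaEq _ _                     = False
```

Swapping (rather than substitution) is convenient here because it is a bijection on names and never captures variables, even when applied under binders.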

In papers, it’s common, by abuse of notation, to confuse $≡_α$ with equality $=$

Problem: Coq doesn't support quotient types. There is a “hack” to simulate them (Cohen, 2013), but it is not very convenient.

Possible workaround: De Bruijn indices (but it’s a mixed blessing: simpler to formalize, but less human-readable):

```coq
Inductive term :=
| Var : nat -> term
| Lam : term -> term
| App : term -> term -> term.
```
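To see how named terms map to this representation, a Haskell sketch of the conversion (the names `Named` and `toDeBruijn` are hypothetical). The index of a variable is the number of binders between its occurrence and its own binder; $α$-equivalent named terms get the same de Bruijn term.

```haskell
import Data.List (elemIndex)

-- Named terms, as written by humans.
data Named = NVar String | NLam String Named | NApp Named Named

-- De Bruijn terms, mirroring the Coq definition above.
data Term = Var Int | Lam Term | App Term Term
  deriving (Eq, Show)

-- ctx lists the enclosing binders, innermost first; a variable's index is its
-- position in ctx. Returns Nothing if the term has a free variable.
toDeBruijn :: [String] -> Named -> Maybe Term
toDeBruijn ctx (NVar x)   = Var <$> elemIndex x ctx
toDeBruijn ctx (NLam x t) = Lam <$> toDeBruijn (x : ctx) t
toDeBruijn ctx (NApp t u) = App <$> toDeBruijn ctx t <*> toDeBruijn ctx u
```

For example, $λx.λy.x$ becomes `Lam (Lam (Var 1))`, whatever names were chosen for the binders.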


Substitutions

Substitution $σ$:

a total function from variables to terms.

Substitutions can also be seen as infinite sequences

$σ(0) \cdot σ(1) \cdot ⋯$
Application of $σ$ to a term $t$:

$t[σ]$, defined inductively as:

$x[σ] = σ(x)\\ (λt)[σ] = λ(t[0 \cdot (σ; +1)])\\ (t_1 t_2)[σ] = t_1[σ] t_2[σ]$

where $+1$ denotes the shift substitution $x ↦ x+1$, and composition is defined mutually as:

$(σ_1; σ_2)(x) = (σ_1(x))[σ_2]$

NB: this cannot be written like that in Coq, which rejects the mutual recursion because it is not structural ⟶ solution: define $0 \cdot (σ; +1)$ in another way (independently).
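Haskell has no termination checker, so the mutually recursive definitions can be transcribed literally. A sketch with hypothetical names (`Subst`, `shift`, `cons`, `compose`, `apply`); the last function uses the usual de Bruijn β-rule $(λt)\,u ⟶ t[u \cdot id]$, which is not stated in the notes above.

```haskell
-- De Bruijn terms, as in the Coq definition above.
data Term = Var Int | Lam Term | App Term Term
  deriving (Eq, Show)

-- A substitution: a total function from variables (indices) to terms.
type Subst = Int -> Term

-- The shift +1 : x ↦ x + 1.
shift :: Subst
shift x = Var (x + 1)

-- t · σ : maps 0 to t and x + 1 to σ(x).
cons :: Term -> Subst -> Subst
cons t _ 0 = t
cons _ s x = s (x - 1)

-- (σ1 ; σ2)(x) = σ1(x)[σ2]
compose :: Subst -> Subst -> Subst
compose s1 s2 x = apply (s1 x) s2

-- t[σ]
apply :: Term -> Subst -> Term
apply (Var x)   s = s x
apply (Lam t)   s = Lam (apply t (cons (Var 0) (compose s shift)))
apply (App t u) s = App (apply t s) (apply u s)

-- β-reduction at the root: (λt) u ⟶ t[u · id], where id = Var.
beta :: Term -> Maybe Term
beta (App (Lam t) u) = Just (apply t (cons u Var))
beta _               = Nothing
```

For instance, `beta` applied to $(λλ1)\,(λ0)$, i.e. $(λx.λy.x)(λz.z)$, yields $λλ0$, i.e. $λy.λz.z$.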

Completeness:

if an equation can be written in the given grammar, then either it or its negation is a consequence of the theory.

Explicit Substitutions

$t \; ≝ \; x \; \mid \; λx. t \; \mid \; tt \; \mid \; t[σ]$

A syntax for substitutions must be defined as well.
