Lecture 3: Closure conversion, Defunctionalization
Teacher: François Pottier
Closure conversion
Goal: compile a language with arbitrary first-class functions (i.e., the $λ$-calculus) into a language where first-class functions must be closed (e.g. C).
In C, closures are simulated by hand (in systems programming, for instance).
Closure conversion is worth studying for several reasons:
- it is a step in the compilation of functional programming languages;
- it explains what first-class functions are;
- it exposes their space and time cost;
- it is a programming technique in languages without first-class functions (e.g. C).
let iter f t =
  for i = 0 to Array.length t - 1 do f t.(i) done

let sum t =
  let s = ref 0 in
  let add x = (s := !s + x) in
  iter add t;
  !s
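⟶ How does a compiler transform this program into one that uses neither local variables shared with nested functions nor nested functions themselves?
⟶ In general, a function such as add could even be returned as the result of a function, in which case the variables it uses must outlive the call that created them.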
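A hand-written answer to this question might look as follows. This is only a sketch: the record type add_closure and the name add_code are illustrative and not part of the lecture, and iter is specialized to this single closure type to keep the types simple.

```ocaml
(* A closure for [add]: a closed code pointer plus the captured reference [s].
   The names [add_closure] and [add_code] are hypothetical. *)
type add_closure = { code : add_closure * int -> unit; s : int ref }

(* [add_code] is closed: it reaches [s] through its extra argument [env]. *)
let add_code ((env : add_closure), (x : int)) : unit =
  env.s := !(env.s) + x

(* [iter] calls the closure by fetching its code and passing the closure itself. *)
let iter (f : add_closure) (t : int array) : unit =
  for i = 0 to Array.length t - 1 do f.code (f, t.(i)) done

let sum (t : int array) : int =
  let s = ref 0 in
  iter { code = add_code; s } t;
  !s

let () = assert (sum [| 1; 2; 3 |] = 6)
```

Here add_code and iter could both live at the top level of a C-like language: neither refers to a variable defined in an enclosing function.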
Procedural abstraction (as in OOP): the data can only be accessed through methods; no direct access to the data is given:
let make x =
  let cell = ref x in
  let get () = !cell
  and set x = (cell := x) in
  get, set

let () =
  let get, set = make 3 in
  set (get () + 1)
Question: if you were to rewrite this program by hand (using only closed functions), how would you do it?
- Pass one more argument to get and set, so that they know where to find the data in the heap.
(* [...] *)
  let get env () = !(env.cell)
  and set env x = (env.cell := x) in
  let env = {cell = cell} in
  (get, env), (set, env)

let () =
  let (cget, eget), (cset, eset) = make 3 in
  cset eset (cget eget () + 1)
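In OCaml, you could have (as written, the self-applications such as set.code (set, ...) require a recursive type for the code field, so this type-checks only with the -rectypes option):
type ('a, 'b) closure = { code: 'a; cell: 'b }
and
let make x =
  let cell = ref x in
  let get (env, ()) = !(env.cell)
  and set (env, x) = (env.cell := x) in
  { code = get; cell = cell }, { code = set; cell = cell }

let () =
  let get, set = make 3 in
  set.code (set, get.code (get, ()) + 1)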
Here, get and set are closed functions, so they can be hoisted to the top level:
let get (env, ()) = !(env.cell)
let set (env, x) = (env.cell := x)

let make x =
  let cell = ref x in
  { code = get; cell = cell }, { code = set; cell = cell }

let () =
  let get, set = make 3 in
  set.code (set, get.code (get, ()) + 1)
A closure combines code and data. In { code = get; cell = cell }:
- the first field contains a pointer to a (closed) function
- the second one contains a pointer to a piece of data allocated on the heap
Another example: we want to transform the non-closed function fun x -> k * x, which map receives as its argument f:
let rec map f xs =
  match xs with
  | [] -> []
  | x :: xs -> f x :: map f xs

let scale k xs =
  map (fun x -> k * x) xs
- f x will be turned into f.code (f, x) (fetch the code pointer, then pass it the closure itself and the argument);
- fun x -> k * x will be turned into { code = fun (env, x) -> env.k * x; k = k }.
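Putting the two rewritings together gives a runnable version of map and scale. This is a sketch under simplifying assumptions: the record type clo and the name scale_code are illustrative, and the closure type is fixed to functions from int to int that capture a single integer k.

```ocaml
(* A closure over one captured integer [k]. The type name [clo] is hypothetical. *)
type clo = { code : clo * int -> int; k : int }

(* f x  becomes  f.code (f, x) *)
let rec map (f : clo) (xs : int list) : int list =
  match xs with
  | [] -> []
  | x :: xs -> f.code (f, x) :: map f xs

(* The body of  fun x -> k * x,  now closed: [k] is read from the closure. *)
let scale_code ((env : clo), (x : int)) : int = env.k * x

(* fun x -> k * x  becomes  { code = scale_code; k = k } *)
let scale (k : int) (xs : int list) : int list =
  map { code = scale_code; k } xs

let () = assert (scale 3 [ 1; 2; 3 ] = [ 3; 6; 9 ])
```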
Definition and proof (of closure conversion)
\[⟦x⟧ = x\\ ⟦t_1 t_2⟧ = ⟦t_1⟧.\texttt{code} \, (⟦t_1⟧, ⟦t_2⟧)\]
This translation of application is not satisfactory: we do not want to duplicate $⟦t_1⟧$. It would be inefficient, and even incorrect: if $⟦t_1⟧$ had side effects, they would be run twice. Instead, we bind its value to a fresh variable $clo$:
\[⟦t_1 t_2⟧ = \texttt{let } clo = ⟦t_1⟧ \texttt{ in } clo.\texttt{code} \, (clo, ⟦t_2⟧)\]
Now, what about $λ$-abstractions?
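If $\lbrace x_1, …, x_n\rbrace \, ≝ \, fv(λx.t)$:
\[⟦λx.t⟧ = \texttt{let } code = λ(clo, x). \texttt{ let } x_i = π_i \, clo \texttt{ in } ⟦t⟧\\ \texttt{ in } (code, x_1, …, x_n)\]
Here the closure is a tuple: its component 0 is the code pointer (written $clo.\texttt{code}$ above) and its component $i$, for $1 ≤ i ≤ n$, holds the value of $x_i$.
Soundness of closure conversion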
Which semantics to use for the source calculus?
- small-step, substitution-based?
- big-step, substitution-based?
- big-step, environment-based?
- interpreter, with fuel, environment-based?
As closures contain environments, a big-step, environment-based semantics is the natural choice.
The target language should be simpler: after the translation, every $λ$-abstraction is closed! Its semantics can be simplified accordingly: since every function is closed, a function value no longer needs to carry an environment. This is the Metal semantics (closer to the machine), denoted by $↓↓_{cbv}$.
Soundness statement (forward preservation):
\[e ⊢ t ↓_{cbv} c ⟹ ⟦e⟧ ⊢ ⟦t⟧ ↓↓_{cbv} ⟦c⟧\]
If the target programs could be non-deterministic, then we would also have to check backward preservation: "the behaviors of the source program form a superset of the behaviors of the transformed program".
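The translation and the forward-preservation statement can be tried out on a toy implementation. The sketch below is not the lecture's formal development: all type and function names (term, tterm, eval, teval, fv, cc) are hypothetical, integers and addition are added so that results can be observed, and the variable name "clo" is assumed never to occur in source programs.

```ocaml
(* Source language: lambda-calculus with integers and addition. *)
type term =
  | Var of string | Lam of string * term | App of term * term
  | Int of int | Add of term * term

(* Source values: a function value carries its environment. *)
type value = VInt of int | VClo of string * term * env
and env = (string * value) list

let rec eval (e : env) (t : term) : value =
  match t with
  | Var x -> List.assoc x e
  | Lam (x, body) -> VClo (x, body, e)
  | Int n -> VInt n
  | Add (a, b) ->
      (match eval e a, eval e b with
       | VInt m, VInt n -> VInt (m + n)
       | _ -> failwith "Add: integers expected")
  | App (t1, t2) ->
      (match eval e t1 with
       | VClo (x, body, e') -> eval ((x, eval e t2) :: e') body
       | VInt _ -> failwith "App: function expected")

(* Target language: every function is closed and takes a pair (clo, x). *)
type tterm =
  | TVar of string | TInt of int | TAdd of tterm * tterm
  | TLet of string * tterm * tterm
  | TTuple of tterm list | TProj of int * tterm
  | TCode of string * string * tterm   (* closed function of (clo, x) *)
  | TCall of tterm * tterm * tterm     (* code (clo, arg) *)

(* Target values: a code pointer carries no environment. *)
type tvalue = TVInt of int | TVTuple of tvalue list
            | TVCode of string * string * tterm

let rec teval (e : (string * tvalue) list) (t : tterm) : tvalue =
  match t with
  | TVar x -> List.assoc x e
  | TInt n -> TVInt n
  | TAdd (a, b) ->
      (match teval e a, teval e b with
       | TVInt m, TVInt n -> TVInt (m + n)
       | _ -> failwith "TAdd: integers expected")
  | TLet (x, a, b) -> teval ((x, teval e a) :: e) b
  | TTuple ts -> TVTuple (List.map (teval e) ts)
  | TProj (i, a) ->
      (match teval e a with
       | TVTuple vs -> List.nth vs i
       | _ -> failwith "TProj: tuple expected")
  | TCode (c, x, body) -> TVCode (c, x, body)
  | TCall (f, clo, arg) ->
      (match teval e f with
       | TVCode (c, x, body) ->
           (* The body only sees its two parameters: it is closed. *)
           teval [ (c, teval e clo); (x, teval e arg) ] body
       | _ -> failwith "TCall: code expected")

(* Free variables, without duplicates. *)
let rec fv (t : term) : string list =
  match t with
  | Var x -> [ x ]
  | Int _ -> []
  | Add (a, b) | App (a, b) -> List.sort_uniq compare (fv a @ fv b)
  | Lam (x, body) -> List.filter (fun y -> y <> x) (fv body)

(* Closure conversion, following the two translation rules. *)
let rec cc (t : term) : tterm =
  match t with
  | Var x -> TVar x
  | Int n -> TInt n
  | Add (a, b) -> TAdd (cc a, cc b)
  | App (t1, t2) ->
      TLet ("clo", cc t1, TCall (TProj (0, TVar "clo"), TVar "clo", cc t2))
  | Lam (x, body) ->
      let xs = fv t in
      (* Wrap the body in: let x_i = (component i of clo) in ... *)
      let unpack (i, y) k = TLet (y, TProj (i, TVar "clo"), k) in
      let slots = List.mapi (fun i y -> (i + 1, y)) xs in
      let code = TCode ("clo", x, List.fold_right unpack slots (cc body)) in
      TTuple (code :: List.map (fun y -> TVar y) xs)

(* (fun x -> fun y -> x + y) 1 2 *)
let example =
  App (App (Lam ("x", Lam ("y", Add (Var "x", Var "y"))), Int 1), Int 2)

let () =
  assert (eval [] example = VInt 3);
  assert (teval [] (cc example) = TVInt 3)
```

Note how TVCode has no environment component while VClo does: this is the simplification of the target semantics mentioned above. The check at the end only tests one program whose result is an integer; it illustrates the statement and proves nothing.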
Recursive functions
\[t \; ≝ \; x \; \mid \; \underbrace{μf.λx.t}_{f ∈ fv(t)} \; \mid \; t \, t\]
Applications do not change, as we do not want to change how functions are called. As for $λ$-abstractions:
If $\lbrace f, x_1, …, x_n\rbrace \, ≝ \, fv(λx.t)$:
\[⟦μf.λx.t⟧ = \texttt{let } code = λ(clo, x). \texttt{ let } f = clo \texttt{ in let } x_i = π_i \, clo \texttt{ in } ⟦t⟧\\ \texttt{ in } (code, x_1, …, x_n)\]
Inside the body, the closure itself plays the role of $f$, so no cyclic data structure needs to be built.
Understanding programs through closure conversion
Trick 1: difference lists
type tree =
  | Leaf of int
  | Node of tree * tree
Suppose you want to retrieve the labels of all the leaves of the tree (its fringe):
let rec fringe (t : tree) : int list =
  match t with
  | Leaf i -> [ i ]
  | Node (t1, t2) -> fringe t1 @ fringe t2
Nice-looking piece of code, but very inefficient: its complexity is quadratic in the worst case, because @ takes time linear in the length of its left argument.
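The quadratic behavior can be observed by counting the list cells that @ copies. In this sketch, append is a hand-written stand-in for @ that increments a counter, and comb builds a left-leaning tree; both names, and the counter copies, are illustrative assumptions.

```ocaml
type tree = Leaf of int | Node of tree * tree

(* Number of list cells copied by [append] so far. *)
let copies = ref 0

(* Same result as (@), but counts the cells it allocates. *)
let rec append xs ys =
  match xs with
  | [] -> ys
  | x :: xs -> incr copies; x :: append xs ys

let rec fringe (t : tree) : int list =
  match t with
  | Leaf i -> [ i ]
  | Node (t1, t2) -> append (fringe t1) (fringe t2)

(* A left-leaning comb whose leaves are 1, 2, ..., n. *)
let rec comb n = if n <= 1 then Leaf 1 else Node (comb (n - 1), Leaf n)

let () =
  copies := 0;
  let l = fringe (comb 100) in
  assert (List.length l = 100);
  Printf.printf "%d\n" !copies (* prints 4950, i.e. 99 * 100 / 2 *)
```

At each Node of the comb, the whole fringe built so far is copied again, which gives 1 + 2 + ... + (n - 1) copies for n leaves.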
Remedy: use difference lists:
type 'a diff =
  'a list -> 'a list

let singleton (x : 'a) : 'a diff =
  fun xs -> x :: xs

let concat (xs : 'a diff) (ys : 'a diff) : 'a diff =
  fun zs -> xs (ys zs)
concat becomes a constant-time operation: it is function composition, which amounts to allocating a closure.
And then, fringe:
let rec fringe_ (t : tree) : int diff =
  match t with
  | Leaf i -> singleton i
  | Node (t1, t2) -> concat (fringe_ t1) (fringe_ t2)

let fringe t = fringe_ t []
Is it really more efficient? Yes: building the difference list takes $O(n)$ time and applying it to [] takes $O(n)$ time as well, so the total is $O(n)+O(n) = O(n)$.
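The linear bound can be checked on a small experiment. The sketch below repeats the definitions above, adds a counter conses to singleton, and uses a left-leaning test tree comb; the counter and the test tree are illustrative additions.

```ocaml
type tree = Leaf of int | Node of tree * tree
type 'a diff = 'a list -> 'a list

(* Number of cons cells allocated when the difference list is applied. *)
let conses = ref 0

let singleton (x : 'a) : 'a diff =
  fun xs -> incr conses; x :: xs

let concat (xs : 'a diff) (ys : 'a diff) : 'a diff =
  fun zs -> xs (ys zs)

let rec fringe_ (t : tree) : int diff =
  match t with
  | Leaf i -> singleton i
  | Node (t1, t2) -> concat (fringe_ t1) (fringe_ t2)

let fringe t = fringe_ t []

(* A left-leaning comb whose leaves are 1, 2, ..., n. *)
let rec comb n = if n <= 1 then Leaf 1 else Node (comb (n - 1), Leaf n)

let () =
  let l = fringe (comb 100) in
  assert (List.length l = 100);
  assert (!conses = 100) (* exactly one cons per leaf *)
```

Each leaf contributes exactly one cons, whatever the shape of the tree; the remaining cost is the allocation and invocation of one closure per tree node.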
If we want to know what OCaml does, we can mentally closure-convert it.
Existential type for closures in OCaml:
type ('a, 'b) closure =
  | Clo :
      ('a * 'e -> 'b)        (* A (closed) function... *)
      * 'e                   (* ...and its environment... *)
      -> ('a, 'b) closure    (* ...together form a closure. *)
Invoking closures:
let apply (f : ('a, 'b) closure) (x : 'a) : 'b =
  let Clo (code, env) = f in
  code (x, env)
Now, closure conversion:
type 'a diff =
  ('a list, 'a list) closure

let singleton_code =
  fun (xs, x) -> x :: xs

let singleton (x : 'a) : 'a diff =
  Clo (singleton_code, x)

let concat_code =
  fun (zs, (xs, ys)) -> apply xs (apply ys zs)

let concat (xs : 'a diff) (ys : 'a diff) : 'a diff =
  Clo (concat_code, (xs, ys))

let rec fringe_ (t : tree) : int diff =
  match t with
  | Leaf i -> singleton i
  | Node (t1, t2) -> concat (fringe_ t1) (fringe_ t2)

let fringe t =
  apply (fringe_ t) []
fringe_ copies the tree into a tree made up of closures.
This is not so smart, though: the algorithm runs in linear time, but copying the tree first seems useless (the constant factor leaves a lot to be desired).
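One common way to avoid the intermediate tree of closures is to pass the tail of the list as an explicit accumulator. This variant is not taken from the lecture; it is shown only as a sketch of what remains once the closures are eliminated, and the name fringe_acc is hypothetical.

```ocaml
type tree = Leaf of int | Node of tree * tree

(* [fringe_acc t acc] is the fringe of [t] followed by [acc]. *)
let rec fringe_acc (t : tree) (acc : int list) : int list =
  match t with
  | Leaf i -> i :: acc
  | Node (t1, t2) -> fringe_acc t1 (fringe_acc t2 acc)

let fringe (t : tree) : int list = fringe_acc t []

let () = assert (fringe (Node (Node (Leaf 1, Leaf 2), Leaf 3)) = [ 1; 2; 3 ])
```

The argument acc plays the role of zs in concat: the right subtree is traversed first, and its result becomes the tail onto which the left subtree conses its leaves.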
Beware of mutable local variables in your programming language: combined with closures, they can lead to unexpected behavior, e.g. in JavaScript:
var messages = ["Wow!", "Hi!", "Closures are fun!"];
for (var i = 0; i < messages.length; i++) {
  setTimeout(function () {
    say(messages[i]);
  }, i * 1500);
}
yields undefined three times: the three closures share the single mutable variable i, whose value is already 3 by the time they run.
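The same phenomenon can be reproduced in OCaml, which makes the explanation in terms of closures explicit: a closure stores the address of a mutable cell, not a snapshot of its contents. The function names shared and fresh below are illustrative.

```ocaml
(* Shared mutable variable: every closure captures the same reference cell. *)
let shared () =
  let thunks = ref [] in
  let i = ref 0 in
  while !i < 3 do
    thunks := (fun () -> !i) :: !thunks;
    incr i
  done;
  (* The closures run after the loop, when the cell contains 3. *)
  List.rev_map (fun f -> f ()) !thunks

(* Immutable loop variable: each closure captures its own value of [i]. *)
let fresh () =
  let thunks = ref [] in
  for i = 0 to 2 do
    thunks := (fun () -> i) :: !thunks
  done;
  List.rev_map (fun f -> f ()) !thunks

let () =
  assert (shared () = [ 3; 3; 3 ]);
  assert (fresh () = [ 0; 1; 2 ])
```

In closure-conversion terms, the environment of each closure in shared holds a pointer to the one cell i, whereas in fresh it holds a distinct integer.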
Defunctionalization
Used in some compilers, like MLton (ML compiler).
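Each $λ$-abstraction of the source program is given a distinct label $C$, which becomes a data constructor; all calls go through a single global function apply, which dispatches on this constructor.
If $\lbrace x_1, …, x_n \rbrace = fv(λ^Cx.t)$:
\[⟦x⟧ = x\\ ⟦λ^Cx.t⟧ = C(x_1, …, x_n)\\ ⟦t_1 t_2⟧ = \texttt{ apply } (⟦t_1⟧, ⟦t_2⟧)\]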
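As an illustration, here is a hand-defunctionalized version of a small program with two $λ$-abstractions. This is a sketch: the constructor names Scale and Succ, the type name lam, and the function incr_all are hypothetical, and the program is restricted to functions from int to int.

```ocaml
(* One constructor per lambda-abstraction of the source program,
   carrying the values of its free variables. *)
type lam =
  | Scale of int (* stands for  fun x -> k * x,  free variable k *)
  | Succ         (* stands for  fun x -> x + 1,  no free variable *)

(* A single global function dispatches on the constructor. *)
let apply ((f : lam), (x : int)) : int =
  match f with
  | Scale k -> k * x
  | Succ -> x + 1

(* f x  becomes  apply (f, x) *)
let rec map (f : lam) (xs : int list) : int list =
  match xs with
  | [] -> []
  | x :: xs -> apply (f, x) :: map f xs

let scale k xs = map (Scale k) xs
let incr_all xs = map Succ xs

let () =
  assert (scale 3 [ 1; 2; 3 ] = [ 3; 6; 9 ]);
  assert (incr_all [ 1; 2 ] = [ 2; 3 ])
```

Compared with closure conversion, no code pointer is stored: the constructor tag identifies the code, and apply needs to know every $λ$-abstraction of the program, which makes this a whole-program transformation.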