Lecture 3: Closure conversion, Defunctionalization

Teacher: François Pottier

Closure conversion

Goal: compile a language with arbitrary first-class functions (i.e., the $λ$-calculus) to a language where first-class functions must be closed (e.g. C).

In C, closures are simulated by hand (systems programming, etc.).

Why closure conversion matters:

  • compilation of functional programming languages
  • explains first-class functions
  • space and time cost
  • programming technique in languages without first-class functions (e.g. C)

let iter f t =
  for i = 0 to Array.length t - 1 do f t.(i) done

let sum t =
  let s = ref 0 in
  let add x = (s := !s + x) in
  iter add t;
  !s

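⟶ How does a compiler transform this program into one without nested functions that refer to local variables of the enclosing function (here, add refers to s)?

⟶ Note that the function add could also be returned as the result of a function, so the variable s it refers to may have to outlive the call that created it.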
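
One possible answer to the question above, anticipating the closure representation developed in the rest of this section, can be sketched as follows. The record type `add_clo`, its field names, and the function name `add_code` are illustrative assumptions, not part of the original program.

```ocaml
(* Hypothetical closure type for [add]: a closed code pointer plus the
   captured variable [s]. Field names are illustrative. *)
type add_clo = { code : add_clo * int -> unit; s : int ref }

(* [iter] receives a closure and calls it through its code pointer. *)
let iter (f : add_clo) (t : int array) : unit =
  for i = 0 to Array.length t - 1 do
    f.code (f, t.(i))
  done

(* [add_code] is closed: it reaches [s] through its first argument. *)
let add_code ((env : add_clo), (x : int)) : unit =
  env.s := !(env.s) + x

let sum (t : int array) : int =
  let s = ref 0 in
  iter { code = add_code; s } t;
  !s
```

Here `add_code` can live at the top level because it no longer mentions `s` directly; the record allocated in `sum` carries the reference instead.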

Procedural abstraction (as in OOP): the data can only be accessed through methods; no direct access to the data is given:

let make x =
  let cell = ref x in
  let get () = !cell
  and set x = (cell := x) in
  get, set

let () =
  let get, set = make 3 in
  set (get () + 1)

Question: if you were to explain this program by hand (using only closed functions), how would you do it?

  1. Pass one more argument to get and set, so that they know where to find the data in the heap:

(* [...] *)
let get env () = !(env.cell)
and set env x = (env.cell := x) in
let env = {cell = cell} in
    (get, env), (set, env)

let () =
    let (cget, eget), (cset, eset) = make 3 in
    cset eset (cget eget () + 1)

In OCaml, you could have

type ('a, 'b) closure = { code : 'a; cell : 'b }

and

let make x =
    let cell = ref x in
    let get (env, ()) = !(env.cell)
    and set (env, x) = (env.cell := x) in
    { code = get; cell = cell }, { code = set; cell = cell }
let () =
    let get, set = make 3 in
    set.code (set, get.code (get, ()) + 1)

Here, get and set are closed functions, so they can be moved to the top level:

let get (env, ()) = !(env.cell)
let set (env, x) = (env.cell := x)

let make x =
  let cell = ref x in
  { code = get; cell = cell }, { code = set; cell = cell }

let () =
  let get, set = make 3 in
  set.code (set, get.code (get, ()) + 1)

A closure combines code and data. In { code = get; cell = cell }:

  • the first field contains a pointer to a (closed) function
  • the second one contains a pointer to a piece of data allocated on the heap
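
As a hedged, runnable variant of the get/set example, one way to give these records a type is to let the `code` field mention the closure type itself, so that passing a closure to its own code is well-typed. The type parameters `'arg` and `'res`, the fixed `int ref` cell, and the helper `read_after_increment` are assumptions made for this sketch.

```ocaml
(* Assumption: the closure type names itself in its [code] field.
   The cell is fixed to [int ref] for simplicity. *)
type ('arg, 'res) closure = {
  code : ('arg, 'res) closure * 'arg -> 'res;
  cell : int ref;
}

(* Both functions are closed: the data is reached through [env]. *)
let get ((env : (unit, int) closure), ()) : int = !(env.cell)
let set ((env : (int, unit) closure), (x : int)) : unit = env.cell := x

(* The two closures share the same heap-allocated cell. *)
let make (x : int) : (unit, int) closure * (int, unit) closure =
  let cell = ref x in
  ({ code = get; cell }, { code = set; cell })

(* Hypothetical helper: increments the cell, then reads it back. *)
let read_after_increment () : int =
  let get, set = make 3 in
  set.code (set, get.code (get, ()) + 1);
  get.code (get, ())
```

Because both records point to the same `cell`, the write performed through `set` is visible through `get`.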

Another example: we want to transform the non-closed function fun x -> k * x, which is passed to map as f:

let rec map f xs = match xs with
  | [] -> []
  | x :: xs ->
    f x :: map f xs

let scale k xs =
  map (fun x -> k * x) xs

  1. f x will be turned into f.code (f, x): fetch the code pointer, then pass it the closure itself and the argument.
  2. fun x -> k * x will be turned into { code = fun (env, x) -> env.k * x; k = k }.
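
Putting the two rewriting steps together gives the following sketch. It assumes a closure type `scale_clo` specialised to functions from int to int that capture a single integer; that type and the name `scale_code` are illustrative choices.

```ocaml
(* Hypothetical closure type: a code pointer plus the captured [k]. *)
type scale_clo = { code : scale_clo * int -> int; k : int }

(* Step 1: every call [f x] becomes [f.code (f, x)]. *)
let rec map (f : scale_clo) (xs : int list) : int list =
  match xs with
  | [] -> []
  | x :: xs -> f.code (f, x) :: map f xs

(* Step 2: the body of [fun x -> k * x], with [k] read from the closure. *)
let scale_code ((env : scale_clo), (x : int)) : int = env.k * x

let scale (k : int) (xs : int list) : int list =
  map { code = scale_code; k } xs
```

For instance, `scale 3 [1; 2; 3]` evaluates to `[3; 6; 9]`, as with the original definition.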

Definition and proof (of closure conversion)

$$
\begin{aligned}
⟦x⟧ &= x\\
⟦t_1 \, t_2⟧ &= ⟦t_1⟧.\texttt{code} \, (⟦t_1⟧, ⟦t_2⟧)
\end{aligned}
$$

This translation of application is not efficient: we don't want to duplicate $⟦t_1⟧$. It would even be incorrect: if $t_1$ had side effects, they would be performed twice. Instead, we bind the closure to a variable:

$$
⟦t_1 \, t_2⟧ = \texttt{let } clo = ⟦t_1⟧ \texttt{ in } clo.\texttt{code} \, (clo, ⟦t_2⟧)
$$

Now, what about $λ$-abstractions?

If $\lbrace x_1, …, x_n\rbrace \, ≝ \, fv(λx.t)$:

$$
\begin{aligned}
⟦λx.t⟧ = {} & \texttt{let } code = λ(clo, x). \texttt{ let } x_i = π_i \, clo \texttt{ in } ⟦t⟧\\
& \texttt{in } (code, x_1, …, x_n)
\end{aligned}
$$
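
The three translation rules can be turned into a small program. The following is only a sketch under stated assumptions: the AST types `term` and `target`, the helpers `fresh`, `tfv` and `all_closed`, and the convention that projection 0 is the code pointer are all hypothetical, and source variables are assumed never to start with an underscore.

```ocaml
(* Source terms: the pure lambda-calculus. *)
type term = Var of string | Lam of string * term | App of term * term

(* Target terms (hypothetical AST for this sketch). *)
type target =
  | TVar of string
  | TLam of string * string * target     (* λ(clo, x). body *)
  | TApp of target * target * target     (* code (clo, arg) *)
  | TLet of string * target * target
  | TTuple of target list
  | TProj of int * target                (* π_i; π_0 is the code *)

module S = Set.Make (String)

let rec fv = function
  | Var x -> S.singleton x
  | Lam (x, t) -> S.remove x (fv t)
  | App (t1, t2) -> S.union (fv t1) (fv t2)

(* Fresh names; assumes source variables never start with '_'. *)
let counter = ref 0
let fresh (base : string) : string =
  incr counter;
  Printf.sprintf "_%s%d" base !counter

let rec convert = function
  | Var x -> TVar x
  | App (t1, t2) ->
      (* let clo = ⟦t1⟧ in clo.code (clo, ⟦t2⟧) *)
      let clo = fresh "clo" in
      TLet (clo, convert t1,
            TApp (TProj (0, TVar clo), TVar clo, convert t2))
  | Lam (x, t) as lam ->
      let xs = S.elements (fv lam) in
      let clo = fresh "clo" and code = fresh "code" in
      (* let x_1 = π_1 clo in ... let x_n = π_n clo in ⟦t⟧ *)
      let body =
        List.fold_right
          (fun (i, xi) acc -> TLet (xi, TProj (i, TVar clo), acc))
          (List.mapi (fun i xi -> (i + 1, xi)) xs)
          (convert t)
      in
      TLet (code, TLam (clo, x, body),
            TTuple (TVar code :: List.map (fun xi -> TVar xi) xs))

(* Free variables of a target term. *)
let rec tfv = function
  | TVar x -> S.singleton x
  | TLam (c, x, b) -> S.remove c (S.remove x (tfv b))
  | TApp (a, b, c) -> S.union (tfv a) (S.union (tfv b) (tfv c))
  | TLet (x, a, b) -> S.union (tfv a) (S.remove x (tfv b))
  | TTuple ts -> List.fold_left (fun s t -> S.union s (tfv t)) S.empty ts
  | TProj (_, t) -> tfv t

(* Checks that every function in a target term is closed. *)
let rec all_closed = function
  | TVar _ -> true
  | TLam (_, _, b) as f -> S.is_empty (tfv f) && all_closed b
  | TApp (a, b, c) -> all_closed a && all_closed b && all_closed c
  | TLet (_, a, b) -> all_closed a && all_closed b
  | TTuple ts -> List.for_all all_closed ts
  | TProj (_, t) -> all_closed t
```

The check `all_closed (convert t)` expresses the key property of the translation: free variables of the source term may remain free in the output, but never inside a function body.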

Soundness of closure conversion

Which semantics to use for the source calculus?

  • small-step, substitution-based?
  • big-step, substitution-based?
  • big-step, environment-based?
  • interpreter, with fuel, environment-based?

Since closures contain environments, a big-step, environment-based semantics is the natural choice.
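
To make the chosen style of semantics concrete, here is a minimal sketch of a big-step, environment-based, call-by-value evaluator written as an OCaml function. It is an illustration only: the calculus is extended with integer constants and addition (an assumption, so that results are observable), the type and constructor names are hypothetical, and the function does not return on diverging terms.

```ocaml
(* Assumption: integer constants and addition are added to the calculus. *)
type term =
  | Var of string
  | Lam of string * term
  | App of term * term
  | Const of int
  | Add of term * term

(* A value is an integer or a closure: a function plus its environment. *)
type value =
  | VInt of int
  | VClo of string * term * env
and env = (string * value) list

(* [eval e t] computes the value v such that e ⊢ t ↓ v, if there is one. *)
let rec eval (e : env) (t : term) : value =
  match t with
  | Var x -> List.assoc x e
  | Lam (x, body) -> VClo (x, body, e)       (* capture the environment *)
  | Const n -> VInt n
  | Add (t1, t2) ->
      (match eval e t1, eval e t2 with
       | VInt n1, VInt n2 -> VInt (n1 + n2)
       | _ -> failwith "Add: expected integers")
  | App (t1, t2) ->
      (match eval e t1 with
       | VClo (x, body, e') ->
           let v2 = eval e t2 in
           eval ((x, v2) :: e') body         (* run the body in e', not e *)
       | VInt _ -> failwith "App: expected a function")
```

The `Lam` case is where the source semantics itself builds closures, which is why it lines up well with the output of closure conversion.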

The target language should be simpler: after the translation, every $λ$-abstraction is closed. Its semantics can be simplified accordingly, since every function is closed. We call it the metal semantics (it is closer to the machine) and denote it by $↓↓_{cbv}$. The statement to prove is forward semantic preservation:

$$
e ⊢ t ↓_{cbv} c \; ⟹ \; ⟦e⟧ ⊢ ⟦t⟧ ↓↓_{cbv} ⟦c⟧
$$

If target programs could be non-deterministic, we would also have to check backward preservation: "the behaviors of the source program form a superset of the behaviors of the transformed program".

Recursive functions

$$
t \; ≝ \; x \; \mid \; \underbrace{μf.λx.t}_{f ∈ fv(t)} \; \mid \; t \, t
$$

The translation of applications does not change, as we don't want to change how functions are called. As for $λ$-abstractions:

If $\lbrace f, x_1, …, x_n\rbrace \, ≝ \, fv(λx.t)$:

$$
\begin{aligned}
⟦μf.λx.t⟧ = {} & \texttt{let } code = λ(clo, x). \texttt{ let } f = clo \texttt{ in let } x_i = π_i \, clo \texttt{ in } ⟦t⟧\\
& \texttt{in } (code, x_1, …, x_n)
\end{aligned}
$$
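
As a hedged illustration of the rule for recursive functions, consider a source function `let rec pow n = if n = 0 then 1 else k * pow (n - 1)` with one free variable `k`. The example function, the record type `clo`, and the names `pow_code` and `make_pow` are assumptions chosen for this sketch.

```ocaml
(* Hypothetical closure type: the code pointer plus one captured integer. *)
type clo = { code : clo * int -> int; k : int }

(* The translated code is closed and uses no [let rec]: the recursive
   call goes through the closure received as first argument. *)
let pow_code ((clo : clo), (n : int)) : int =
  let f = clo in        (* let f = clo *)
  let k = clo.k in      (* let x_i = π_i clo *)
  if n = 0 then 1 else k * f.code (f, n - 1)

(* Allocating the closure: (code, x_1, ..., x_n). *)
let make_pow (k : int) : clo = { code = pow_code; k }

(* Calling it like any other closure. *)
let pow (k : int) (n : int) : int =
  let f = make_pow k in
  f.code (f, n)
```

The recursion is tied through the data: since a function always receives its own closure, binding `f` to `clo` is enough, and the call site is translated exactly as for non-recursive functions.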

Understanding programs through closure conversion

Trick 1: difference lists

 type tree =
    | Leaf of int
    | Node of tree * tree

Suppose you want to retrieve the labels of all the leaves of the tree (its fringe):

let rec fringe (t : tree) : int list = match t with
    | Leaf i -> [ i ]
    | Node (t1, t2) -> fringe t1 @ fringe t2

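Nice-looking piece of code, but very inefficient: @ takes time linear in the length of its left operand, so the overall complexity is quadratic in the worst case (e.g. on a tree that leans to the left).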

Remedy: use difference lists:

type 'a diff =
    'a list -> 'a list

let singleton (x : 'a) : 'a diff =
    fun xs -> x :: xs

let concat (xs : 'a diff) (ys : 'a diff) : 'a diff =
    fun zs -> xs (ys zs)

concat becomes a constant-time operation: it is function composition, which amounts to allocating a closure.

And then, fringe:

let rec fringe_ (t : tree) : int diff =
    match t with
    | Leaf i -> singleton i
    | Node (t1, t2) -> concat (fringe_ t1) (fringe_ t2)

let fringe t = fringe_ t []

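Is it really more efficient? Yes: building the closures takes $O(n)$ and applying them to the empty list takes $O(n)$, so the total is $O(n)+O(n) = O(n)$.

If we want to know what OCaml does, we can mentally closure-convert the program.

Existential type for closures in OCaml: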

type ('a, 'b) closure =
    | Clo :
        ('a * 'e -> 'b) (* A (closed) function... *)
        * 'e (* ...and its environment... *)
        -> ('a, 'b) closure (* ...together form a closure. *)

$$
(α, β) \; \texttt{closure} ≃ ∃ε.((α × ε) → β) × ε
$$

Invoking closures:

let apply (f : ('a, 'b) closure) (x : 'a) : 'b =
    let Clo (code, env) = f in
    code (x, env)

Now, closure conversion:

type 'a diff =
    ('a list, 'a list) closure

let singleton_code =
    fun (xs, x) -> x :: xs

let singleton (x : 'a) : 'a diff =
    Clo (singleton_code, x)

let concat_code =
    fun (zs, (xs, ys)) -> apply xs (apply ys zs)

let concat (xs : 'a diff) (ys : 'a diff) : 'a diff =
    Clo (concat_code, (xs, ys))

let rec fringe_ (t : tree) : int diff =
    match t with
        | Leaf i -> singleton i
        | Node (t1, t2) -> concat (fringe_ t1) (fringe_ t2)

let fringe t =
    apply (fringe_ t) []

fringe_ copies the tree into a tree made up of closures.

This is not so smart, though: the running time is linear, but first copying the tree seems useless (the constant factor leaves a lot to be desired).
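
For comparison, one standard way to avoid building the intermediate tree of closures is to thread an accumulator directly. This alternative is not taken from the lecture; the name `fringe_acc` is hypothetical. It computes what applying `fringe_ t` to a list `acc` would compute, without allocating closures.

```ocaml
type tree =
  | Leaf of int
  | Node of tree * tree

(* [fringe_acc t acc] is the fringe of [t] followed by [acc]. *)
let rec fringe_acc (t : tree) (acc : int list) : int list =
  match t with
  | Leaf i -> i :: acc
  | Node (t1, t2) -> fringe_acc t1 (fringe_acc t2 acc)

let fringe (t : tree) : int list = fringe_acc t []
```

Each leaf is consed exactly once, so the running time stays linear in the size of the tree.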


Beware of mutable local variables in your programming language: combined with closures, they can lead to unexpected behavior. For example, in JavaScript:

var messages = ["Wow!", "Hi!", "Closures are fun!"];

for (var i = 0; i < messages.length; i++) {
    setTimeout(function () {
        say(messages[i]);
    }, i * 1500);
}

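This says undefined three times: the three closures share the single mutable variable i, which equals 3 (one past the last index) by the time they run.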
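
The same phenomenon can be reproduced in OCaml, which helps to see that the culprit is the shared mutable variable rather than closures as such. In this sketch (function names are hypothetical), the first loop uses OCaml's immutable `for` index, and the second emulates the JavaScript loop with a single reference cell.

```ocaml
(* The [for] index is an immutable binding, fresh in each iteration:
   every closure captures its own value. *)
let captured_by_for () : int list =
  let thunks = ref [] in
  for i = 0 to 2 do
    thunks := (fun () -> i) :: !thunks
  done;
  List.rev_map (fun f -> f ()) !thunks

(* Emulating the JavaScript loop: all closures share one mutable cell,
   which holds 3 by the time the closures are called. *)
let captured_shared_ref () : int list =
  let thunks = ref [] in
  let i = ref 0 in
  while !i < 3 do
    thunks := (fun () -> !i) :: !thunks;
    incr i
  done;
  List.rev_map (fun f -> f ()) !thunks
```

In closure-conversion terms, the first version stores an integer in each closure, whereas the second stores the same pointer to a heap-allocated cell in all of them.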

Defunctionalization

Used in some compilers, like MLton (ML compiler).

If $\lbrace x_1, …, x_n \rbrace = fv(λx.t)$:

$$
\begin{aligned}
⟦x⟧ &= x\\
⟦λ^C x.t⟧ &= C(x_1, …, x_n)\\
⟦t_1 \, t_2⟧ &= \texttt{apply } (⟦t_1⟧, ⟦t_2⟧)
\end{aligned}
$$
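
As an illustration, here is a sketch of what defunctionalizing the difference-list version of fringe could look like. The constructor names `Singleton` and `Concat` are assumptions; each stands for one $λ$-abstraction of the source program and carries its free variables, and `apply` dispatches on the constructor.

```ocaml
(* One constructor per lambda-abstraction of the source program; its
   arguments are the free variables of that abstraction. *)
type 'a diff =
  | Singleton of 'a                (* fun xs -> x :: xs, capturing x *)
  | Concat of 'a diff * 'a diff    (* fun zs -> xs (ys zs), capturing xs, ys *)

(* [apply] inspects the constructor and runs the matching function body. *)
let rec apply (f : 'a diff) (zs : 'a list) : 'a list =
  match f with
  | Singleton x -> x :: zs
  | Concat (xs, ys) -> apply xs (apply ys zs)

type tree =
  | Leaf of int
  | Node of tree * tree

let rec fringe_ (t : tree) : int diff =
  match t with
  | Leaf i -> Singleton i
  | Node (t1, t2) -> Concat (fringe_ t1, fringe_ t2)

let fringe (t : tree) : int list = apply (fringe_ t) []
```

Unlike closure conversion, no code pointers are stored: the target program is first-order, and all calls to unknown functions go through the single function `apply`.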
