Formal Molecular Biology: Rule-based modelling

Teacher: Jean Krivine (IRIF)

Distributed Systems:

  • y-axis: Evolved vs. Specified
  • x-axis: Natural vs. Synthetic

Sequential ≃ totally ordered

Distributed ≃ partially ordered

|           | Natural | Synthetic |
|-----------|---------|-----------|
| Specified | Synthetic biology (e.g. bacteria engineering) | Multicore program |
| Evolved   | Cells | Web |

Computer Science (CS) is usually about Specified/Synthetic

Usually in CS: you want to prove that an implementation $P$ is correct with respect to its specification $S$ ⟹ Bisimulation: $P ≃ S$

But in evolved systems, we don’t have access to a specification

⟶ First question: is there something to understand (i.e. some structure/modules)?

Answer: we hope so, but it could very well be that a cell is a bunch of “spaghetti-like” unorganized data

Experiments done:

  • Uri Alon: take some boolean formula $ψ$. E.g., you know that \(ψ = φ_0 ∧ φ_1\)

    Genetic programming ⟶ consider gates (OR, AND, NOT, …) and wire them together randomly. Goal: you want to measure how close the input/output behaviour of the circuit is to the specification given by $ψ$. Fitness measure: $0$ = horribly bad, $1$ = perfect

    In the first generation, you generate $n$ circuits $C_1, …, C_n$ and grade them; the higher a circuit’s grade, the more likely it is to pass to the next generation (once a circuit reaches fitness $1$, it is sure to be passed on to future generations).

    Do that for a number of generations, until a circuit reaches a fitness close to $1$, then examine the structure of this circuit. There’s no reason whatsoever for a subformula of $ψ$ to appear in the circuit.

    Now, suppose the goal oscillates between satisfying $ψ$ and satisfying $ψ’ ≝ φ_0 ∨ φ_1$. When the formula to satisfy is switched back and forth, the maximum fitness is reached faster, and $φ_0$ and $φ_1$ eventually appear as subcircuits. Why? You’re teaching the circuit to be “plastic”, on top of meeting your expected goal.

    Takeaway message: it’s a dangerous route to try to understand the cell without keeping evolution in mind.
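The fitness measure and the alternating goal described in this experiment can be sketched in a few lines of Python. This is only an illustration, not the original experimental setup: the concrete sub-goals `phi0` and `phi1` (two XORs on four inputs), the helper names and the selection scheme are assumptions, and the random wiring and mutation of gates is left out.

```python
import itertools
import random

# Illustrative sub-goals on four inputs (hypothetical choices, not from the notes).
def phi0(x, y, z, w):
    return x != y          # XOR of the first two inputs

def phi1(x, y, z, w):
    return z != w          # XOR of the last two inputs

def psi(x, y, z, w):
    return phi0(x, y, z, w) and phi1(x, y, z, w)

def psi_alt(x, y, z, w):
    return phi0(x, y, z, w) or phi1(x, y, z, w)

def fitness(circuit, goal, n_inputs=4):
    """Fraction of input assignments on which circuit and goal agree (0 to 1)."""
    assignments = list(itertools.product([False, True], repeat=n_inputs))
    agree = sum(circuit(*a) == goal(*a) for a in assignments)
    return agree / len(assignments)

def goal_at(generation, period):
    """Alternate between psi and psi_alt every `period` generations."""
    return psi if (generation // period) % 2 == 0 else psi_alt

def select(population, goal, rng):
    """Keep every circuit of fitness 1, then fill up by fitness-weighted draws."""
    scores = [fitness(c, goal) for c in population]
    kept = [c for c, s in zip(population, scores) if s == 1.0]
    n_draw = len(population) - len(kept)
    if n_draw > 0:
        weights = [s + 1e-9 for s in scores]   # small offset avoids all-zero weights
        kept += rng.choices(population, weights=weights, k=n_draw)
    return kept
```

A full run would loop over generations, call `goal_at` to pick the current formula, apply `select`, and then mutate the surviving circuits.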

Systems Biology vs Molecular Biology

Systems Biology:

the biology of distributed systems: biology of cellular functions (understanding functions of the cell)

In Specified/Synthetic: there’s also hardware.

Molecular Biology:

the “hardware” part of biology: biology of mechanisms/facts/interactions (understanding what happens inside the cell)

Biology reminders: cf pictures

Amino-acid residues make up domains (they’re the basic building blocks of domains, and they can be altered by post-translational modification).

Naming conventions in molecular biology are reminiscent of early alchemy: they’re based on the assumed “function” of the protein, which is very bad from a systems biology point of view (function is derived afterwards).

Language matters

Biology papers are written in English + data & curves. But it’s a very “mechanistic” English

For instance: « EGF ligand binds to EGFR receptors which in turn are able to homodimerize »

  • $≃ 10^6$ biology papers/year
  • $3000$ papers/year about EGF only

cf. DARPA “Big Mechanism”

Biology lacks an executable language. Some workarounds:

  • ODE (Ordinary Differential Equations), brought about by physicists

    Example:

    \[A+B ⟶ AB\\ AB ⟶ A+B\\ AB ⟶ AB^\ast\\ AB^\ast ⟶ A+B^\ast\]

    With mass-action kinetics and all rate constants set to $1$, the equation for $A$ reads:

    \[\frac{dx_A}{dt} = x_{AB} + x_{AB^\ast} - x_A x_B\]

    But: combinatorial explosion! $AB$ is made of the components $A$ and $B$, yet it is treated as an extra variable, and so is every other complex or modified form
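To make the ODE view concrete, here is a minimal sketch of the four reactions above under mass-action kinetics, integrated with a plain explicit Euler scheme. The rate constants `k1` to `k4`, the function names and the choice of integrator are assumptions made for illustration; none of them come from the notes.

```python
def rhs(x, k=(1.0, 1.0, 1.0, 1.0)):
    """Mass-action right-hand side; x maps each species name to its concentration."""
    k1, k2, k3, k4 = k
    v1 = k1 * x["A"] * x["B"]   # A + B -> AB
    v2 = k2 * x["AB"]           # AB -> A + B
    v3 = k3 * x["AB"]           # AB -> AB*
    v4 = k4 * x["AB*"]          # AB* -> A + B*
    return {
        "A":   -v1 + v2 + v4,
        "B":   -v1 + v2,
        "AB":  +v1 - v2 - v3,
        "AB*": +v3 - v4,
        "B*":  +v4,
    }

def euler(x0, dt, steps, k=(1.0, 1.0, 1.0, 1.0)):
    """Explicit Euler integration, kept deliberately simple."""
    x = dict(x0)
    for _ in range(steps):
        dx = rhs(x, k)
        x = {s: x[s] + dt * dx[s] for s in x}
    return x
```

Two components already need five variables (`A`, `B`, `AB`, `AB*`, `B*`), which is the explosion in miniature. A quick sanity check on such a model is that the total amount of $A$ (free or in a complex) and the total amount of $B$ stay constant over time.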


Kappa

A graph rewriting formalism

Terminology:

  • Labelled graph
  • Simple graph: at most one edge between two nodes
  • Labelled site graph
  • Simple/conflict-free labelled site graph: at most one edge per site

Sites are thought of as resources

  • Nodes: correspond to proteins
  • Sites: correspond to interaction capacities
  • Edges: contact (non-covalent bonds)

Sites can have labels to represent modifications
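One possible in-memory encoding of such a site graph is sketched below, reusing the EGF/EGFR example. The site names (`r`, `l`, `d`, `Y`), the label `p` and the helper `is_conflict_free` are hypothetical, and "conflict-free" is read here as "each site is an endpoint of at most one edge", in line with the view of sites as resources.

```python
from collections import Counter

# Nodes carry a protein name; an edge links two (node, site) pairs;
# a site may additionally carry a label for a modification.
nodes = {0: "EGF", 1: "EGFR", 2: "EGFR"}
edges = {
    frozenset({(0, "r"), (1, "l")}),   # EGF bound to the first EGFR
    frozenset({(1, "d"), (2, "d")}),   # the two EGFR form a homodimer
}
labels = {(1, "Y"): "p"}               # hypothetical modified site

def is_conflict_free(edges):
    """True when every site is an endpoint of at most one edge."""
    usage = Counter(end for edge in edges for end in edge)
    return all(count <= 1 for count in usage.values())
```

Adding a second edge on site `r` of node `0` would make `is_conflict_free` return `False`, since that site would be consumed twice.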

Graph embedding $g \hookrightarrow h$:
  • injective on nodes
  • name preserving
  • edge preserving

A pattern $P$ has a match $f$ in $G$ if

\[f: P \hookrightarrow G\]

Names are equipped with a signature $Σ: 𝒩 ⟶ ℕ$ that defines how many sites a node with a given name has. In particular: \(Σ(⊥) = 1\)

We write $[P]_G$ for the set of matches of $P$ in $G$:

\[[P]_G ≝ \lbrace f: P \hookrightarrow G \rbrace\]
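The set of matches can be enumerated by brute force, checking the three embedding conditions one by one. The sketch below is a naive illustration under simplifying assumptions: a graph is a pair `(nodes, edges)` with integer node identifiers, site labels and the special name $⊥$ are ignored, and the function name `matches` is hypothetical.

```python
from itertools import permutations

def matches(pattern, graph):
    """Return all embeddings of pattern into graph as dicts pattern_node -> graph_node."""
    p_nodes, p_edges = pattern
    g_nodes, g_edges = graph
    p_ids = sorted(p_nodes)
    found = []
    # Permutations without repetition make the node map injective.
    for image in permutations(sorted(g_nodes), len(p_ids)):
        f = dict(zip(p_ids, image))
        # Name preserving: a node is sent to a node with the same name.
        if any(p_nodes[u] != g_nodes[f[u]] for u in p_ids):
            continue
        # Edge preserving: the image of every pattern edge is an edge of the graph.
        if all(frozenset((f[u], s) for (u, s) in e) in g_edges for e in p_edges):
            found.append(f)
    return found

# Example: two A nodes and one B node, with a single A-B bond.
graph = ({0: "A", 1: "A", 2: "B"}, {frozenset({(0, "x"), (2, "y")})})
pattern = ({0: "A", 1: "B"}, {frozenset({(0, "x"), (1, "y")})})
```

Here `matches(pattern, graph)` contains a single embedding, because only one of the two `A` nodes is bound to `B`, whereas a pattern made of a lone `A` node has two matches. The enumeration is exponential in the size of the pattern and is meant only to make the definition concrete.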
