# Formal Molecular Biology: Rule-based modelling

Teacher: Jean Krivine (IRIF)

Distributed Systems:

• y-axis: Evolved vs. Specified
• x-axis: Natural vs. Synthetic

• Sequential ≃ totally ordered
• Distributed ≃ partially ordered

| | Natural | Synthetic |
|---|---|---|
| Specified | Synthetic biology (e.g. bacteria engineering) | Multicore program |
| Evolved | Cells | Web |

Computer Science (CS) is usually about Specified/Synthetic

Usually in CS: you want to prove that an implementation $P$ is correct with respect to its specification $S$ ⟹ Bisimulation: $P ≃ S$

⟶ First question: is there something to understand (i.e. some structure/modules)?

Answer: we hope so, but it could very well be that a cell is a bunch of “spaghetti-like” unorganized data

Experiments done:

• Uri Alon: take some boolean formula $ψ$, e.g. $ψ = φ_0 ∧ φ_1$

Genetic programming ⟶ consider gates (OR, AND, NOT, …) and wire them together at random. Goal: you want to measure how close the input/output behaviour of the circuit is to the specification given by $ψ$. Fitness measure: $0$ = horribly bad, $1$ = perfect

In the first generation, you generate $n$ circuits $C_1, …, C_n$, then you grade them: the higher the grade of a circuit, the more likely it is to pass to the next generation (once a circuit reaches fitness $1$, it is sure to be passed on to future generations).

Do that for a bunch of generations, until a circuit reaches a fitness close to $1$, then examine the structure of this circuit. There’s no reason whatsoever for a subformula of $ψ$ to appear in the circuit.
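The grading and selection steps above can be sketched as follows. This is only an illustration under stated assumptions: circuits are modelled as plain Python functions, the fitness is taken to be the fraction of input assignments on which a circuit agrees with $ψ$, and the names `fitness` and `next_generation` are hypothetical. Mutation and rewiring of circuits between generations are left out.

```python
import random
from itertools import product

def fitness(circuit, spec, n_inputs):
    """Fraction of input assignments on which `circuit` agrees with `spec`
    (0 = horribly bad, 1 = perfect). This measure is an assumption."""
    inputs = list(product([False, True], repeat=n_inputs))
    agree = sum(circuit(*xs) == spec(*xs) for xs in inputs)
    return agree / len(inputs)

def next_generation(population, spec, n_inputs, rng):
    """Pick len(population) circuits for the next generation.
    Circuits with fitness 1 always survive; the remaining slots are
    drawn with probability proportional to fitness."""
    scores = [fitness(c, spec, n_inputs) for c in population]
    elite = [c for c, s in zip(population, scores) if s == 1.0]
    n_rest = len(population) - len(elite)
    # A tiny epsilon keeps the weights valid when every score is 0.
    weights = [s + 1e-9 for s in scores]
    rest = rng.choices(population, weights=weights, k=n_rest)
    return elite + rest

# Usage: psi = phi_0 AND phi_1 over two inputs.
psi = lambda a, b: a and b
population = [psi, lambda a, b: a or b, lambda a, b: False]
survivors = next_generation(population, psi, 2, random.Random(0))
```

Passing the random generator `rng` explicitly keeps a run reproducible, which helps when comparing how fast different goals reach high fitness.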

Now, suppose you oscillate between satisfying $ψ$ and satisfying $ψ' ≝ φ_0 ∨ φ_1$. When the formula to satisfy keeps changing back and forth, the maximum fitness is reached faster, and you eventually see $φ_0$ and $φ_1$ appear as subcircuits. Why? You’re teaching the circuit to be “plastic”, on top of your expected goal.

Takeaway message: it’s a dangerous route to try to understand the cell without keeping evolution in mind.

## Systems Biology vs Molecular Biology

Systems Biology:

the biology of distributed systems: biology of cellular functions (understanding functions of the cell)

In Specified/Synthetic: there’s also hardware.

Molecular Biology:

the “hardware” part of biology: biology of mechanisms/facts/interactions (understanding what happens inside the cell)

Biology reminders: cf pictures

Amino-acid residues make up domains (they’re the basic blocks of domains, and they can be modified by post-translational modification).

Naming conventions in molecular biology are reminiscent of early alchemy: they’re based on the assumed “function” of the protein, which is very bad from a systems biology point of view (function is derived afterwards).

### Language matters

Biology papers are written in English + data & curves. But it’s a very “mechanistic” English

For instance: « EGF ligand binds to EGFR receptors which in turn are able to homodimerize »

• $≃ 10^6$ biology papers/year
• $3000$ papers/year about EGF only

cf. DARPA “Big Mechanism”

Biology lacks an executable language. Some workarounds:

• ODE (Ordinary Differential Equations), brought about by physicists

Example:

$A + B ⟶ AB$, $AB ⟶ A + B$, $AB ⟶ AB^\ast$, $AB^\ast ⟶ A + B^\ast$

$\frac{dx_A}{dt} = +[x_{AB} + x_{AB^\ast}] - [x_A x_B]$ (rate constants omitted)
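As a rough sketch, the four reactions can be turned into a full ODE system and integrated numerically. The code below assumes mass-action kinetics; the rate constants `k1` to `k4`, the function names and the explicit Euler scheme are all illustrative choices, not part of the lecture.

```python
def derivatives(x, k):
    """Mass-action right-hand side for the four reactions.
    x = (x_A, x_B, x_AB, x_ABs, x_Bs), where 's' stands for the star;
    k = (k1, k2, k3, k4) are hypothetical rate constants."""
    a, b, ab, ab_s, b_s = x
    k1, k2, k3, k4 = k
    bind = k1 * a * b       # A + B -> AB
    unbind = k2 * ab        # AB -> A + B
    modify = k3 * ab        # AB -> AB*
    release = k4 * ab_s     # AB* -> A + B*
    return (
        unbind + release - bind,   # dx_A/dt
        unbind - bind,             # dx_B/dt
        bind - unbind - modify,    # dx_AB/dt
        modify - release,          # dx_AB*/dt
        release,                   # dx_B*/dt
    )

def euler(x0, k, dt, steps):
    """Explicit Euler integration (chosen for brevity, not accuracy)."""
    x = tuple(x0)
    for _ in range(steps):
        dx = derivatives(x, k)
        x = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
    return x

# Usage: start with free A and B only, integrate up to t = 10.
final = euler((1.0, 2.0, 0.0, 0.0, 0.0), (1.0, 0.5, 0.3, 0.2), 0.01, 1000)
```

Note that every complex ($AB$, $AB^\ast$) needs its own variable and its own equation, even though it is built from $A$ and $B$.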

But: combinatorial explosion! $AB$ is made of the components $A$ and $B$, yet it is treated as an extra variable, so the number of variables grows with the number of possible complexes

## Kappa

A graph rewriting formalism

Terminology:

• Labelled graph
• Simple graph: at most one edge between two nodes
• Labelled site graph
• Simple/conflict-free labelled site graph: at most one edge between two sites

Sites are thought of as resources

• Nodes: correspond to proteins
• Sites: correspond to interaction capacities
• Edges: contacts (non-covalent bonds)

Sites can have labels to represent modifications

Graph embedding $g \hookrightarrow h$:
• injective on nodes
• name preserving
• edge preserving

A pattern $P$ has a match $f$ in $G$ if

$f: P \hookrightarrow G$

Names are equipped with a signature $Σ: 𝒩 ⟶ ℕ$ that gives, for each name, the number of sites of a node carrying that name. In particular: $Σ(⊥) = 1$

We write $[P]_G$ for the set of matches of $P$ in $G$:

$[P]_G ≝ \lbrace f: P \hookrightarrow G \rbrace$
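To make $[P]_G$ concrete, here is a brute-force sketch that enumerates matches. The encoding is an assumption made for illustration: a graph is a dict from node identifiers to names plus a set of edges, each edge being a `frozenset` of two `(node, site)` pairs; the function name `matches` is hypothetical. Only the three conditions listed above (injective on nodes, name preserving, edge preserving) are checked.

```python
from itertools import permutations

def matches(p_nodes, p_edges, g_nodes, g_edges):
    """Enumerate the embeddings of pattern P into graph G.
    *_nodes: dict node id -> name
    *_edges: set of frozenset({(node, site), (node, site)})"""
    p_ids = sorted(p_nodes)
    found = []
    # Permutations without repetition give exactly the injective node maps.
    for image in permutations(sorted(g_nodes), len(p_ids)):
        f = dict(zip(p_ids, image))
        # Name preserving: a node and its image carry the same name.
        if any(p_nodes[u] != g_nodes[f[u]] for u in p_ids):
            continue
        # Edge preserving: the image of every edge of P is an edge of G.
        if all(frozenset((f[u], s) for (u, s) in e) in g_edges
               for e in p_edges):
            found.append(f)
    return found

# Usage: G has two A nodes and one B node, with one A bound to B.
g_nodes = {1: "A", 2: "B", 3: "A"}
g_edges = {frozenset({(1, "x"), (2, "y")})}
p_nodes = {0: "A", 1: "B"}
p_edges = {frozenset({(0, "x"), (1, "y")})}
bound_pairs = matches(p_nodes, p_edges, g_nodes, g_edges)
```

The enumeration is exponential in the size of the pattern, which is acceptable for small patterns but not how a real simulator would search for matches.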
