Formal Molecular Biology: Rule-based modelling
Teacher: Jean Krivine (IRIF)
Distributed Systems:
- y-axis: Evolved vs. Specified
- x-axis: Natural vs. Synthetic
Sequential ≃ total order; distributed ≃ partial order
| | Natural | Synthetic |
|---|---|---|
| Specified | Synthetic biology (e.g. bacteria engineering) | Multicore program |
| Evolved | Cells | Web |
Computer Science (CS) is usually about Specified/Synthetic
Usually in CS: you want to prove that an implementation $P$ is correct with respect to its specification $S$ ⟹ Bisimulation: $P ≃ S$
But in evolved systems, we don’t have access to specification
⟶ First question: is there something to understand (i.e. some structure/modules)?
Answer: we hope so, but it could very well be that a cell is a bunch of “spaghetti-like” unorganized data
Experiments done:
-
Uri Alon: take some boolean formula $ψ$ whose decomposition you know, e.g. \(ψ = φ_0 ∧ φ_1\)
Genetic programming ⟶ consider gates (OR, AND, NOT, …) and plug them randomly. Goal: you want to measure how close the input/output link is to the specification given by $ψ$. Fitness measure: $0$ = horribly bad, $1$ = perfect
In the first generation, you generate $n$ circuits $C_1, …, C_n$, then you grade them: the higher the grade, the more likely a circuit is to pass to the next generation (once a circuit reaches fitness $1$, it is sure to be passed on to future generations).
Do that for a number of generations, until a circuit reaches a fitness close to $1$, then examine the structure of this circuit. There is no reason whatsoever for a subformula of $ψ$ to appear in the circuit.
Now, suppose the goal oscillates between satisfying $ψ$ and satisfying $ψ' ≝ φ_0 ∨ φ_1$. When you switch the target formula back and forth, the maximum fitness is reached faster, and $φ_0$ and $φ_1$ eventually appear as subcircuits. Why? You are teaching the circuit to be “plastic”, on top of meeting the expected goal.
Takeaway message: it’s a dangerous route to try to understand the cell without keeping evolution in mind.
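The fitness measure used in the experiment above can be sketched as follows. This is only an illustration: the notes do not say how fitness was computed, so the sketch assumes it is the fraction of truth-table rows on which a circuit agrees with the target formula. The concrete choices of `phi_0` and `phi_1` (two XORs) and the `candidate` circuit are hypothetical.

```python
from itertools import product

def fitness(circuit, spec, n_inputs):
    """Fraction of input assignments on which circuit and spec agree.

    0 means the circuit is wrong on every input, 1 means it is perfect.
    (Assumed definition; the notes only give the 0..1 scale.)
    """
    rows = list(product([False, True], repeat=n_inputs))
    agree = sum(circuit(*row) == spec(*row) for row in rows)
    return agree / len(rows)

# Hypothetical sub-formulas phi_0 and phi_1 (not specified in the notes).
def phi_0(a, b):
    return a != b  # XOR

def phi_1(c, d):
    return c != d  # XOR

# The two alternating goals: psi = phi_0 AND phi_1, psi' = phi_0 OR phi_1.
def psi(a, b, c, d):
    return phi_0(a, b) and phi_1(c, d)

def psi_prime(a, b, c, d):
    return phi_0(a, b) or phi_1(c, d)

# A hypothetical candidate circuit that only computes phi_0.
def candidate(a, b, c, d):
    return phi_0(a, b)

score_and = fitness(candidate, psi, 4)
score_or = fitness(candidate, psi_prime, 4)
```

A circuit that already contains $φ_0$ as a module scores the same against both goals here, which gives some intuition for why modular circuits adapt quickly when the goal switches.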
Systems Biology vs Molecular Biology
- Systems Biology:
-
the biology of distributed systems: biology of cellular functions (understanding functions of the cell)
In Specified/Synthetic: there’s also hardware.
- Molecular Biology:
-
the “hardware” part of biology: biology of mechanisms/facts/interactions (understanding what happens inside the cell)
Biology reminders: cf pictures
Amino-acid residues make up domains (they are the basic building blocks of domains, and they can be modified by post-translational modification).
Naming conventions in molecular biology are reminiscent of early alchemy: they are based on the assumed “function” of the protein, which is very bad from a systems biology point of view (function is derived afterwards).
Language matters
Biology papers are written in English + data & curves. But it’s a very “mechanistic” English
For instance: « EGF ligand binds to EGFR receptors which in turn are able to homodimerize »
- $≃ 10^6$ biology papers/year
- $3000$ papers/year about EGF only
cf. DARPA “Big Mechanism”
Biology lacks an executable language. Some workarounds:
-
ODE (Ordinary Differential Equations), brought about by physicists
Example:
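\[A+B ⟶ AB\\ AB ⟶ A+B\\ AB ⟶ AB^\ast\\ AB^\ast ⟶ A+B^\ast\]

Under mass-action kinetics with all rate constants set to $1$, $A$ is produced by the second and fourth reactions and consumed by the first:

\[\frac{dx_A}{dt} = x_{AB} + x_{AB^\ast} - x_A\, x_B\]

But: combinatorial explosion! $AB$ is made up of the components $A$ and $B$, yet it is treated as an extra variable.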
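To make the ODE reading of the four reactions concrete, here is a minimal sketch that integrates them with an explicit Euler scheme. It assumes mass-action kinetics and unit rate constants (the notes give no rates); the function names, step size, and initial concentrations are illustrative choices, not part of the course material.

```python
def derivatives(x, k=(1.0, 1.0, 1.0, 1.0)):
    """Mass-action right-hand side for the species (A, B, AB, AB*, B*).

    k holds the (assumed) rate constants of the four reactions, in order.
    """
    a, b, ab, ab_star, b_star = x
    k1, k2, k3, k4 = k
    v1 = k1 * a * b      # A + B -> AB
    v2 = k2 * ab         # AB -> A + B
    v3 = k3 * ab         # AB -> AB*
    v4 = k4 * ab_star    # AB* -> A + B*
    return (
        -v1 + v2 + v4,   # dA/dt
        -v1 + v2,        # dB/dt
        v1 - v2 - v3,    # dAB/dt
        v3 - v4,         # dAB*/dt
        v4,              # dB*/dt
    )

def euler(x0, dt=0.001, steps=5000):
    """Explicit Euler integration; crude but dependency-free."""
    x = tuple(x0)
    for _ in range(steps):
        dx = derivatives(x)
        x = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
    return x

# Hypothetical initial concentrations: only free A and free B are present.
x0 = (1.0, 1.0, 0.0, 0.0, 0.0)
x_end = euler(x0)

# Conserved totals: every A is free or in a complex; same for B (in any form).
total_a = x_end[0] + x_end[2] + x_end[3]
total_b = x_end[1] + x_end[2] + x_end[3] + x_end[4]
```

Note that the sketch already needs five variables for two proteins: each complex and each modified form is a separate species, which is the combinatorial explosion mentioned above.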
Kappa
A graph rewriting formalism
Terminology:
- Labelled graph
- Simple graph: at most one edge between two nodes
- Labelled site graph
- Simple/conflict-free labelled site graph: at most one edge between two sites
Sites are thought of as resources
- Nodes: correspond to proteins
- Sites: correspond to interaction capacities
- Edges: contact (non-covalent bonds)
Sites can have labels to represent modifications
- Graph embedding $g \hookrightarrow h$:
-
- injective on nodes
- name preserving
- edge preserving
A pattern $P$ has a match $f$ in $G$ if
\[f: P \hookrightarrow G\]

Names are equipped with a signature $Σ: 𝒩 ⟶ ℕ$ that gives the number of sites of each name. In particular: \(Σ(⊥) = 1\)

We write $[P]_G$ for the set of matches of $P$ in $G$:
\[[P]_G ≝ \lbrace f: P \hookrightarrow G \rbrace\]
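As an illustration of $[P]_G$, the following sketch enumerates matches by brute force. It checks only the three conditions listed above (injective on nodes, name preserving, edge preserving); the encoding of a site graph as a pair (node names, set of edges between `(node, site)` pairs), the site names `x` and `y`, and the helper names are assumptions made for this example, and this is not how the Kappa tools represent or match graphs.

```python
from itertools import permutations

def edge(u, i, v, j):
    """Undirected edge between site i of node u and site j of node v."""
    return frozenset({(u, i), (v, j)})

def matches(pattern, graph):
    """Return all embeddings f: P -> G as dicts from pattern nodes to graph nodes."""
    p_nodes, p_edges = pattern
    g_nodes, g_edges = graph
    p_ids = list(p_nodes)
    found = []
    # permutations(...) yields tuples without repetition, so f is injective.
    for image in permutations(g_nodes, len(p_ids)):
        f = dict(zip(p_ids, image))
        # Name preserving: each node is sent to a node with the same name.
        if any(p_nodes[u] != g_nodes[f[u]] for u in p_ids):
            continue
        # Edge preserving: the image of every pattern edge is an edge of G.
        if all(frozenset((f[u], s) for (u, s) in e) in g_edges for e in p_edges):
            found.append(f)
    return found

# Hypothetical mixture G: two A-B dimers, bound through sites x (on A) and y (on B).
G = (
    {1: "A", 2: "B", 3: "A", 4: "B"},
    {edge(1, "x", 2, "y"), edge(3, "x", 4, "y")},
)

# Pattern: an A bound to a B through the same sites.
P_bound = ({"a": "A", "b": "B"}, {edge("a", "x", "b", "y")})

# Pattern: an A and a B with no edge required between them.
P_unbound = ({"a": "A", "b": "B"}, set())

n_bound = len(matches(P_bound, G))
n_unbound = len(matches(P_unbound, G))
```

With this encoding the bound pattern has one match per dimer, while the edge-free pattern matches every pairing of an A with a B.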