Lecture 1: Introduction

Introduction

\[\newcommand{\dom}{\mathop{\rm dom}\nolimits}\]

Regular word languages

  graph {
    rankdir=LR;
    "finite word automata" -- "regular word expressions", "MSO on words", "finite monoids";
  }
  • regular expressions: denotational

  • finite automata: operational

  • MSO: logic

  • finite monoids: algebraic, reason about the infixes of the language

    • \[\begin{cases} φ: Σ^\ast ⟶ M \\ ε \mapsto 1 \\ u v w ⟼ φ(u)φ(v)φ(w) \end{cases}\]
    • aperiodic monoids ⇔ first order logic

  graph {
    rankdir=LR;
    "finite tree automata" -- "regular tree expressions", "MSO on trees",  "finite algebra";
  }

Regular tree expressions and finite algebras are not that easy to manipulate…

Definitions

Tree

A ranked alphabet:

is a pair $⟨𝔉, arity⟩$ where $arity: 𝔉 ⟶ ℕ$

notations:

  • $f^{(n)}$ for $f ∈ 𝔉$ with arity $n$
  • $𝔉_n ≝ \lbrace f ∈ 𝔉 \mid arity(f) = n\rbrace$, so that \(𝔉 = \bigcup_{n∈ℕ} 𝔉_n\)

Rooted, ordered, labelled, finite trees

as partial functions:

let $𝔉$ be a ranked alphabet. A partial function $t: ℕ_{>0}^\ast ⟶ 𝔉$ is a tree iff

  • ${\rm dom} (t)$ is finite and non-empty

  • ${\rm dom} (t)$ is prefix-closed: \(∀p, p' ∈ ℕ_{>0}^\ast, \qquad pp' ∈ {\rm dom} (t) ⟹ p ∈ {\rm dom} (t)\)

  • labels are consistent with $𝔉$: $∀p ∈ {\rm dom} (t)$, if $t(p) ∈ 𝔉_n$ then \(\lbrace pi ∈ {\rm dom} (t) \mid i∈ ℕ_{>0} \rbrace = \lbrace p1, \ldots, pn \rbrace\)

  graph {
    rankdir=TB;
    "f^(2) | ε" -- "g^(1) | 1",  "g^(1) | 2";
    "g^(1) | 1" -- "a^(0) | 11";
    "g^(1) | 2" -- "b^(0) | 21";
  }

notation: $T(𝔉)$ is the set of trees labelled by $𝔉$

the subtree of $t$ at $p ∈ {\rm dom} (t)$ is:

$t_{|p}$, defined by \(\begin{cases} {\rm dom} (t_{|p}) ≝ \lbrace p' \mid pp' ∈ {\rm dom} (t) \rbrace \\ t_{|p}(p') ≝ t(pp') \end{cases}\)
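
The three conditions and the subtree operation can be checked mechanically. The Python sketch below is only an illustration: it assumes that positions are encoded as tuples of positive integers (the empty tuple stands for $ε$) and that a tree is a dictionary from positions to symbols. The names `ARITY`, `is_tree` and `subtree` are hypothetical, not part of the lecture.

```python
# Ranked alphabet of the picture above: symbol -> arity (an assumed encoding).
ARITY = {"f": 2, "g": 1, "a": 0, "b": 0}

def is_tree(t, arity=ARITY):
    """Check the three conditions on a dict t: position -> symbol."""
    if not t:
        return False  # dom(t) must be non-empty (a dict is always finite)
    for p in t:
        # prefix-closed: checking the immediate prefix of every position suffices
        if p and p[:-1] not in t:
            return False
    for p, sym in t.items():
        if sym not in arity:
            return False
        # indices i such that p.i belongs to dom(t)
        children = {q[-1] for q in t if len(q) == len(p) + 1 and q[:-1] == p}
        if children != set(range(1, arity[sym] + 1)):
            return False
    return True

def subtree(t, p):
    """t|p maps p' to t(p p') for every p p' in dom(t)."""
    return {q[len(p):]: sym for q, sym in t.items() if q[:len(p)] == p}

# The tree f(g(a), g(b)) drawn above.
t = {(): "f", (1,): "g", (2,): "g", (1, 1): "a", (2, 1): "b"}
```

With this encoding, `subtree(t, (1,))` is the dictionary describing $g(a)$.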


A non-deterministic finite tree automaton (NFTA) is:

a tuple $𝒜 ≝ ⟨Q, 𝔉, Q_f, Δ⟩$ where

  • $Q$ is a finite set of states
  • $𝔉$ is a finite ranked alphabet
  • $Q_f ⊆ Q$ is a set of final states
  • $Δ ⊆ \bigcup_n Q × 𝔉_n × Q^n$

A run of $𝒜$ on a tree $t$ is:

a tree $ρ ∈ T(Q × ℕ)$, where $arity(q, n) ≝ n$, such that

  • $\dom ρ = \dom t$
  • \[∀p ∈ \dom ρ, \text{ if } ρ(p) = (q, n) \text{ then } t(p) ∈ 𝔉_n \text{ and } ∃ (q, t(p), q_1, \ldots, q_n)∈ Δ \text{ s.t. } ∀ 1 ≤ i ≤ n, \; ρ(pi) = (q_i, n_i) \text{ for some } n_i\]

A run $ρ$ is accepting if $ρ(ε) = (q, n)$ for some $q ∈ Q_f$

The language of $𝒜$ is:
\[L(𝒜) ≝ \lbrace t ∈ T(𝔉) \mid ∃ \text{ accepting run on } t \rbrace\]

Example:

  • $Q ≝ \lbrace q_f, q_g, q_a, q_b \rbrace$

  • $Q_f ≝ \lbrace q_f \rbrace$

  • $𝔉 ≝ \lbrace f^{(2)}, g^{(1)}, a^{(0)}, b^{(0)}\rbrace$

  • \[Δ ≝ \lbrace (q_f, f^{(2)}, q_g, q_g), \\ (q_g, g^{(1)}, q_g), \\ (q_g, g^{(1)}, q_a), \\ (q_g, g^{(1)}, q_b), \\ (q_a, a^{(0)}), \\ (q_b, b^{(0)}) \rbrace\]
  graph {
    rankdir=TB;
    q_g1[label= "g^(1) | q_g"];
    q_g2[label= "g^(1) | q_g"];
    "f^(2) | q_f" -- q_g1, q_g2;
    q_g1 -- "a^(0) | q_a";
    q_g2 -- "b^(0) | q_b";
  }
  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    g3[label= "g"];
    g4[label= "g"];
    b1[label= "⋮"];
    b2[label= "⋮"];
    f -- g1, g2;
    g1 -- b1 -- g3;
    g2 -- b2 -- g4;
    l1[label= "a or b"];
    l2[label= "a or b"];
    g3 -- l1;
    g4 -- l2;
  }
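
Membership in $L(𝒜)$ for the automaton of this example can be decided by computing, bottom-up, the set of states that some run can put at the root. The sketch below is an illustration under assumed encodings: a tree is a nested tuple `(symbol, child_1, ..., child_n)` and a transition $(q, f, q_1, \ldots, q_n)$ is stored as `(q, f, (q_1, ..., q_n))`. The names `DELTA`, `FINAL`, `states` and `accepts` are hypothetical.

```python
# Transitions of the example automaton.
DELTA = {
    ("qf", "f", ("qg", "qg")),
    ("qg", "g", ("qg",)),
    ("qg", "g", ("qa",)),
    ("qg", "g", ("qb",)),
    ("qa", "a", ()),
    ("qb", "b", ()),
}
FINAL = {"qf"}

def states(t, delta):
    """Set of states q such that some run on t labels the root with q."""
    sym, children = t[0], t[1:]
    child_states = [states(c, delta) for c in children]
    return {
        q
        for (q, f, qs) in delta
        if f == sym
        and len(qs) == len(children)
        and all(qi in si for qi, si in zip(qs, child_states))
    }

def accepts(t, delta, final):
    """t is accepted iff some run reaches a final state at the root."""
    return bool(states(t, delta) & final)

good = ("f", ("g", ("a",)), ("g", ("g", ("b",))))   # f(g(a), g(g(b)))
bad = ("f", ("a",), ("g", ("b",)))                  # f(a, g(b))
```

Here `accepts(good, DELTA, FINAL)` holds while `accepts(bad, DELTA, FINAL)` does not, since the first argument of `f` must be headed by at least one `g`.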

Example:

\(𝔉 ≝ \lbrace ∨, ∧, ¬, \top, \bot \rbrace\) (with obvious arities)

$𝒜$ s.t. $L(𝒜)$ is the set of Boolean formulae that evaluate to true

  • $Q ≝ \lbrace q_0, q_1 \rbrace$, $Q_f ≝ \lbrace q_1 \rbrace$

  • \[Δ ≝ \lbrace (q_1, \top), \\ (q_0, \bot), \\ (q_1, ¬, q_0), \\ (q_0, ¬, q_1), \\ (q_1, ∧, q_1, q_1), (q_0, ∧, q_1, q_0), \\ (q_0, ∧, q_0, q_1), (q_0, ∧, q_0, q_0), \\ (q_1, ∨, q_1, q_1), (q_0, ∨, q_0, q_0), \\ \vdots \\ \rbrace\]
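
The transitions of this automaton simply follow the truth tables, so they can be generated rather than listed by hand. The following Python sketch is an illustration only: it assumes the state names `"q0"` and `"q1"` and stores a transition $(q, f, q_1, \ldots, q_n)$ as `(q, f, (q_1, ..., q_n))`; the variable names are hypothetical.

```python
from itertools import product

# Q[b] is the state meaning "the subformula evaluates to b".
Q = {0: "q0", 1: "q1"}

# Constants.
delta = {("q1", "⊤", ()), ("q0", "⊥", ())}

# Negation flips the truth value of its argument.
for x in (0, 1):
    delta.add((Q[1 - x], "¬", (Q[x],)))

# Binary connectives: the target state is given by the truth table.
for x, y in product((0, 1), repeat=2):
    delta.add((Q[x & y], "∧", (Q[x], Q[y])))
    delta.add((Q[x | y], "∨", (Q[x], Q[y])))

final = {"q1"}
```

Each pair (symbol, argument states) gets exactly one target state, which reflects the fact that a formula has exactly one truth value.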

Bottom-up

Inductive definition:

let $𝔉$ be a ranked alphabet. $T(𝔉)$ is the smallest set s.t.

  • every $a^{(0)} ∈ 𝔉_0$ is a tree
  • if $t_1, \ldots, t_n$ are trees and $f^{(n)} ∈ 𝔉_n$, then $f^{(n)}(t_1, \ldots, t_n)$ is a tree

Let $𝒳$ be a countable set of variables with $𝒳 ∩ 𝔉 = ∅$.

We set $arity(x) = 0, ∀ x∈𝒳$.

A tree in $T(𝔉 ∪ 𝒳)$ is called a term; we write $T(𝔉, 𝒳)$ for the set of terms.

A substitution:

is a function $σ : 𝒳 ⟶ T(𝔉, 𝒳)$ with $\lbrace x∈𝒳 \mid σ(x) ≠ x\rbrace$ finite

It defines a function $T(𝔉, 𝒳) ⟶ T(𝔉, 𝒳)$ by congruence:

  • $x σ ≝ σ(x)$
  • $f^{(n)}(t_1, \ldots, t_n)σ ≝ f^{(n)}(t_1σ, \ldots, t_nσ)$

thus $a^{(0)} σ = a^{(0)}$

Example:

let \(σ : \begin{cases} x ⟼ a^{(0)} \\ y ⟼ g(x) \end{cases}\)

$t$:

  graph {
    rankdir=TB;
    g1[label= "g"];
    f -- g1, y;
    g1 -- x;
  }

$tσ$:

  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    f -- g1, g2;
    g1 -- a;
    g2 -- x;
  }
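
The two defining clauses of $tσ$ translate directly into a recursive function. The sketch below is an illustration under an assumed encoding: a variable is a Python string, any other term is a tuple `(symbol, arg_1, ..., arg_n)`, and a substitution is a dictionary listing only the variables with $σ(x) ≠ x$. The name `substitute` is hypothetical.

```python
def substitute(t, sigma):
    """Apply the substitution sigma to the term t (all variables at once)."""
    if isinstance(t, str):
        # x σ = σ(x); variables outside the finite support are left unchanged
        return sigma.get(t, t)
    f, *args = t
    # f(t_1, ..., t_n) σ = f(t_1 σ, ..., t_n σ)
    return (f, *(substitute(s, sigma) for s in args))

# The example above: σ = {x ↦ a, y ↦ g(x)} and t = f(g(x), y).
sigma = {"x": ("a",), "y": ("g", "x")}
t = ("f", ("g", "x"), "y")
t_sigma = substitute(t, sigma)
```

Note that the `x` introduced by $σ(y) = g(x)$ is not substituted again: the substitution is applied simultaneously, which gives $f(g(a), g(x))$ as in the picture.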
A term $t∈ T(𝔉, 𝒳)$ is linear:

if every variable of $𝒳$ appears at most once in $t$

A context:

is a term in $T(𝔉, \lbrace \square \rbrace)$ where $\square$ occurs exactly once.

terminology:

  • $tσ$ is an instance of $t$.
  • a term in $T(𝔉, 𝒳)$ with no variables is a ground term.

Example:

$t$:

  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    f -- g1, g2;
    g1 -- a;
    g2 -- b;
  }

$t = C[t’]$, where $C[t’] ≝ C σ$ for $σ(\square) = t’$, with

$C ≝ $

  graph {
    rankdir=TB;
    g2[label= "g"];
    f -- g2, □;
    g2 -- a;
  }

and

$t’ ≝ $

  graph {
    rankdir=TB;
    g -- b;
  }

notations: $C(𝔉)$ for the set of contexts over $𝔉$


A term rewriting system $R$ over $T(𝔉, 𝒳)$:

is a set of pairs $l ⟶ r$ with $l, r ∈ T(𝔉, 𝒳)$ and $vars(r) ⊆ vars(l)$, where \(vars(t) ≝ \lbrace x ∈ 𝒳 \mid ∃ p ∈ \dom t; \; t(p) = x \rbrace\)

$R$ defines a rewriting relation $⟶_R ⊆ T(𝔉) × T(𝔉)$ by:

  • $t ⟶_R t’$ iff there exist a rule $(l ⟶ r) ∈ R$, a context $C ∈ C(𝔉)$ and a substitution $σ$ s.t. $t = C[lσ]$ and $t’ = C[rσ]$

Example:

\[g(x) ⟶ g(g(x))\]

$t$:

  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    f -- g1, g2;
    g1 -- a;
    g2 -- b;
  }

1) $C = f(\square, g(b))$, $σ= x ⟼ a$

$t ⟶_R$

  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    g3[label= "g"];
    f -- g1, g2;
    g1 -- g3;
    g2 -- b;
    g3 -- a;
  }

2) $C = f(g(a), \square)$, $σ= x ⟼ b$

$t ⟶_R$

  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    g3[label= "g"];
    f -- g1, g2;
    g1 -- a;
    g2 -- g3;
    g3 -- b;
  }

Example 2: a rule with a non-linear left-hand side. It does not apply to $t$, so the step drawn below is not valid:

\[f(x, x) ⟶ x\]

$t$:

  graph {
    rankdir=TB;
    g1[label= "g"];
    g2[label= "g"];
    f -- g1, g2;
    g1 -- a;
    g2 -- b;
  }

$⟶_R$

  graph {
    rankdir=TB;
    g -- a;
  }

with $C = \square$ and $σ : x ⟼ g(a)$ we get $lσ = f(g(a), g(a)) ≠ t$: the variable $x$ cannot be mapped to both $g(a)$ and $g(b)$.
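
Both examples can be replayed with a small one-step rewriting function. The sketch below is an illustration under assumed encodings: a variable is a Python string, any other term is a tuple `(symbol, arg_1, ..., arg_n)`, and a rule $l ⟶ r$ is a pair `(l, r)`. The names `match`, `substitute` and `rewrite_steps` are hypothetical.

```python
def match(pattern, t, sigma=None):
    """Return a substitution σ with pattern σ = t, or None if there is none."""
    sigma = dict(sigma or {})
    if isinstance(pattern, str):
        if pattern in sigma and sigma[pattern] != t:
            return None  # non-linear pattern: all occurrences must agree
        sigma[pattern] = t
        return sigma
    if isinstance(t, str) or pattern[0] != t[0] or len(pattern) != len(t):
        return None
    for p, s in zip(pattern[1:], t[1:]):
        sigma = match(p, s, sigma)
        if sigma is None:
            return None
    return sigma

def substitute(t, sigma):
    """Apply σ to t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0], *(substitute(s, sigma) for s in t[1:]))

def rewrite_steps(t, rules):
    """All t' such that t rewrites to t' in one step, for any context C."""
    results = []
    for l, r in rules:
        sigma = match(l, t)
        if sigma is not None:            # C is the empty context
            results.append(substitute(r, sigma))
    if not isinstance(t, str):
        for i in range(1, len(t)):       # C has its hole below the i-th child
            for s in rewrite_steps(t[i], rules):
                results.append(t[:i] + (s,) + t[i + 1:])
    return results

t = ("f", ("g", ("a",)), ("g", ("b",)))               # f(g(a), g(b))
rule1 = (("g", "x"), ("g", ("g", "x")))               # g(x) -> g(g(x))
rule2 = (("f", "x", "x"), "x")                        # f(x, x) -> x
```

`rewrite_steps(t, [rule1])` returns the two successors of the first example, while `rewrite_steps(t, [rule2])` is empty because the two arguments of `f` differ.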

Bottom-up tree automata

Given a NFTA $𝒜 ≝ ⟨Q, 𝔉, Q_f, Δ⟩$, we define the bottom-up rewrite rules by:

  • $f^{(n)}(q_1, \ldots, q_n) ⟶_𝒜 q$ for all $(q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ$ and using $arity(q) = 0$ for all $q∈Q$

Proposition: \(L(𝒜) = \lbrace t ∈ T(𝔉) \mid ∃ q∈Q_f; t⟶_𝒜^\ast q \rbrace\)

Example: for the automaton recognising true Boolean formulae above:

\[\top ⟶ q_1 \\ ¬(q_1) ⟶ q_0 \\ ∨(q_0, q_1) ⟶ q_1 \\ \vdots\]

Other view:

$arity(q) ≝ 1$ for all $q∈ Q$

and

\[f^{(n)}(q_1(x_1), \ldots, q_n(x_n)) ⟶_𝒜 q(f^{(n)}(x_1, \ldots, x_n))\]

in that view:

\[L(𝒜) = \lbrace t ∈ T(𝔉) \mid ∃q ∈ Q_f; \; t ⟶_𝒜^\ast q(t) \rbrace\]

A NFTA is complete:

if $∀n, ∀f∈ 𝔉_n, ∀q_1, \ldots, q_n ∈ Q, \; ∃ q∈Q$ s.t. \((q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ\)

NB: If $𝒜$ is complete, for all $t$, there exists a $q$ s.t. $t ⟶_𝒜^\ast q$

a NFTA is deterministic:

if $∀n, ∀f∈ 𝔉_n, ∀q_1, \ldots, q_n ∈ Q$, \(\vert \lbrace q∈ Q \mid (q, f^{(n)}, q_1, \ldots, q_n)∈ Δ \rbrace \vert ≤ 1\)

NB: If $𝒜$ is deterministic, for all $t$, there exists at most one $q$ s.t. $t ⟶_𝒜^\ast q$

NB: If $𝒜$ is complete and deterministic, for all $t$, there exists a unique $q$ s.t. $t ⟶_𝒜^\ast q$
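
Both properties only depend on how many transitions share a given left-hand side $f^{(n)}(q_1, \ldots, q_n)$, so they are easy to test. The sketch below is an illustration: it assumes transitions are stored as `(q, f, (q_1, ..., q_n))` and reuses the first example automaton; the names `lhs_counts`, `is_deterministic` and `is_complete` are hypothetical.

```python
from collections import Counter
from itertools import product

STATES = {"qf", "qg", "qa", "qb"}
ARITY = {"f": 2, "g": 1, "a": 0, "b": 0}
DELTA = {
    ("qf", "f", ("qg", "qg")),
    ("qg", "g", ("qg",)),
    ("qg", "g", ("qa",)),
    ("qg", "g", ("qb",)),
    ("qa", "a", ()),
    ("qb", "b", ()),
}

def lhs_counts(delta):
    """Number of target states for each left-hand side (f, (q_1, ..., q_n))."""
    return Counter((f, qs) for (q, f, qs) in delta)

def is_deterministic(delta):
    """At most one target state per left-hand side."""
    return all(c <= 1 for c in lhs_counts(delta).values())

def is_complete(states, arity, delta):
    """At least one target state per left-hand side."""
    counts = lhs_counts(delta)
    return all(
        counts[(f, qs)] >= 1
        for f, n in arity.items()
        for qs in product(states, repeat=n)
    )
```

On this automaton `is_deterministic(DELTA)` holds, but `is_complete(STATES, ARITY, DELTA)` does not: for instance no transition has left-hand side $f(q_a, q_a)$.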


Closure properties

As for word automata, one can complete and determinize any NFTA.

Proposition: Given a NFTA $𝒜$, we can construct an equivalent complete NFTA $𝒜’$.

\[Q' ≝ Q \sqcup \lbrace sink \rbrace\]

We add a transition to the non-final state $sink$ for every left-hand side, which in particular covers any missing transition ($Q_f' ≝ Q_f$):

\[Δ' ≝ Δ ∪ \lbrace (sink, f^{(n)}, q_1, \ldots, q_n) \mid f∈ 𝔉_n, \; q_1, \ldots, q_n ∈ Q'\rbrace\]
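
A possible implementation of the completion is sketched below. It is an illustration with one deliberate difference from the formula: it adds a sink transition only for the left-hand sides that have no transition at all, a variant that recognises the same language and also preserves determinism. Transitions are assumed to be stored as `(q, f, (q_1, ..., q_n))`; the function name `complete` and the state name `"sink"` (assumed not to occur in $Q$) are hypothetical.

```python
from itertools import product

STATES = {"qf", "qg", "qa", "qb"}
ARITY = {"f": 2, "g": 1, "a": 0, "b": 0}
DELTA = {
    ("qf", "f", ("qg", "qg")),
    ("qg", "g", ("qg",)),
    ("qg", "g", ("qa",)),
    ("qg", "g", ("qb",)),
    ("qa", "a", ()),
    ("qb", "b", ()),
}

def complete(states, arity, delta, sink="sink"):
    """Return (Q', Δ') where every left-hand side over Q' has a transition."""
    states2 = set(states) | {sink}
    present = {(f, qs) for (q, f, qs) in delta}
    delta2 = set(delta)
    for f, n in arity.items():
        for qs in product(sorted(states2), repeat=n):
            if (f, qs) not in present:
                delta2.add((sink, f, qs))   # missing transition goes to the sink
    return states2, delta2

STATES2, DELTA2 = complete(STATES, ARITY, DELTA)
```

The set of final states is left unchanged, so the sink is never accepting and the language is preserved.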

Proposition: Given a NFTA $𝒜$, we can construct an equivalent complete and deterministic finite tree automaton $𝒜’$.

Subset construction, with the invariant:

\[t ⟶_{𝒜'}^\ast E ⟺ E = \lbrace q ∈ Q \mid t ⟶_𝒜^\ast q\rbrace\]

  • $Q’ ≝ 2^Q$
  • $Q_f’ ≝ \lbrace E ⊆ Q \mid E ∩ Q_f ≠ ∅ \rbrace$
  • \[Δ' ≝ \Big\lbrace (E, f^{(n)}, E_1, \ldots, E_n) \mid E_1, \ldots, E_n ⊆ Q \text{ and } \\ E ≝ \lbrace q ∈ Q \mid ∃(q_1, \ldots, q_n) ∈ E_1 × ⋯ × E_n; \; (q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ \rbrace \Big\rbrace\]
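
The subset construction can be sketched as follows. This is an illustration with one stated deviation: instead of all of $2^Q$, it only builds the subsets that are reachable from the constants, which gives an equivalent automaton that is complete and deterministic over those subsets. Transitions are assumed to be stored as `(q, f, (q_1, ..., q_n))`; the name `determinize` and the small test automaton are hypothetical.

```python
from itertools import product

def determinize(arity, delta, final):
    """Subset construction restricted to reachable subsets E of Q."""
    def step(f, subsets):
        # E = {q | exists (q_1, ..., q_n) in E_1 x ... x E_n with (q, f, q_1..q_n) in Δ}
        return frozenset(
            q for (q, g, qs) in delta
            if g == f and all(qi in Ei for qi, Ei in zip(qs, subsets))
        )

    reachable, delta_det = set(), set()
    changed = True
    while changed:                       # saturate until no new transition appears
        changed = False
        for f, n in arity.items():
            for subsets in product(list(reachable), repeat=n):
                E = step(f, subsets)
                if (E, f, subsets) not in delta_det:
                    delta_det.add((E, f, subsets))
                    reachable.add(E)
                    changed = True
    final_det = {E for E in reachable if E & final}
    return reachable, delta_det, final_det

# A small non-deterministic automaton for the single tree f(a, a).
ARITY = {"f": 2, "a": 0}
DELTA = {("q1", "a", ()), ("q2", "a", ()), ("qf", "f", ("q1", "q2"))}
Q_DET, DELTA_DET, FINAL_DET = determinize(ARITY, DELTA, {"qf"})
```

On this input three subsets are reachable: $\lbrace q_1, q_2 \rbrace$, $\lbrace q_f \rbrace$ and $∅$, the last one playing the role of the sink.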

Closure properties

Boolean closure is quite similar to the word case:

  • Union is disjoint union of tree automata

  • Complementation: complete and determinize, then complement $Q_f$

  • Intersection: product automaton
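
As an illustration of the product automaton, the sketch below pairs the states of two automata over the same ranked alphabet and lets them move synchronously. Transitions are assumed to be stored as `(q, f, (q_1, ..., q_n))` and trees as nested tuples `(symbol, child_1, ..., child_n)`; the names `states` and `product_automaton` are hypothetical.

```python
def states(t, delta):
    """Set of states reachable at the root of t by some run."""
    sym, children = t[0], t[1:]
    child_states = [states(c, delta) for c in children]
    return {q for (q, f, qs) in delta
            if f == sym and len(qs) == len(children)
            and all(qi in si for qi, si in zip(qs, child_states))}

def product_automaton(delta1, final1, delta2, final2):
    """Transitions and final states of an automaton for the intersection."""
    delta = {((q, p), f, tuple(zip(qs, ps)))
             for (q, f, qs) in delta1
             for (p, g, ps) in delta2
             if f == g and len(qs) == len(ps)}
    final = {(q, p) for q in final1 for p in final2}
    return delta, final

# A1: the first example automaton; A2: trees whose leaves are all labelled a.
DELTA1 = {("qf", "f", ("qg", "qg")), ("qg", "g", ("qg",)), ("qg", "g", ("qa",)),
          ("qg", "g", ("qb",)), ("qa", "a", ()), ("qb", "b", ())}
DELTA2 = {("p", "f", ("p", "p")), ("p", "g", ("p",)), ("p", "a", ())}
DELTA, FINAL = product_automaton(DELTA1, {"qf"}, DELTA2, {"p"})
```

A tree is accepted by the product exactly when `states(t, DELTA) & FINAL` is non-empty, that is, when both components accept it.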

Top-down tree automata

Given a NFTA $𝒜 ≝ ⟨Q, 𝔉, Q_f, Δ⟩$, we define the top-down rewrite rules by:

  • $q ⟶_𝒜 f^{(n)}(q_1, \ldots, q_n)$ for all $(q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ$ and using $arity(q) = 0$ for all $q∈Q$

Proposition: \(L(𝒜) = \lbrace t ∈ T(𝔉) \mid ∃ q∈Q_f; q ⟶_𝒜^\ast t \rbrace\)


Example:

If the language is \(\lbrace f(a, b), f(b, a) \rbrace\)

then

\[Δ ≝ \lbrace (q_f, f, q_1, q_2), (q_f, f, q_2, q_1), (q_1, a), (q_2, b) \rbrace\]
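
Read top-down, the rules $q ⟶_𝒜 f^{(n)}(q_1, \ldots, q_n)$ generate the trees of the language starting from a final state. The sketch below enumerates the ground terms derivable from a state up to a given height. It is an illustration: transitions are assumed to be stored as `(q, f, (q_1, ..., q_n))`, trees as nested tuples, and the name `generate` is hypothetical.

```python
from itertools import product

# Transitions of the example above, with q_f the only final state.
DELTA = {
    ("qf", "f", ("q1", "q2")),
    ("qf", "f", ("q2", "q1")),
    ("q1", "a", ()),
    ("q2", "b", ()),
}

def generate(q, delta, height):
    """Ground terms of height at most `height` derivable top-down from q."""
    if height == 0:
        return set()
    terms = set()
    for (q0, f, qs) in delta:
        if q0 == q:
            # rewrite q into f(q_1, ..., q_n), then expand every q_i
            choices = [generate(qi, delta, height - 1) for qi in qs]
            for args in product(*choices):
                terms.add((f, *args))
    return terms

language = generate("qf", DELTA, 2)
```

For this automaton `language` contains exactly the two trees $f(a, b)$ and $f(b, a)$, and increasing the height bound adds nothing.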

(chap. 1, 3, 8)
