# Introduction

$\newcommand{\dom}{\mathop{\rm dom}\nolimits}$

## Regular word languages

  graph {
rankdir=LR;
"finite word automata" -- "regular word expressions", "MSO on words", "finite monoids";
}

• regular expressions: denotational

• finite automata: operational

• MSO: logic

• finite monoids: algebraic, reason about the infixes of the language

• $\begin{cases} φ: Σ^\ast ⟶ M \\ ε \mapsto 1 \\ u v w ⟼ φ(u)φ(v)φ(w) \end{cases}$
• aperiodic monoids ⇔ first order logic

  graph {
rankdir=LR;
"finite tree automata" -- "regular tree expressions", "MSO on trees",  "finite algebra";
}


regular tree expressions, finite algebra: not that easy to manipulate…

# Definitions

## Tree

A ranked alphabet:

is a pair $⟨𝔉, arity⟩$ where $arity: 𝔉 ⟶ ℕ$

notations:

• $f^{(n)}$ for $f ∈ 𝔉$ with arity $n$
• $𝔉_n ≝ \lbrace f ∈ 𝔉 \mid arity(f) = n\rbrace$, $𝔉 ≝ \bigcup_{n∈ℕ} 𝔉_n$

### Rooted, ordered, labelled, finite trees

as partial functions:

let $𝔉$ be a ranked alphabet. A partial function $t: ℕ_{>0}^\ast ⟶ 𝔉$ is a tree iff

• ${\rm dom} (t)$ is finite and non-empty

• ${\rm dom} (t)$ is prefix-closed: $∀p, p' ∈ ℕ_{>0}^\ast, \qquad pp' ∈ {\rm dom} (t) ⟹ p ∈ {\rm dom} (t)$

• labels are consistent with $𝔉$: $∀p ∈ {\rm dom} (t), \, t(p) ∈ 𝔉_n$ for some $n$ implies $\lbrace pi ∈ {\rm dom} (t) \mid i∈ ℕ_{>0} \rbrace = \lbrace p1, \ldots, pn \rbrace$

  graph {
rankdir=TB;
"f^(2) | ε" -- "g^(1) | 1",  "g^(1) | 2";
"g^(1) | 1" -- "a^(0) | 11";
"g^(1) | 2" -- "b^(0) | 21";
}


notation: $T(𝔉)$ is the set of trees labelled by $𝔉$

the subtree of $t$ at $p ∈ {\rm dom} (t)$ is:

$t_{|p}$, defined by $\begin{cases} {\rm dom} (t_{|p}) ≝ \lbrace p' \mid pp' ∈ {\rm dom} (t) \rbrace \\ t_{|p}(p') ≝ t(pp') \end{cases}$

A non-deterministic finite tree automaton (NFTA) is:

a tuple $𝒜 ≝ ⟨Q, 𝔉, Q_f, Δ⟩$ where

• $Q$ is a finite set of states
• $𝔉$ is a finite ranked alphabet
• $Q_f ⊆ Q$ is a set of final states
• $Δ ⊆ \bigcup_n Q × 𝔉_n × Q^n$ is a set of transitions

A run of $𝒜$ on a tree $t$ is:

a tree $ρ$ in $T(Q × ℕ)$ where $arity(q, n) ≝ n$ such that

• $\dom ρ = \dom t$
• $∀p ∈ \dom ρ, \text{ if } ρ(p) = (q, n) \text{ then } t(p) ∈ 𝔉_n \text{ and } ∃ (q, t(p), q_1, \ldots, q_n)∈ Δ \text{ s.t. } ∀ 1 ≤ i ≤ n, \; ρ(pi) = (q_i, n_i) \text{ for some } n_i$

A run $ρ$ is accepting if $ρ(ε) = (q, n)$ for some $q ∈ Q_f$

The language of $𝒜$ is:
$L(𝒜) ≝ \lbrace t ∈ T(𝔉) \mid ∃ \text{ accepting run on } t \rbrace$

Example:

• $Q ≝ \lbrace q_f, q_g, q_a, q_b \rbrace$

• $Q_f ≝ \lbrace q_f \rbrace$

• $𝔉 ≝ \lbrace f^{(2)}, g^{(1)}, a^{(0)}, b^{(0)}\rbrace$

• $Δ ≝ \lbrace (q_f, f^{(2)}, q_g, q_g), \\ (q_g, g^{(1)}, q_g), \\ (q_g, g^{(1)}, q_a), \\ (q_g, g^{(1)}, q_b), \\ (q_a, a^{(0)}), \\ (q_b, b^{(0)}) \rbrace$
  graph {
rankdir=TB;
q_g1[label= "g^(1) | q_g"];
q_g2[label= "g^(1) | q_g"];
"f^(2) | q_f" -- q_g1, q_g2;
q_g1 -- "a^(0) | q_a";
q_g2 -- "b^(0) | q_b";
}

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
g3[label= "g"];
g4[label= "g"];
b1[label= "⋮"];
b2[label= "⋮"];
f -- g1, g2;
g1 -- b1 -- g3;
g2 -- b2 -- g4;
g3 -- "a or b";
g4 -- "a || b";
}
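
Membership in the language of this example automaton can be sketched in Python. The encoding is an assumption: a tree is a nested tuple `(label, child_1, ..., child_n)`, a transition is a tuple `(q, f, q_1, ..., q_n)`, and `states` and `accepts` are hypothetical helper names. Instead of building one run, the sketch computes every state that some run can put at the root.

```python
# Transitions (q, f, q_1, ..., q_n) of the example automaton.
DELTA = {
    ("q_f", "f", "q_g", "q_g"),
    ("q_g", "g", "q_g"),
    ("q_g", "g", "q_a"),
    ("q_g", "g", "q_b"),
    ("q_a", "a"),
    ("q_b", "b"),
}
FINAL = {"q_f"}

def states(t, delta):
    """States q such that some run on t labels the root with q."""
    label, children = t[0], t[1:]
    child_states = [states(c, delta) for c in children]
    return {
        rule[0]
        for rule in delta
        if rule[1] == label
        and len(rule) == len(children) + 2
        and all(q in s for q, s in zip(rule[2:], child_states))
    }

def accepts(t, delta, final):
    """True iff there is an accepting run on t."""
    return bool(states(t, delta) & final)

t = ("f", ("g", ("a",)), ("g", ("b",)))
```

With these definitions, `accepts(t, DELTA, FINAL)` holds for `t`, and it fails for `("f", ("a",), ("g", ("b",)))` because no transition reads $f$ above $q_a$.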


Example:

$𝔉 ≝ \lbrace ∨, ∧, ¬, \top, \bot \rbrace$ (with obvious arities)

$𝒜$ s.t. $L(𝒜)$ is the set of Boolean formulae that evaluate to true

• $Q ≝ \lbrace q_0, q_1 \rbrace$, $Q_f ≝ \lbrace q_1 \rbrace$

• $Δ ≝ \lbrace (q_1, \top), \\ (q_0, \bot), \\ (q_1, ¬, q_0), \\ (q_0, ¬, q_1), \\ (q_1, ∧, q_1, q_1), (q_0, ∧, q_1, q_0), \\ (q_0, ∧, q_0, q_1), (q_0, ∧, q_0, q_0), \\ \vdots \\ \rbrace$
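
A possible Python sketch of this automaton, under the assumption that the transitions follow the usual truth tables ($q_1$ for true, $q_0$ for false). The symbol names `"top"`, `"bot"`, `"not"`, `"and"`, `"or"` and the helper `state` are hypothetical; trees are nested tuples `(label, child_1, ..., child_n)`.

```python
from itertools import product

Q = ["q_0", "q_1"]
VALUE = {"q_0": False, "q_1": True}
STATE = {False: "q_0", True: "q_1"}

# Transitions (q, f, q_1, ..., q_n), generated from the truth tables.
DELTA = {("q_1", "top"), ("q_0", "bot")}
DELTA |= {(STATE[not VALUE[q]], "not", q) for q in Q}
for q1, q2 in product(Q, repeat=2):
    DELTA.add((STATE[VALUE[q1] and VALUE[q2]], "and", q1, q2))
    DELTA.add((STATE[VALUE[q1] or VALUE[q2]], "or", q1, q2))

def state(t):
    """The single state reached at the root of t."""
    label, children = t[0], t[1:]
    args = tuple(state(c) for c in children)
    # Exactly one transition has this symbol and these argument states.
    (q,) = [r[0] for r in DELTA if r[1:] == (label,) + args]
    return q

formula = ("or", ("bot",), ("not", ("bot",)))
```

A formula is in the language iff `state(formula) == "q_1"`.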

## Bottom-up

Inductive definition:

let $𝔉$ be a ranked alphabet. The set of trees over $𝔉$ is defined inductively by:

• $a^{(0)} ∈ 𝔉_0$ is a tree
• if $t_1, \ldots, t_n$ are trees and $f^{(n)} ∈ 𝔉_n$, then $f^{(n)}(t_1, \ldots, t_n)$ is a tree

Let $𝒳$ be a countable set of variables with $𝒳 ∩ 𝔉 = ∅$.

We set $arity(x) = 0, ∀ x∈𝒳$.

A tree in $T(𝔉 ∪ 𝒳)$ is called a term and we rather write $T(𝔉, 𝒳)$ in that case.

A substitution:

is a function $σ : 𝒳 ⟶ T(𝔉, 𝒳)$ with $\lbrace x∈𝒳 \mid σ(x) ≠ x\rbrace$ finite

It defines a function $T(𝔉, 𝒳) ⟶ T(𝔉, 𝒳)$ by congruence:

• $x σ ≝ σ(x)$
• $f^{(n)}(t_1, \ldots, t_n)σ ≝ f^{(n)}(t_1σ, \ldots, t_nσ)$

thus $a^{(0)} σ = a^{(0)}$

Example:

if $σ : \begin{cases} x ⟼ a^{(0)} \\ y ⟼ g(x) \\ \end{cases}$

$t$:

  graph {
rankdir=TB;
g1[label= "g"];
f -- g1, y;
g1 -- x;
}


$tσ$:

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
f -- g1, g2;
g1 -- a;
g2 -- x;
}
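
The action of a substitution on a term can be sketched in Python. The encoding is an assumption: a variable is a bare string, an application $f^{(n)}(t_1, \ldots, t_n)$ is a tuple `(f, t_1, ..., t_n)`, and a substitution is a `dict` listing only the variables it moves. `apply` is a hypothetical name.

```python
def apply(t, sigma):
    """Compute tσ by congruence on the structure of t."""
    if isinstance(t, str):                 # t is a variable x
        return sigma.get(t, t)             # xσ = σ(x), identity elsewhere
    # f(t_1, ..., t_n)σ = f(t_1σ, ..., t_nσ)
    return (t[0],) + tuple(apply(c, sigma) for c in t[1:])

# σ: x ↦ a, y ↦ g(x), applied to t = f(g(x), y).
sigma = {"x": ("a",), "y": ("g", "x")}
t = ("f", ("g", "x"), "y")
```

The substitution is applied simultaneously: the $x$ introduced by $σ(y) = g(x)$ is not replaced again, so `apply(t, sigma)` is the encoding of $f(g(a), g(x))$.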

A term $t∈ T(𝔉, 𝒳)$ is linear:

if every variable of $𝒳$ appears at most once in $t$

A context:

is a term in $T(𝔉, \lbrace \square \rbrace)$ where $\square$ occurs exactly once.

notation :

• $tσ$ is an instance of $t$.
• a term in $T(𝔉, 𝒳)$ with no variables is a ground term.

Example:

$t$:

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
f -- g1, g2;
g1 -- a;
g2 -- b;
}


$t = C[t'] ≝ C σ$ for $σ(\square) = t' = g(b)$, where

$C ≝$

  graph {
rankdir=TB;
g2[label= "g"];
f -- g2, □;
g2 -- a;
}


and

$t' ≝$

  graph {
rankdir=TB;
g -- b;
}


notations: $C(𝔉)$ for the set of contexts over $𝔉$

A term rewriting system $R$ over $T(𝔉, 𝒳)$:

is a set of pairs $l ⟶ r$ with $l, r ∈ T(𝔉, 𝒳)$ and $vars(r) ⊆ vars(l)$, where $vars(t) ≝ \lbrace x ∈ 𝒳 \mid ∃ p ∈ \dom t; \; t(p) = x \rbrace$

$R$ defines a rewriting relation $⟶_R ⊆ T(𝔉) × T(𝔉)$ by:

• $t ⟶_R t'$ iff there exist a rule $l ⟶ r$ in $R$, a context $C ∈ C(𝔉)$ and a substitution $σ$ s.t. $t = C[lσ]$ and $t' = C[rσ]$

Example:

$g(x) ⟶ g(g(x))$

$t$:

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
f -- g1, g2;
g1 -- a;
g2 -- b;
}


1) $C = f(\square, g(b))$, $σ= x ⟼ a$

$t ⟶_R$

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
g3[label= "g"];
f -- g1, g2;
g1 -- g3;
g2 -- b;
g3 -- a;
}


2) $C = f(g(a), \square)$, $σ= x ⟼ b$

$t ⟶_R$

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
g3[label= "g"];
f -- g1, g2;
g1 -- a;
g2 -- g3;
g3 -- b;
}


Example 2: a rule that does not apply to $t$, because its left-hand side is not linear:

$f(x, x) ⟶ x$

$t$:

  graph {
rankdir=TB;
g1[label= "g"];
g2[label= "g"];
f -- g1, g2;
g1 -- a;
g2 -- b;
}


$\not⟶_R$

  graph {
rankdir=TB;
g -- a;
}


with $C = \square$ and $σ : x ⟼ g(a)$, this step would require $t = f(x, x)σ = f(g(a), g(a))$, which is not the case.
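
Both examples can be replayed with a small Python sketch of one-step rewriting. The encoding is an assumption (variables are bare strings, applications are tuples `(f, t_1, ..., t_n)`), and `match`, `apply` and `rewrites` are hypothetical names. The context $C$ stays implicit: rewriting inside the $i$-th child corresponds to a context whose hole is below that child.

```python
def match(pattern, t, sigma):
    """Extend sigma so that pattern·sigma == t, or return None."""
    if isinstance(pattern, str):                    # a variable
        if pattern in sigma:                        # repeated variable:
            return sigma if sigma[pattern] == t else None  # must agree
        return {**sigma, pattern: t}
    if isinstance(t, str) or pattern[0] != t[0] or len(pattern) != len(t):
        return None
    for p, c in zip(pattern[1:], t[1:]):
        sigma = match(p, c, sigma)
        if sigma is None:
            return None
    return sigma

def apply(t, sigma):
    """Compute tσ."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply(c, sigma) for c in t[1:])

def rewrites(t, lhs, rhs):
    """All t' such that t rewrites to t' in one step with the rule lhs -> rhs."""
    results = []
    sigma = match(lhs, t, {})
    if sigma is not None:                           # redex at the root
        results.append(apply(rhs, sigma))
    if not isinstance(t, str):
        for i in range(1, len(t)):                  # redex inside child i
            for c in rewrites(t[i], lhs, rhs):
                results.append(t[:i] + (c,) + t[i + 1:])
    return results

t = ("f", ("g", ("a",)), ("g", ("b",)))
grow = (("g", "x"), ("g", ("g", "x")))              # g(x) -> g(g(x))
collapse = (("f", "x", "x"), "x")                   # f(x, x) -> x
```

`rewrites(t, *grow)` returns the two terms of the first example, while `rewrites(t, *collapse)` is empty because the two occurrences of $x$ would have to match $g(a)$ and $g(b)$ at once.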

## Bottom-up tree automata

Given a NFTA $𝒜 ≝ ⟨Q, 𝔉, Q_f, Δ⟩$, we define the bottom-up rewrite rules by:

• $f^{(n)}(q_1, \ldots, q_n) ⟶_𝒜 q$ for all $(q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ$ and using $arity(q) = 0$ for all $q∈Q$

Proposition: $L(𝒜) = \lbrace t ∈ T(𝔉) \mid ∃ q∈Q_f; t⟶_𝒜^\ast q \rbrace$

Example: when it comes to the previous example related to logical formulas:

$\top ⟶ q_1 \\ ¬(q_1) ⟶ q_0 \\ ∨(q_0, q_1) ⟶ q_1 \\ \vdots$
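
As an illustration, here is one possible derivation for the formula $∨(\bot, ¬(\bot))$, using the rules $\bot ⟶ q_0$ and $¬(q_0) ⟶ q_1$ obtained from the transitions $(q_0, \bot)$ and $(q_1, ¬, q_0)$ of the example:

$∨(\bot, ¬(\bot)) ⟶_𝒜 ∨(q_0, ¬(\bot)) ⟶_𝒜 ∨(q_0, ¬(q_0)) ⟶_𝒜 ∨(q_0, q_1) ⟶_𝒜 q_1$

Since $q_1 ∈ Q_f$, the formula is accepted, which matches the fact that it evaluates to true.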

Other view:

$arity(q) ≝ 1$ for all $q∈ Q$

and

$f^{(n)}(q_1(x_1), \ldots, q_n(x_n)) ⟶_𝒜 q(f^{(n)}(x_1, \ldots, x_n))$

in that view:

$L(𝒜) = \lbrace t ∈ T(𝔉) \mid ∃q ∈ Q_f; \; t ⟶_𝒜^\ast q(t) \rbrace$

a NFTA is complete:

if $∀n, ∀f∈ 𝔉_n, ∀q_1, \ldots, q_n ∈ Q, \; ∃ q∈Q$ s.t. $(q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ$

NB: If $𝒜$ is complete, for all $t$, there exists a $q$ s.t. $t ⟶_𝒜^\ast q$

a NFTA is deterministic:

if $∀n, ∀f∈ 𝔉_n, ∀q_1, \ldots, q_n ∈ Q, \; \vert \lbrace q∈ Q \mid (q, f^{(n)}, q_1, \ldots, q_n)∈ Δ \rbrace \vert ≤ 1$

NB: If $𝒜$ is deterministic, for all $t$, there exists at most one $q$ s.t. $t ⟶_𝒜^\ast q$

NB: If $𝒜$ is complete and deterministic, for all $t$, there exists a unique $q$ s.t. $t ⟶_𝒜^\ast q$
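
Both properties are easy to test on a finite $Δ$. A minimal Python sketch, assuming transitions are encoded as tuples `(q, f, q_1, ..., q_n)` and the ranked alphabet as a `dict` from symbols to arities; `is_complete` and `is_deterministic` are hypothetical names.

```python
from itertools import product

def is_complete(states, arity, delta):
    """Every left-hand side f(q_1, ..., q_n) has at least one target state."""
    lhs = {rule[1:] for rule in delta}
    return all(
        (f,) + qs in lhs
        for f, n in arity.items()
        for qs in product(states, repeat=n)
    )

def is_deterministic(delta):
    """Every left-hand side f(q_1, ..., q_n) has at most one target state."""
    lhs = [rule[1:] for rule in delta]
    return len(lhs) == len(set(lhs))

# The first example automaton of these notes.
STATES = ["q_f", "q_g", "q_a", "q_b"]
ARITY = {"f": 2, "g": 1, "a": 0, "b": 0}
DELTA = {
    ("q_f", "f", "q_g", "q_g"),
    ("q_g", "g", "q_g"),
    ("q_g", "g", "q_a"),
    ("q_g", "g", "q_b"),
    ("q_a", "a"),
    ("q_b", "b"),
}
```

On this encoding the first example automaton is deterministic (no two transitions share a left-hand side) but not complete (for instance, nothing reads $f$ above $q_a, q_a$).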

# Closure properties

As for word automata, one can complete and determinize any NFTA.

Proposition: Given a NFTA $𝒜$, we can construct an equivalent complete NFTA $𝒜'$.

$Q' ≝ Q \sqcup \lbrace sink \rbrace$, $Q_f' ≝ Q_f$

We send any missing transition to the sink:

$Δ' ≝ Δ ∪ \lbrace (sink, f^{(n)}, q_1, \ldots, q_n) \mid f∈ 𝔉_n, \; q_1, \ldots, q_n ∈ Q', \; ∄ q ∈ Q; \, (q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ \rbrace$

Proposition: Given a NFTA $𝒜$, we can construct an equivalent complete and deterministic NFTA $𝒜'$.

The subset construction ensures $t ⟶_{𝒜'}^\ast E ⟺ E = \lbrace q ∈ Q \mid t ⟶_𝒜^\ast q\rbrace$

• $Q' ≝ 2^Q$
• $Q_f' ≝ \lbrace E ⊆ Q \mid E ∩ Q_f ≠ ∅ \rbrace$
• $Δ' ≝ \Big\lbrace (E, f^{(n)}, E_1, \ldots, E_n) \mid E_1, \ldots, E_n ⊆ Q \text{ and } \\ E = \lbrace q ∈ Q \mid ∃(q_1, \ldots, q_n) ∈ E_1 × ⋯ × E_n; \; (q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ \rbrace \Big\rbrace$
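
The subset construction can be sketched in Python. This is a variant, not the exact definition above: it only builds the subsets reachable from the constants instead of all of $2^Q$, which is an assumption made to keep the output small. Subsets are `frozenset`s, transitions are tuples `(q, f, q_1, ..., q_n)`, and `determinize` is a hypothetical name.

```python
from itertools import product

def determinize(arity, delta, final):
    """Subset construction restricted to reachable subsets of states."""
    subsets, new_delta = set(), set()
    changed = True
    while changed:                          # iterate until no new rule appears
        changed = False
        for f, n in arity.items():
            for args in product(list(subsets), repeat=n):
                # E = {q | some (q, f, q_1..q_n) in Δ with q_i in E_i}
                target = frozenset(
                    r[0] for r in delta
                    if r[1] == f and len(r) == n + 2
                    and all(q in E for q, E in zip(r[2:], args))
                )
                rule = (target, f) + args
                if rule not in new_delta:
                    new_delta.add(rule)
                    subsets.add(target)
                    changed = True
    new_final = {E for E in subsets if E & final}
    return subsets, new_delta, new_final

# A small non-deterministic automaton (hypothetical example).
ARITY = {"a": 0, "g": 1}
DELTA = {("q_1", "a"), ("q_2", "a"), ("q_f", "g", "q_1")}
subsets, new_delta, new_final = determinize(ARITY, DELTA, {"q_f"})
```

The empty set plays the role of a sink state: it is reached whenever no transition of $Δ$ applies, so the result is complete over the reachable subsets.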

## Boolean closure

Boolean closure is quite similar to the word case:

• Union is disjoint union of tree automata

• Complementation: complete and determinize, then complement $Q_f$

• Intersection: product automaton
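
The product automaton for intersection can be sketched in Python, assuming transitions are tuples `(q, f, q_1, ..., q_n)` and trees are nested tuples `(label, child_1, ..., child_n)`. The names `product_automaton` and `states`, and the two small automata, are hypothetical.

```python
def product_automaton(delta1, final1, delta2, final2):
    """States are pairs; both components read the same symbol in lockstep."""
    delta = {
        ((r1[0], r2[0]), r1[1]) + tuple(zip(r1[2:], r2[2:]))
        for r1 in delta1
        for r2 in delta2
        if r1[1] == r2[1] and len(r1) == len(r2)
    }
    final = {(q1, q2) for q1 in final1 for q2 in final2}
    return delta, final

def states(t, delta):
    """States q such that some run on t labels the root with q."""
    label, children = t[0], t[1:]
    child_states = [states(c, delta) for c in children]
    return {
        r[0] for r in delta
        if r[1] == label and len(r) == len(children) + 2
        and all(q in s for q, s in zip(r[2:], child_states))
    }

# A1: trees g(...g(a)...) ; A2: trees with an even number of g.
D1, F1 = {("p", "a"), ("p", "g", "p")}, {"p"}
D2, F2 = {("e", "a"), ("e", "b"), ("o", "g", "e"), ("e", "g", "o")}, {"e"}
DELTA, FINAL = product_automaton(D1, F1, D2, F2)

def accepts(t):
    return bool(states(t, DELTA) & FINAL)
```

Here `accepts` holds exactly for the trees $g^{2k}(a)$, the intersection of the two languages.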

## Top-down tree automata

Given a NFTA $𝒜 ≝ ⟨Q, 𝔉, Q_f, Δ⟩$, we define the top-down rewrite rules by:

• $q ⟶_𝒜 f^{(n)}(q_1, \ldots, q_n)$ for all $(q, f^{(n)}, q_1, \ldots, q_n) ∈ Δ$ and using $arity(q) = 0$ for all $q∈Q$

Proposition: $L(𝒜) = \lbrace t ∈ T(𝔉) \mid ∃ q∈Q_f; q ⟶_𝒜^\ast t \rbrace$

Example:

If the language is $\lbrace f(a, b), f(b, a) \rbrace$

then

$Δ ≝ \lbrace (q_f, f, q_1, q_2), (q_f, f, q_2, q_1), (q_1, a), (q_2, b) \rbrace$
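
The top-down reading of this example can be sketched in Python: starting from a state at the root, pick a transition and push states down to the children. The encoding (trees as nested tuples, transitions as tuples `(q, f, q_1, ..., q_n)`) and the name `derives` are assumptions, as is the choice of $q_f$ as the only final state, which is not stated above.

```python
DELTA = {
    ("q_f", "f", "q_1", "q_2"),
    ("q_f", "f", "q_2", "q_1"),
    ("q_1", "a"),
    ("q_2", "b"),
}

def derives(q, t, delta):
    """True iff q rewrites to t with the rules q -> f(q_1, ..., q_n)."""
    label, children = t[0], t[1:]
    return any(
        r[0] == q and r[1] == label and len(r) == len(children) + 2
        and all(derives(qi, c, delta) for qi, c in zip(r[2:], children))
        for r in delta
    )
```

From $q_f$, the only derivable trees are $f(a, b)$ and $f(b, a)$; the choice between the two $f$-transitions has to be made at the root, before the leaves are seen.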

(chap. 1, 3, 8)
