L'objectif de cet exercice est de donner une sémantique à des expressions régulières simples comportant :
On spécifiera le langage reconnu par une expression régulière et on démontrera un certain nombre de propriétés.
RegExprExercise.thy
(:brush=isabelle:)
theory RegExprExercise
imports Main
begin
text ‹
A theory of simple regular expressions on an alphabet 'a.
We handle:
▪ the empty regular expression ε, which matches nothing
▪ the atomic regular expression, which matches its atom
▪ the alternative between two regular expressions, which matches either one or the other
▪ the sequence of regular expressions, which matches a sequence of matches of two REs
▪ the Kleene closure of a regular expression, which matches any sequence of matches of the RE
›
datatype 'a regexpr =
Empty (‹ε›)
| Atom 'a (‹<_>›)
| Alt ‹'a regexpr› ‹'a regexpr› (infixr ‹.|.› 65)
| Seq ‹'a regexpr› ‹'a regexpr› (infixr ‹.+.› 75)
| Star ‹'a regexpr› (‹_*›)
text ‹
Some examples on characters
›
term ‹<CHR a>›
term ‹<CHR a> .|. <CHR b>›
term ‹<CHR a>*›
term ‹(<CHR a> .|. <CHR b>)*›
term ‹(<CHR a> .|. <CHR b>)* .+. <CHR c>›
text ‹
Languages are sets of words, a word being an 'a list.
›
type_synonym 'a language = ‹'a list set›
text ‹
The Kleene closure of a language is the smallest set that contains:
▪ the empty word
▪ all concatenations of words of the language.
inductive_set is like inductive, but directly defines a set
›
inductive_set Kleene :: ‹'a language ⇒ 'a language› (‹_⋆›)
for L :: ‹'a language›
where
― ‹The empty word belongs to the Kleene closure›
Empty: ‹[] ∈ Kleene L›
― ‹Which other words belong to the closure? Think about 'append', the @ operator on lists›
| Cons: ‹⟦w ∈ L ⟧ ⟹ w ∈ Kleene L› ― ‹FIXME: this sets L⋆ = [] ∪ L›
text ‹
Language recognized by a regular expression
›
fun language :: ‹'a regexpr ⇒ 'a language›
where
Lang_Empty: ‹language ε = undefined›
| Lang_Atom: ‹language <a> = undefined›
| Lang_Alt: ‹language (r1 .|. r2) = undefined›
| Lang_Seq: ‹language (r1 .+. r2) = undefined›
| Lang_Star: ‹language (r*) = undefined›
section ‹Some properties›
text ‹What is the language of the empty regexpr?›
lemma epsilon_empty: ‹language ε = X›
sorry
text ‹And what about its Kleene closure?›
lemma ‹language (ε*) = X›
sorry
text ‹The alternative commutes›
lemma ‹A .|. B = B .|. A› ― ‹FIXME: I am wrong›
sorry
text ‹ε is left and right absorbant for the sequence›
lemma ‹ε .+. A = ε› sorry
lemma ‹A .+. ε = ε› sorry
― ‹FIXME: these lemmas are wrong, write the correct form and prove them›
text ‹ε is neutral for the alternative›
lemma ‹ε .|. A = A› sorry
lemma ‹A .|. ε = A› sorry
― ‹FIXME: these lemmas are wrong, write the correct form and prove them›
text ‹The language of a regexpr is included in its Kleene closure›
lemma kleene_includes: ‹something about r ⊆ someting about r*› ― ‹FIXME: I am wrong›
sorry
text ‹Correspondance between Kleene and Star›
lemma morph_kleene: ‹something about * = something about ⋆›
by simp
text ‹The Kleene closure is stable by concatenation›
lemma kleene_closed: ‹⟦w ∈ (L⋆); w' ∈ (L⋆)⟧ ⟹ w@w' ∈ (L⋆)›
sorry
text ‹Dual version for regexpr›
corollary kleene_closed': ‹True› ..
text ‹
Some checks on concrete strings and regexprs.
›
schematic_goal a_or_b_then_c_star:
‹language (((<CHR a> .|. <CHR b>) .+. <CHR c>)*) = ?X›
by simp
thm a_or_b_then_c_star
lemma ‹acbc ∈ language (((<CHR a> .|. <CHR b>) .+. <CHR c>)*)› (is ‹?w ∈ language (?R*)›)
proof -
have ‹ac ∈ language (?R)› by simp
hence 1:‹ac ∈ language (?R*)› using kleene_includes by blast
have ‹bc ∈ language (?R)› by simp
hence 2:‹bc ∈ language (?R*)› using kleene_includes by blast
have ‹?w = ac@bc› by simp
with kleene_closed'[OF 1 2] show ?thesis by presburger
qed
end
(:endbrush:)