views:

847

answers:

4

Does anyone know any examples of the following?

  1. Proof developments about regular expressions (possibly extended with backreferences) in proof assistants (such as Coq).
  2. Programs in dependently-typed languages (such as Agda) about regular expressions.
+4  A: 

See Perl Regular Expression Matching is NP-Hard

Regex matching is NP-hard when regexes are allowed to have backreferences.

Reduction of 3-CNF-SAT to Perl Regular Expression Matching

[...] 3-CNF-SAT is NP-complete. If there were an efficient (polynomial-time) algorithm for computing whether or not a regex matched a certain string, we could use it to quickly compute solutions to the 3-CNF-SAT problem, and, by extension, to the knapsack problem, the travelling salesman problem, etc. etc.

eed3si9n
Even more than that: regular expression pattern matching with backreferences has been proved to be NP-complete by Alfred Aho quite some time ago by reduction from the vertex-cover problem. See Theorem 6.2 from [A. V. Aho. Algorithms for finding patterns in strings. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, volume A: Algorithms and Complexity, pages 255–300. The MIT Press, 1990].
volodyako
+1  A: 

I don't know of any development that treats regular expressions by themselves.

Finite automata, however, relevant since NFAs are the standard way to match those regular expressions, have been studied in NuPRL. Have a look at : Robert L. Constable, Paul B. Jackson, Pavel Naumov, Juan Uribe. Constructively Formalizing Automata Theory.

Should you be interested in approaching those formal languages through algebra, esp. developing finite semigroup theory, there are a number of algebra libraries developed in various theorem provers that you could think of using, with one particularly efficient in a finite setting.

huitseeker
+4  A: 

Certified Programming with Dependent Types has a section on creating a verified regular expression matcher. Coq Contribs has an automata contribution that might be useful.

eaubin
Right, I have also noted this paragraph in CPDT a few weeks ago. It does hit the ball. As per the automata contribution, it certainly is. However, it uses axioms. E.g., 4 axioms in the proof that regular languages are defined by NFAs (module RatReg). A proof without axioms is also possible (but not contained in the Coq contributions).
volodyako
+5  A: 

Moreira, Pereira & de Sousa, On the Mechanisation of Kleene Algebra in Coq gives a nice verified construction of the Antimirov derivative of regexps in Coq. It's pretty easy to read off a CFA from this construction, and to compute the intersection of regexps.

I'm not sure why you separate Coq from dependently typed programming: Coq essentially is programming in a polymorphic dependently typed lambda calculus with inductive types (i.e., CIC, the calculus of inductive constructions).

I've never heard of a formalisation of regexps in a dependently typed language, nor have I heard of something such as an Antimirov-like derivative for regexps with backtracking, but Becchi & Crowley, Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions provide a notion of finite-state automata that matches a Perl-like regexp languages. That might attractive to formalisers in the near future.

Charles Stewart