ansaurus

Question

Can regular expressions be used to match nested patterns?

Answer 1

+4 A:

No, you are getting into the realm of context-free grammars at that point.

Craig H 2008-09-25 14:19:02

Answer 2

+2 A:

No. You need a full-blown parser for this type of problem.

Adam Rosenfield 2008-09-25 14:19:25

... or Perl5.10 or higher

Brad Gilbert 2009-07-07 01:09:56

Answer 3

+33 A:

No. It's that easy. A finite automaton (which is the data structure underlying a regular expression) does not have memory apart from the state it's in, and if you have arbitrarily deep nesting, you need an arbitrarily large automaton, which collides with the notion of a finite automaton.

You can match nested/paired elements up to a fixed depth, where the depth is only limited by your memory, because the automaton gets very large. In practice, however, you should use a push-down automaton, i.e., a parser for a context-free grammar, for instance LL (top-down) or LR (bottom-up). You have to take the worse runtime behavior into account: up to O(n^3) for general context-free parsing vs. O(n) for a finite automaton, with n = length(input).
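To make the fixed-depth point concrete, here is a minimal Python sketch. The function name nested_braces_pattern is made up for this illustration, and Python's re module merely stands in for a classical regex engine. It builds a pattern for brace groups nested at most max_depth levels deep; the pattern text grows with every extra level, which is the "automaton gets very large" effect in miniature.

```python
import re

def nested_braces_pattern(max_depth):
    """Build a regex matching text whose {...} groups nest at most max_depth deep."""
    pattern = r"[^{}]*"  # depth 0: text without any braces
    for _ in range(max_depth):
        # One more level: plain text interleaved with groups of the previous depth.
        pattern = r"[^{}]*(?:\{" + pattern + r"\}[^{}]*)*"
    return re.compile(pattern)

depth_two = nested_braces_pattern(2)
print(bool(depth_two.fullmatch("a{b{c}d}e")))  # nesting depth 2
print(bool(depth_two.fullmatch("{{{}}}")))     # nesting depth 3
```

No single pattern built this way covers every depth: whatever max_depth is chosen, an input nested one level deeper is rejected.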

There are many parser generators available, for instance ANTLR for Java. Finding an existing grammar for Java (or C) is also not difficult.
For more background: Automata Theory at Wikipedia
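For a grammar as small as nested braces, a hand-written recursive-descent (top-down) parser is also an option. The following is only a sketch in Python under that assumption; parse_nested is a hypothetical name, not part of any library. The call stack plays the role of the push-down automaton's stack.

```python
def parse_nested(text):
    """Parse text containing {...} groups into nested lists of strings and sub-lists."""
    pos = 0

    def parse_group(depth):
        nonlocal pos
        items, chunk = [], []
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "{":
                if chunk:
                    items.append("".join(chunk))
                    chunk = []
                items.append(parse_group(depth + 1))  # recurse for the inner group
            elif ch == "}":
                if depth == 0:
                    raise ValueError("unexpected closing brace")
                if chunk:
                    items.append("".join(chunk))
                return items
            else:
                chunk.append(ch)
        if depth > 0:
            raise ValueError("unclosed opening brace")
        if chunk:
            items.append("".join(chunk))
        return items

    return parse_group(0)

print(parse_nested("a{b{c}d}e"))
```

Because it recurses once per nesting level, very deep input would hit Python's recursion limit; an explicit stack avoids that.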

Torsten Marek 2008-09-25 14:27:12

I expected as much. Thanks for the detailed answer.

Richard Dorman 2008-09-25 15:03:08

Torsten is correct as far as theory is concerned. In practice, many implementations have some trick that allows you to write recursive "regular expressions". E.g., see the chapter "Recursive patterns" in http://php.net/manual/en/regexp.reference.php

daremon 2008-09-25 15:26:12

I am spoiled by my upbringing in Natural Language Processing and the automata theory it included.

Torsten Marek 2008-09-25 15:31:08

A refreshingly clear answer. Best "why not" I've ever seen.

Ben Doom 2008-09-25 16:35:52

I agree with Ben completely: excellent answer.

Adam Bernier 2009-02-20 07:22:07

Regular expressions in language theory and regular expressions in practice are different beasts... since _regular_ expressions can't have niceties such as back references, forward references etc.

Novikov 2010-10-04 16:54:57

Answer 4

+6 A:

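A Perl solution that probably works, if the string is on one line:

my $NesteD ;
# (??{ ... }) is evaluated at match time, so the pattern can refer to itself.
$NesteD = qr/ \{( [^{}] | (??{ $NesteD }) )* \} /x ;

# $Stringy is assumed to hold the text to search.
if ( $Stringy =~ m/\b( \w+$NesteD )/x ) {
    print "Found: $1\n" ;
}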

HTH

EDIT: check:

http://dev.perl.org/perl6/rfc/145.html
ruby information: http://www.ruby-forum.com/topic/112084
more perl: http://www.perlmonks.org/?node_id=660316
even more perl: http://search.cpan.org/~dconway/Text-Balanced-v2.0.0/lib/Text/Balanced.pm
perl, perl, perl: http://perl.plover.com/yak/regex/samples/slide083.html

And one more thing by Torsten Marek (who correctly pointed out that it's not a regex anymore):

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2008-03/msg01047.html

Zsolt Botykai 2008-09-25 14:40:25

Thanks, removed my earlier comment. Sadly, I'm out of votes for today:(

Torsten Marek 2008-09-25 15:04:21

Yup. Perl's "regular expressions" aren't (and haven't been for a very long time). It should be noted that recursive regexes are a new feature in Perl 5.10 and that even though you can do this you probably shouldn't in most of the cases that commonly come up (e.g. parsing HTML).

Michael Carman 2008-09-25 15:09:02

http://perldoc.perl.org/perlretut.html

Brad Gilbert 2008-10-16 16:30:24

Answer 5

+2 A:

The Pumping lemma for regular languages is the reason why you can't do that.

The generated automaton will have a finite number of states, say k, so a string of k+1 opening braces is bound to have a state repeated somewhere (as the automaton processes the characters). The part of the string between the two visits to that state can be repeated any number of times and the automaton will not know the difference.

In particular, if it accepts k+1 opening braces followed by k+1 closing braces (which it should), it will also accept the pumped number of opening braces followed by the unchanged k+1 closing braces (which it shouldn't).
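The effect can be seen on a toy automaton. The Python sketch below is an assumption-laden illustration, not a proof: run_dfa is a made-up name, and the automaton is one particular finite-state design that tracks brace depth in states 0..k and saturates at k because it has no further states to count with.

```python
def run_dfa(text, k):
    """Simulate a finite automaton with states 0..k that tracks brace depth, capped at k."""
    state = 0
    for ch in text:
        if ch == "{":
            state = min(state + 1, k)  # depth beyond k cannot be represented
        elif ch == "}":
            if state == 0:
                return False           # closing brace with nothing open: reject
            state -= 1
    return state == 0                  # accept only when back in the start state

k = 3
print(run_dfa("{" * 3 + "}" * 3, k))  # within the automaton's capacity
print(run_dfa("{" * 4 + "}" * 3, k))  # unbalanced input
print(run_dfa("{" * 4 + "}" * 4, k))  # balanced input, one level too deep
```

With k = 3, state 3 is visited twice while reading four opening braces, so the automaton cannot tell three from four: it accepts four opening braces followed by three closing ones and rejects the properly balanced four-and-four string.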

Rafał Dowgird 2008-09-25 14:47:07

Answer 6

+1 A:

Proper regular expressions cannot do it, because you would leave the realm of regular languages and land in context-free language territory.

Nevertheless the "regular expression" packages that many languages offer are strictly more powerful.

For example, Lua patterns (Lua's take on regular expressions) have the "%b()" item, which matches balanced parentheses. In your case you would use "%b{}".

Another sophisticated tool similar to sed is gema, where you will match balanced curly braces very easily with {#}.

So, depending on the tools you have at your disposal, your "regular expression" (in a broader sense) may be able to match nested parentheses.
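For tools without such a built-in, the behaviour of a balanced-pair matcher is easy to approximate by hand. The Python sketch below is only an approximation of what "%b{}" does, and find_balanced is a hypothetical helper name: it returns the first span that starts with the opener and ends at its matching closer.

```python
def find_balanced(text, opener="{", closer="}"):
    """Return the first balanced opener...closer span in text, or None if there is none."""
    start = text.find(opener)
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == opener:
                depth += 1
            elif text[i] == closer:
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # This opener is never closed; retry from the next opener.
        start = text.find(opener, start + 1)
    return None

print(find_balanced("f{a{b}c} g{d}"))
```

The sketch assumes opener and closer are two different single characters.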

Remo.D 2008-09-25 15:09:20

Answer 7

+1 A:

As Zsolt mentioned, some regex engines support recursion. Of course, these are typically the ones that use a backtracking algorithm, so it won't be particularly efficient. Example: /(?>[^{}]*){(?>[^{}]*)(?R)*(?>[^{}]*)}/sm
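Whether a recursive pattern is accepted depends entirely on the engine. As a hedged illustration, Python's built-in re module does not, at the time of writing, understand the (?R) construct and rejects such a pattern at compile time. The helper name supports_recursion and the simplified pattern below are made up for this sketch.

```python
import re

# A simplified recursive pattern in PCRE style: a brace group containing
# non-brace characters or, recursively, the whole pattern again.
RECURSIVE_PATTERN = r"\{(?:[^{}]|(?R))*\}"

def supports_recursion(pattern=RECURSIVE_PATTERN):
    """Report whether the built-in re module can compile the given pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(supports_recursion())
```

Engines that do implement recursion, such as PCRE or Perl 5.10 and later, accept this kind of pattern directly.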

2008-09-25 15:25:10

Answer 8

+2 A:

Yes, if it is the .NET regex engine. The .NET engine supports a finite state machine supplied with an external stack. See details: http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/
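The idea of pairing a simple scanner with an explicit stack can be sketched outside .NET as well. The Python below is not .NET syntax and does not reproduce how the .NET engine works internally; it is a minimal sketch, with the hypothetical name is_balanced, of how a stack lets a linear scan check nesting of arbitrary depth.

```python
def is_balanced(text, pairs=None):
    """Check that every bracket in text is closed in the right order, using a stack."""
    pairs = pairs or {"{": "}", "(": ")", "[": "]"}
    closers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in pairs:
            stack.append(pairs[ch])          # remember which closer is expected
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False                 # wrong closer, or nothing open
    return not stack                         # leftover entries mean unclosed brackets

print(is_balanced("a{b(c)[d]}e"))
print(is_balanced("{(})"))
```

For a single bracket type a plain counter would do; the stack is what allows several bracket types to be mixed.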

2008-12-05 06:35:28

As others have mentioned, .NET is _not_ the only regex engine capable of doing this.

Ben S 2010-03-15 00:18:46

Answer 9

A:

This seems to work: /(\{(?:\{.*\}|[^\{])*\})/m

Sean Huber 2010-04-01 20:39:10

Answer 10

A:

Using regular expressions to check for nested patterns is very easy.

'/(\((?>[^()]+|(?1))*\))/'

MichaelRushton 2010-10-03 18:49:51
