
I have a string that I want to parse using regex. It has the following format:

"random text [id value] more text [id value] other stuff"

I would like to find a pattern that matches [id value], brackets included. Do I have to do anything special to get two matches instead of one? My concern is that I will only get this match:

"[id value] more text [id value]"

using a pattern like this: \[((.|\n)*?)\]

I'm not very good at regex, so this may seem super trivial. Does the regex engine move from left to right through the string?
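The engine does scan left to right, and after each match it resumes searching just past the end of that match. Because *? is a lazy quantifier, the pattern above already stops at the first ] it reaches and yields two matches; only a greedy * would produce the single long match. The question doesn't name a language, so here is a quick check in Python as an assumption (the variable names are illustrative). In Python, the re.DOTALL flag lets . match newlines, which replaces the (.|\n) trick:

```python
import re

text = "random text [id value] more text [id value] other stuff"

# Lazy quantifier: *? stops at the first "]" it can reach.
lazy = re.findall(r"\[.*?\]", text)

# Greedy quantifier: * runs ahead, then backtracks only as far as the LAST "]".
greedy = re.findall(r"\[.*\]", text)

# A negated class can never step over a "]", so lazy/greedy does not matter.
negated = re.findall(r"\[[^\]]*\]", text)

print(lazy)     # ['[id value]', '[id value]']
print(greedy)   # ['[id value] more text [id value]']
print(negated)  # ['[id value]', '[id value]']

# re.DOTALL makes "." match newlines too, so (.|\n) is not needed.
multiline = "a [id\nvalue] b"
print(re.findall(r"\[.*?\]", multiline, re.DOTALL))  # ['[id\nvalue]']
```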

I then have a follow-up question that is a little more complicated. What if I have nested patterns:

"random text [id [id value]] more text [id value] other stuff"

I'd like to be able to capture [id [id value]] as one match and [id value] as another. Is it possible to write one regex that returns both [id [id value]] and [id value]? In this situation I'd like to check for balanced brackets and end each match on the bracket that closes the outermost one. Is that possible using regex?

+1  A: 

Matching parentheses is the canonical example of a simple task that cannot be done with a finite automaton, and regular expressions are simply a language for describing finite automata. Matching parentheses to arbitrary depth requires a context-free grammar.
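To make the distinction concrete: with a single bracket type, checking balance needs only one counter, but that counter can grow without bound, while a finite automaton has only a fixed number of states in which to remember it. A minimal Python sketch follows (the function name is_balanced is hypothetical); with several bracket types you would need a stack instead of a counter:

```python
def is_balanced(s: str) -> bool:
    """Return True if every '[' in s is closed by a later ']'."""
    depth = 0  # unbounded counter: the piece a finite automaton lacks
    for ch in s:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # a ']' with no open '[' before it
                return False
    return depth == 0  # every '[' must have been closed


print(is_balanced("random text [id [id value]] more text [id value]"))  # True
print(is_balanced("[id [id value]"))  # False
print(is_balanced("] [id value] ["))  # False
```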

Of course, modern regex libraries have capabilities that go well beyond basic regular expressions (PCRE, for example, supports recursive patterns, and .NET has balancing groups), so you might be able to manage something. I wouldn't hold out much hope, though.
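As an illustration of what "something" can look like with Python's built-in re module, which has no recursion support: a pattern can hard-code a fixed nesting depth. This sketch (the name DEPTH_2 is hypothetical) accepts groups nested at most one level deep. That covers the example string but fails quietly on deeper input:

```python
import re

# A group whose contents are either non-bracket characters or a simple
# [...] with no brackets inside it, i.e. at most one level of nesting.
DEPTH_2 = re.compile(r"\[(?:[^\[\]]|\[[^\[\]]*\])*\]")

text = "random text [id [id value]] more text [id value] other stuff"
print(DEPTH_2.findall(text))  # ['[id [id value]]', '[id value]']

# One level deeper and the outer group never matches; the engine skips
# ahead and returns an inner span instead, with no error.
print(DEPTH_2.findall("[a [b [c]]]"))  # ['[b [c]]']
```

Each additional level of nesting means embedding the pattern inside itself again, which is why this approach does not scale.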

You'd probably be able to crank out a simple recursive-descent parser in less time than it would take you to figure out how to torture a regex into something that would mostly work.
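A minimal sketch of such a recursive-descent parser in Python, assuming the only structure that matters is the brackets themselves. The grammar, class, and function names below are hypothetical and chosen for illustration. Each grammar rule becomes one method, and the recursion in parse_group mirrors the recursion in the grammar:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One bracketed span: its source text plus any groups nested inside it."""
    text: str
    children: list[Group] = field(default_factory=list)


class BracketParser:
    """Recursive-descent parser for the toy grammar
         text  := (char | group)*
         group := '[' text ']'
    """

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0

    def parse_text(self, inside_group: bool) -> list[Group]:
        groups: list[Group] = []
        while self.pos < len(self.src):
            ch = self.src[self.pos]
            if ch == "[":
                groups.append(self.parse_group())
            elif ch == "]":
                if inside_group:
                    return groups  # parse_group consumes this ']'
                raise ValueError(f"unmatched ']' at index {self.pos}")
            else:
                self.pos += 1  # ordinary character: skip it
        if inside_group:
            raise ValueError("unclosed '[' at end of input")
        return groups

    def parse_group(self) -> Group:
        start = self.pos
        self.pos += 1  # consume '['
        children = self.parse_text(inside_group=True)
        self.pos += 1  # consume the matching ']'
        return Group(self.src[start:self.pos], children)


def top_level_groups(src: str) -> list[Group]:
    return BracketParser(src).parse_text(inside_group=False)


groups = top_level_groups("random text [id [id value]] more text [id value] other stuff")
print([g.text for g in groups])              # ['[id [id value]]', '[id value]']
print([c.text for c in groups[0].children])  # ['[id value]']
```

Unbalanced input raises ValueError instead of producing a wrong match, which is the main advantage over the fixed-depth regex approach.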

Jeff Dege
So I basically need to write a parser that can reduce each nested level to a base case that regex can handle. I think I can manage that.
Shawn
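One way to read that plan, sketched in Python under stated assumptions: a plain regex can reliably match the base case, a bracket group with no brackets inside it. Replacing each such group with a placeholder turns its enclosing group into a new base case, so repeating the pass peels the nesting from the inside out. The function name and the NUL-delimited placeholder format are hypothetical. The sketch assumes the input never contains a NUL character, and it does not check that the brackets are balanced:

```python
import re

# Base case a plain regex can handle: a [...] group with no brackets inside.
INNERMOST = re.compile(r"\[[^\[\]]*\]")
# Hypothetical placeholder format NUL + index + NUL (assumes no NUL in input).
TOKEN = re.compile(r"\x00(\d+)\x00")


def peel_innermost_first(src):
    """Return bracket groups pass by pass, innermost first."""
    originals = []  # originals[i] = original text behind placeholder i
    passes = []

    def restore(s):
        # Swap placeholders back for the original text they stand for.
        return TOKEN.sub(lambda m: originals[int(m.group(1))], s)

    def stash(m):
        # Record the group's original text, then hide it behind a placeholder.
        originals.append(restore(m.group(0)))
        return f"\x00{len(originals) - 1}\x00"

    current = src
    while True:
        before = len(originals)
        current = INNERMOST.sub(stash, current)
        if len(originals) == before:  # no base-case groups left
            break
        passes.append(originals[before:])
    return passes


print(peel_innermost_first("random text [id [id value]] more text [id value] other stuff"))
# [['[id value]', '[id value]'], ['[id [id value]]']]
```

Each inner list holds the groups that became bracket-free in that pass, so an unnested top-level group (such as [a] in "[a] [b [c]]") lands in the first list alongside the most deeply nested ones.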