Hi. I have a text that consists of information enclosed by a certain pattern. The only thing I know is the pattern: "${template.start}" and "${template.end}". To keep it simple, I will substitute both ${template.start} and ${template.end} with "a" in the example.

So one entry in the text would be:

aINFORMATIONHEREa

I do not know how many of these entries are concatenated in the text. So the following is correct too:

aFOOOOOOaaASDADaaASDSDADa

I want to write a regular expression to extract the information enclosed by the "a"s.

My first attempt was to do:

a(.*)a

which works as long as there is only one entry in the text. As soon as there is more than one entry it fails, because .* is greedy and matches as much as it can. So using a(.*)a on aFOOOOOOaaASDADaaASDSDADa results in only one capturing group, containing everything between the first and the last "a" in the text:

FOOOOOOaaASDADaaASDSDAD

What I want to get is something like

captureGroup(0):  aFOOOOOOaaASDADaaASDSDADa
captureGroup(1): FOOOOOO
captureGroup(2): ASDAD
captureGroup(3): ASDSDAD

It would be great to be able to extract each entry from the text and, from each entry, the information enclosed between the "a"s. By the way, I am using the QRegExp class of Qt4.

Any hints? Thanks! Markus


Multiple variations of this question have been seen before. Various related discussions:

and probably others...

+5  A: 

Simply use non-greedy expressions, namely:

a(.*?)a
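
To make the difference concrete, here is a small sketch using Python's re module (an assumption for illustration only; the question itself uses QRegExp, whose syntax differs). It runs both patterns over the sample text from the question:

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

# Greedy: .* runs on to the last "a", so the whole text is one match.
greedy = re.findall(r"a(.*)a", text)

# Non-greedy: .*? stops at the nearest closing "a", one match per entry.
lazy = re.findall(r"a(.*?)a", text)

print(greedy)  # ['FOOOOOOaaASDADaaASDSDAD']
print(lazy)    # ['FOOOOOO', 'ASDAD', 'ASDSDAD']
```

Note that the entries come back as successive matches of one capture group rather than as groups 1, 2 and 3 of a single match.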
cletus
That was the right hint for me! Thanks! In Qt you have to use QRegExp::setMinimal(true); to achieve the same, I just found out.
.*? is clearer as long as your regex language supports it.
PEZ
+3  A: 

You need to match something like:

a[^a]*a
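
A possible way to try this out, again sketched with Python's re module rather than QRegExp. The capture group around [^a]* is an addition here, so that only the enclosed information is returned:

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

# [^a]* can never run past a delimiter, so no non-greedy quantifier is needed.
# The parentheses (not in the pattern above) capture just the payload.
entries = re.findall(r"a([^a]*)a", text)

print(entries)  # ['FOOOOOO', 'ASDAD', 'ASDSDAD']
```

This relies on the delimiter being a single character. For multi-character delimiters such as ${template.start}, a negated character class no longer expresses "anything but the delimiter".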
Fernando Miguélez
A: 

You have a couple of working answers already, but I'll add a little gratuitous advice:

Using regular expressions for parsing is a road fraught with danger

Edit: To be less cryptic: for all their power, flexibility and elegance, regular expressions are not sufficiently expressive to describe any but the simplest grammars. They are adequate for the problem asked here, but are not a suitable replacement for state machines or recursive descent parsers if the input language becomes more complicated.

So, choosing to use REs for parsing input streams is a decision that should be made with care and with an eye towards the future.
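
As a rough sketch of what the non-regex route could look like for this particular input, here is a hand-written scanner in Python. The function name extract_entries and the error handling are assumptions made for illustration, not something from the question:

```python
def extract_entries(text, start, end):
    """Return the payloads found between start and end delimiters."""
    entries = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break  # no further opening delimiter
        begin += len(start)
        finish = text.find(end, begin)
        if finish == -1:
            # Unlike a regex, the scanner can report malformed input.
            raise ValueError("unterminated entry at offset %d" % begin)
        entries.append(text[begin:finish])
        pos = finish + len(end)
    return entries


sample = "${template.start}FOO${template.end}${template.start}BAR${template.end}"
result = extract_entries(sample, "${template.start}", "${template.end}")
print(result)  # ['FOO', 'BAR']
```

It is longer than the one-line regex, but it works unchanged for multi-character delimiters and leaves an obvious place to add rules (nesting, escaping) if the format grows.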

dmckee