Hey guys, I'm trying to select a specific string out of a text, but I'm not a master of regular expressions. I tried one way: the match starts at the string I want, but it also includes everything after it.

My regex:

\nSCR((?s).*)(GI|SI)(.*?)\n

Text I'm matching against:

Hierbij een test

SCR
S09
/[email protected]
05FEB
GI BRGDS OPS

middle text string (should not be selected)

SCR
S09
05FEB
LHR
NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD
/ RE.GBFLY/
GI BRGDS

The middle string is selected too. I only want each block from the SCR line up to and including its GI line.
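To reproduce the problem, here is a minimal sketch in Python (an assumption, since the question does not say which regex engine is in use). The e-mail line of the sample is replaced with a hypothetical placeholder, and `(?s)` is passed as the `re.DOTALL` flag because Python 3.11 and later reject a global inline flag that is not at the start of the pattern. The greedy `.*` runs to the end of the text and backtracks only as far as the last `GI`, so the single match swallows the middle text:

```python
import re

# Sample input from the question; "/EMAIL-PLACEHOLDER" is a hypothetical
# stand-in for the obfuscated e-mail line.
TEXT = (
    "Hierbij een test\n"
    "\n"
    "SCR\n"
    "S09\n"
    "/EMAIL-PLACEHOLDER\n"
    "05FEB\n"
    "GI BRGDS OPS\n"
    "\n"
    "middle text string (should not be selected)\n"
    "\n"
    "SCR\n"
    "S09\n"
    "05FEB\n"
    "LHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\n"
    "GI BRGDS\n"
)

# The question's pattern, with the inline (?s) moved into the re.DOTALL flag.
greedy = re.compile(r"\nSCR(.*)(GI|SI)(.*?)\n", re.DOTALL)

match = greedy.search(TEXT)
# The one match runs from the first SCR line to the last GI line,
# including the middle text in between.
print(match.group(0))
```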

+1  A: 

To match from a line starting with SCR to a line starting with GI or SI (inclusive), you would use the following regular expression:

(?m:^SCR\n(?:^(?!GI|SI).*\n)*(?:GI|SI).*)

This will:

  • Find the start of a line.
  • Match SCR and a newline.
  • Match all lines not starting with GI or SI.
  • Match the last line, which must start with GI or SI (this prevents it from matching to the end of the string if there is no GI or SI line).
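A quick way to check this pattern is to run it through Python's `re.findall` (Python is an assumption here; an engine that supports scoped inline flags such as `(?m:...)`, negative lookahead, and multiline `^` should give the same result). The sample text from the question is reused, with a hypothetical placeholder for the e-mail line:

```python
import re

# Sample input from the question; "/EMAIL-PLACEHOLDER" is a hypothetical
# stand-in for the obfuscated e-mail line.
TEXT = (
    "Hierbij een test\n"
    "\n"
    "SCR\n"
    "S09\n"
    "/EMAIL-PLACEHOLDER\n"
    "05FEB\n"
    "GI BRGDS OPS\n"
    "\n"
    "middle text string (should not be selected)\n"
    "\n"
    "SCR\n"
    "S09\n"
    "05FEB\n"
    "LHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\n"
    "GI BRGDS\n"
)

# The scoped (?m:...) group makes ^ match at the start of every line.
pattern = re.compile(r"(?m:^SCR\n(?:^(?!GI|SI).*\n)*(?:GI|SI).*)")

blocks = pattern.findall(TEXT)
for block in blocks:
    print(block)
    print("---")
```

Because every group in the pattern is non-capturing, `findall` returns whole matches, so each list element is one complete block from `SCR` through its `GI`/`SI` line.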
Blixt
I just changed my regex a bit, inspired by Gumbo. His regular expression took into account that if a group doesn't have a `GI` or `SI` line, the regular expression shouldn't match. Now my regex and his second regex are pretty similar, except that mine uses the start-of-line anchor `^` instead of matching a newline character.
Blixt
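The difference described in this comment can be seen with two small hypothetical inputs, again sketched with Python's `re` (an assumed engine). The `^`-anchored pattern also finds a block at the very start of the text, where there is no preceding newline for the newline-based lookahead pattern to consume, and neither pattern matches a block that has no `GI`/`SI` line at all:

```python
import re

# Blixt's pattern, anchored with ^ in multiline mode.
anchored = re.compile(r"(?m:^SCR\n(?:^(?!GI|SI).*\n)*(?:GI|SI).*)")
# Gumbo's look-ahead pattern, which consumes a literal newline before SCR.
newline_based = re.compile(r"\nSCR((?:\n(?!GI|SI).*)*)\n(?:GI|SI).*\n")

# Hypothetical inputs, not taken from the question.
at_start = "SCR\nS09\nGI BRGDS\n"      # block at the very start of the text
no_terminator = "\nSCR\nS09\n05FEB\n"  # block with no GI/SI line

print(anchored.findall(at_start))       # finds the block
print(newline_based.search(at_start))   # no newline before SCR, so no match
print(anchored.findall(no_terminator))  # no GI/SI line, so no match
print(newline_based.search(no_terminator))
```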
+2  A: 

Make the first quantifier non-greedy as well:

\nSCR((?s).*?)(GI|SI)(.*?)\n
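Here is a sketch of this lazy version in Python (an assumed engine), with `(?s)` passed as `re.DOTALL` because Python 3.11+ rejects a global inline flag in the middle of a pattern. The question's sample is reused with a hypothetical placeholder for the e-mail line; `finditer` now yields one match per block:

```python
import re

# Sample input from the question; "/EMAIL-PLACEHOLDER" is a hypothetical
# stand-in for the obfuscated e-mail line.
TEXT = (
    "Hierbij een test\n"
    "\n"
    "SCR\n"
    "S09\n"
    "/EMAIL-PLACEHOLDER\n"
    "05FEB\n"
    "GI BRGDS OPS\n"
    "\n"
    "middle text string (should not be selected)\n"
    "\n"
    "SCR\n"
    "S09\n"
    "05FEB\n"
    "LHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\n"
    "GI BRGDS\n"
)

# Both quantifiers are lazy, so each match stops at the first GI/SI it meets.
lazy = re.compile(r"\nSCR(.*?)(GI|SI)(.*?)\n", re.DOTALL)

matches = [m.group(0) for m in lazy.finditer(TEXT)]
for text in matches:
    print(repr(text))
```

One caveat with this form: the lazy `.*?` stops at the first `GI` or `SI` anywhere in the text, not only at the start of a line, so a word such as `LOGIN` inside a block would end the match early. The lookahead pattern and the `^`-anchored pattern only test the start of each line.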

Or you could use a negative look-ahead assertion (?!expr) to capture just those lines that do not start with either GI or SI:

\nSCR((?:\n(?!GI|SI).*)*)\n(?:GI|SI).*\n
Gumbo
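The lookahead version can be checked the same way (Python's `re` assumed, sample text reused with a hypothetical e-mail placeholder). Group 1 holds the lines between `SCR` and the `GI`/`SI` line, each still prefixed by its newline, so splitting on `\n` recovers them:

```python
import re

# Sample input from the question; "/EMAIL-PLACEHOLDER" is a hypothetical
# stand-in for the obfuscated e-mail line.
TEXT = (
    "Hierbij een test\n"
    "\n"
    "SCR\n"
    "S09\n"
    "/EMAIL-PLACEHOLDER\n"
    "05FEB\n"
    "GI BRGDS OPS\n"
    "\n"
    "middle text string (should not be selected)\n"
    "\n"
    "SCR\n"
    "S09\n"
    "05FEB\n"
    "LHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\n"
    "GI BRGDS\n"
)

# Each repetition consumes "\n" plus a line that does not start with GI or SI.
lookahead = re.compile(r"\nSCR((?:\n(?!GI|SI).*)*)\n(?:GI|SI).*\n")

blocks = [m.group(1) for m in lookahead.finditer(TEXT)]
for block in blocks:
    # group(1) starts with "\n", so drop the empty first element after splitting.
    print(block.split("\n")[1:])
```

Like the lazy version, this pattern needs a newline after the `GI`/`SI` line, so a final block in a text without a trailing newline is not matched.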