ansaurus

Question

How do I extract text between two different matches?

Answer 1

+2 A:

Why not just:

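import re

# subject (the text to search) and fname (the output path) are assumed to be defined earlier
with open(fname, 'w') as file:
    for match in re.finditer(r'Item A(.+?)Item B', subject, re.I):
        s = match.group(1)
        if len(s) > 50:  # keep only captures longer than 50 characters
            file.write(s)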

Note: passing the raw numeric values of flags is rather opaque; use the named flag constants provided by the re module instead (re.I, for example).

SilentGhost 2010-06-22 17:35:32
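To illustrate the note about flags, here is a minimal sketch showing that the named constant and its numeric value behave the same, which is why the named form is the more readable choice. The sample string is made up for the demonstration and is not from the question.

```python
import re

# Hypothetical sample text, invented for this illustration.
subject = "item a one Item B ITEM A two item b"

# Named flag: the intent (case-insensitive matching) is obvious to the reader.
named = re.findall(r'Item A(.+?)Item B', subject, re.IGNORECASE)

# Numeric flag: 2 is the integer value of re.IGNORECASE, but nothing here says so.
numeric = re.findall(r'Item A(.+?)Item B', subject, 2)

print(named)    # [' one ', ' two ']
print(numeric)  # [' one ', ' two ']
```

re.I is simply a shorter alias for re.IGNORECASE, so either name works.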

You should use a look-ahead assertion for the end delimiter to allow overlapping of start and end delimiters.

Gumbo 2010-06-22 17:46:18
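A rough sketch of what the look-ahead suggestion buys you, assuming "overlapping" means that the end delimiter of one section may also be the start delimiter of the next. The numbered "Item" markers and the sample text are hypothetical, chosen only to make the difference visible.

```python
import re

# Hypothetical text where every marker both ends one section and starts the next.
text = "Item 1 alpha Item 2 beta Item 3 gamma Item 4"

# The end delimiter is consumed, so it cannot start the following match.
consuming = re.findall(r'Item \d(.+?)Item \d', text)

# The end delimiter is only asserted with a look-ahead, so it stays available.
lookahead = re.findall(r'Item \d(.+?)(?=Item \d)', text)

print(consuming)  # [' alpha ', ' gamma ']
print(lookahead)  # [' alpha ', ' beta ', ' gamma ']
```

With the consuming pattern the "beta" section is skipped, because "Item 2" was already used up as the end of the first match.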

Thanks! Once I figured out what all this meant I got it to work.

dandyjuan 2010-06-22 18:25:16

Answer 2

+2 A:

This can be done in a single regex:

import re

# subject is assumed to hold the text to search
with open("output.txt", "w") as f:
    for match in re.finditer(r"(?<=Item\sA)(?:(?!Item\sB).){50,}(?=Item\sB)", subject, re.I):
        f.write(match.group() + "\n")

This matches only the text between Item A and Item B, and only when that text is at least 50 characters long. Or did you want to match the delimiters, too?

The regex explained:

(?<=Item\sA)   # assert that we start our match right after "Item A"
(?:            # start repeated group (non-capturing)
  (?!Item\sB)  # assert that we're not running into "Item B"
  .            # then match any character (except a newline, unless re.DOTALL is set)
){50,}         # repeat this at least 50 times
(?=Item\sB)    # then assert that "Item B" follows next (without making it part of the match)

Tim Pietzcker 2010-06-22 17:36:22
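The commented breakdown above can also be kept in the code itself by compiling the pattern with re.VERBOSE. This is only a sketch of that idea: the sample subject string is invented, and the pattern is otherwise the same as in the answer.

```python
import re

# Same regex as in the answer, written in free-spacing form.
# re.VERBOSE ignores unescaped whitespace and treats '#' as a comment start.
pattern = re.compile(
    r"""
    (?<=Item\sA)     # start right after 'Item A'
    (?:              # non-capturing group, repeated
      (?!Item\sB)    # not at the start of 'Item B'
      .              # any character except a newline
    ){50,}           # at least 50 times
    (?=Item\sB)      # 'Item B' must follow, but is not consumed
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical input: one long section (60 characters) and one short section.
subject = "Item A" + "x" * 60 + "Item B" + " Item A short Item B"

matches = [m.group() for m in pattern.finditer(subject)]
print(len(matches))  # 1, only the long section qualifies
```

The \s in the delimiters still matches whitespace in verbose mode; only literal spaces in the pattern are ignored.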

This is great code, but it's kind of complex and hard to figure out.

vy32 2010-06-22 17:40:34

@vy32: I agree, and I have provided a free-spacing version of the regex to explain it better.

Tim Pietzcker 2010-06-22 17:45:27
