Good day,

I need to extract a portion of a string that can look like either of these:

"some_text MarkerA some_text_to_extract MarkerB some_text"
"some_text MarkerA some_text_to_extract"

I need to extract some_text_to_extract in both cases. MarkerA and MarkerB are predefined text strings.

I tried these regexes, but with no luck:

".*\sMarkerA(.*)MarkerB.*" - does not work in case 2
".*\sMarkerA(.*)(?=MarkerB)?.*" - wrong result "some_text_to_extract MarkerB some_text" 
".*\sMarkerA(.*)(?:MarkerB)?.*" - does not work at all

Could you please help me with this issue?

A: 

Try:

".*\sMarkerA(.*?)(?=$|MarkerB)"

Test code:

#!/usr/bin/env python3

import re

# (input string, expected capture) pairs for both cases from the question
tests = [
    ("some_text MarkerA some_text_to_extract MarkerB some_text", " some_text_to_extract "),
    ("some_text MarkerA some_text_to_extract", " some_text_to_extract"),
]

reg = re.compile(r".*\sMarkerA(.*?)(?=$|MarkerB)")

for text, expected in tests:
    mo = reg.match(text)
    assert mo is not None
    # repr() makes the leading and trailing spaces in the capture visible
    print(repr(mo.group(1)), repr(expected))
    assert mo.group(1) == expected
Douglas Leeder
Thank you Douglas! This was exactly what I needed = )
+2  A: 

First, get rid of the .* at the beginning and the end; you don't need to match the whole string. Then use alternation to match either the ending delimiter or the end of the string.

"MarkerA(.*?)(?:MarkerB|$)"
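
As a rough sketch of how this pattern might be used, assuming Python (the language of the test code in the other answer): because the pattern is not anchored, it needs re.search rather than re.match. The helper name extract and the call to strip() are illustrative additions, not part of the answer itself.

```python
import re

# Unanchored pattern: re.search scans for the first MarkerA anywhere in the
# string, so no leading or trailing .* is needed.
pattern = re.compile(r"MarkerA(.*?)(?:MarkerB|$)")

def extract(text):
    """Hypothetical helper: return the text after MarkerA, or None if absent."""
    mo = pattern.search(text)
    # strip() drops the spaces around the capture; omit it to keep them.
    return mo.group(1).strip() if mo else None

print(extract("some_text MarkerA some_text_to_extract MarkerB some_text"))
print(extract("some_text MarkerA some_text_to_extract"))
```

Both calls should print some_text_to_extract; a string without MarkerA yields None.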
Alan Moore
A: 

The

".*\sMarkerA(.*)"

part of the regex matches everything after MarkerA, leaving no chance for MarkerB or anything else to match. The .* is greedy; use the non-greedy form of *, which is *?, and let the lookahead also accept the end of the string, to give:

".*\sMarkerA(.*?)(?=MarkerB|$)"

You probably do not want to capture the space before MarkerB, so in that case use:

".*\sMarkerA(.*?)(?=\sMarkerB|$)"
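
The difference between greedy and non-greedy matching can be seen in a short sketch. This assumes Python's re module and uses a lookahead that also accepts the end of the string; the variable names are illustrative only.

```python
import re

text = "some_text MarkerA some_text_to_extract MarkerB some_text"

# Greedy: (.*) runs to the end of the string, so the capture swallows MarkerB.
greedy = re.search(r"MarkerA(.*)", text).group(1)

# Non-greedy: (.*?) grows one character at a time and stops at the first
# position where the lookahead (space + MarkerB, or end of string) succeeds.
lazy = re.search(r"MarkerA(.*?)(?=\sMarkerB|$)", text).group(1)

print(repr(greedy))  # ' some_text_to_extract MarkerB some_text'
print(repr(lazy))    # ' some_text_to_extract'
```

The lookahead only checks what follows, so neither the space nor MarkerB ends up in the captured group.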
Callum
A: 

Thanks to all who replied to my question! This was REALLY fast = )) Great site!