ansaurus

Question

How to match patterns "begins with A or ends with B" with a regular expression?

Answer 1

+2 A:

(^A.*B$)|(^A.*$)|(^.*B$)

Boris Pavlović 2010-10-18 07:19:31

Yes, this matches, but I don't know which group to use in the replacement part.

BOYPT 2010-10-18 07:40:16

You can drop the grouping of the alternatives, thanks to the low precedence of the "|" operator, but you would still want to capture the relevant substring. This leaves you with ``^A(\S*)B$|^A(\S*)$|(\S*)B$``. The ugly part is that the desired substring now lands in one of three match groups, and you don't know which one in advance. So you might want to use the 'max(match.groups())' approach of mykhal.

ThomasH 2010-10-18 09:48:16
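
A minimal Python sketch of the three-group idea from the comment above, assuming the pattern is used with `re.search` and that exactly one group takes part in any match. The names `PATTERN` and `extract` are made up for illustration and do not come from the thread.

```python
import re

# Pattern quoted in the comment above; only one group participates in a match.
PATTERN = re.compile(r"^A(\S*)B$|^A(\S*)$|(\S*)B$")

def extract(text):
    """Return the captured substring, or None when nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # The groups that did not participate are None, so take the one that did.
    return next(g for g in m.groups() if g is not None)
```

Selecting the group that is not None avoids having to know in advance which alternative matched, and it still works when the capture is an empty string.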

Answer 2

+1 A:

try this:

/(^A|B$)/

動靜能量 2010-10-18 07:20:54

While this matches the right strings, it fails to capture the relevant substring, which is the actual difficulty here.

ThomasH 2010-10-18 09:12:28

Answer 3

+2 A:

^A|B$ or ^A|.*B$ (depending on whether the match function anchors at the beginning of the string)

UPDATE

It's difficult to write a single regexp for this. One possibility is:

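import re

match = re.match(r'^(?:A(\S+))|(?:(\S+)B)$', string)
if match:
    # match.groups() is either (capture, None) or (None, capture);
    # drop the None first, since Python 3 cannot compare None with a str
    capture = max(g for g in match.groups() if g is not None)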

mykhal 2010-10-18 07:21:03

Actually, what I need is the (\S+) group. The pattern `^A|.*B$` matches "`A`" or "`anythingB`", but I need "`anything`".

BOYPT 2010-10-18 07:28:08

Answer 4

+2 A:

Is this the desired behavior?

var rx = /^((?:A)?)(.*?)((?:B)?)$/;
"Aanything".match(rx)
> ["Aanything", "A", "anything", ""]
"anythingB".match(rx)
> ["anythingB", "", "anything", "B"]
"AanythingB".match(rx)
> ["AanythingB", "A", "anything", "B"]
"anything".match(rx)
> ["anything", "", "anything", ""]
"AanythingB".replace(rx, '$1nothing$3');
> "AnothingB"
"AanythingB".replace(rx, '$2');
> "anything"

Fordi 2010-10-18 07:49:50

This regex misses the "not neither" requirement of the OP.

ThomasH 2010-10-18 09:07:39
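
One way the "not neither" requirement could be added to the pattern in this answer is a leading lookahead. The sketch below is in Python rather than JavaScript and is an assumption, not something proposed in the thread; `RX` and `split_affixes` are hypothetical names.

```python
import re

# Variant of the answer's pattern: the lookahead (?=A|.*B$) rejects strings
# that neither start with A nor end with B before any group is tried.
RX = re.compile(r"^(?=A|.*B$)(A?)(.*?)(B?)$")

def split_affixes(text):
    """Return (prefix, middle, suffix), or None when neither affix is present."""
    m = RX.match(text)
    return m.groups() if m else None
```

The middle part always ends up in group 2, so a replacement can refer to it without checking which alternative matched.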

Answer 5

+1 A:

BOYPT 2010-10-18 07:55:40

Wouldn't `if re.match(r'^A\S+$', s): s = s[1:]`; `if re.match(r'^\S+B$', s): s = s[:-1]` be much simpler?

mykhal 2010-10-18 08:07:16
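
The two-step suggestion in the comment above could be wrapped up roughly as follows. The function name and the choice to return None when nothing was stripped are assumptions added for illustration. Because both patterns use `\S+`, an affix is only removed when at least one other non-space character remains.

```python
import re

def strip_affixes(s):
    """Drop a leading "A" and then a trailing "B"; None if neither applies."""
    original = s
    if re.match(r"^A\S+$", s):
        s = s[1:]    # remove the leading "A"
    if re.match(r"^\S+B$", s):
        s = s[:-1]   # remove the trailing "B"
    return s if s != original else None
```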

Your final answer has unbalanced parentheses :)

mykhal 2010-10-18 08:11:46

Shouldn't the condition be on `A`? Oh, it is on the samples, but not in the "final answer"...

Kobi 2010-10-18 10:27:00

Oh, I forgot to update the final answer line; done now.

BOYPT 2010-10-23 13:48:01

Answer 6

A:

If you don't mind the extra weight in the case where both prefix "A" and suffix "B" exist, you can use a shorter regex:

reMatcher= re.compile(r"(?<=\AA).*|.*(?=B\Z)")

(using \A for ^ and \Z for $)

This one keeps the "A" prefix (instead of the "B" suffix of your solution) when both "A" and "B" are at their respective ends:

'A text here' matches ' text here'
'more text hereB' matches 'more text here'
'AYES!B' matches 'AYES!'
'neither' doesn't match

Otherwise, a non-regex solution (some would say a more “Pythonic” one) is:

def strip_prefix_suffix(text, prefix, suffix):
    left =  len(prefix) if text.startswith(prefix) else 0
    right= -len(suffix) if text.endswith(suffix) else None
    return text[left:right] if left or right else None

If there is no match, the function returns None to differentiate from a possible '' (e.g. when called as strip_prefix_suffix('AB', 'A', 'B')).

PS I should also say that this regex:

(?<=\AA).*(?=B\Z)|(?<=\AA).*|.*(?=B\Z)

should work, but it doesn't; it works just like the one I suggested, and I can't understand why. Breaking down the regex into parts, we can see something weird:

>>> text= 'AYES!B'
>>> re.compile('(?<=\\AA).*(?=B\\Z)').search(text).group(0)
'YES!'
>>> re.compile('(?<=\\AA).*').search(text).group(0)
'YES!B'
>>> re.compile('.*(?=B\\Z)').search(text).group(0)
'AYES!'
>>> re.compile('(?<=\\AA).*(?=B\\Z)|(?<=\\AA).*').search(text).group(0)
'YES!'
>>> re.compile('(?<=\\AA).*(?=B\\Z)|.*(?=B\\Z)').search(text).group(0)
'AYES!'
>>> re.compile('(?<=\\AA).*|.*(?=B\\Z)').search(text).group(0)
'AYES!'
>>> re.compile('(?<=\\AA).*(?=B\\Z)|(?<=\\AA).*|.*(?=B\\Z)').search(text).group(0)
'AYES!'

For some strange reason, the .*(?=B\\Z) subexpression takes precedence, even though it's the last alternative.

ΤΖΩΤΖΙΟΥ 2010-10-18 21:12:50

I've opened an [issue](http://bugs.python.org/issue10139) in the Python bug tracker, since it's a possible bug in the re engine.

ΤΖΩΤΖΙΟΥ 2010-10-19 01:06:56
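
A possible reading of the result in Answer 6, offered as an editorial note rather than as the thread's own conclusion: `search` tries start positions from left to right and returns the first position at which any alternative matches, and the order of alternatives only decides between candidates that start at the same position. The lookbehind alternatives need an "A" before the match, so they can start at index 1 at the earliest, while `.*(?=B\Z)` already matches at index 0. The sketch below checks this, and the `grouped` pattern at the end is an assumed workaround using capturing groups instead of lookarounds.

```python
import re

text = "AYES!B"

both = re.compile(r"(?<=\AA).*(?=B\Z)")
suffix_only = re.compile(r".*(?=B\Z)")
combined = re.compile(r"(?<=\AA).*(?=B\Z)|(?<=\AA).*|.*(?=B\Z)")

# The lookbehind needs an "A" before the match, so it cannot start at index 0.
start_both = both.search(text).start()
# The suffix-only alternative has no such constraint and matches at index 0.
start_suffix = suffix_only.search(text).start()
# search() returns the leftmost match, so the index-0 candidate wins.
combined_match = combined.search(text)

# With capturing groups every alternative starts at index 0, so the order
# of the alternatives decides which one is used.
grouped = re.compile(r"\AA(.*)B\Z|\AA(.*)|(.*)B\Z")
inner = next(g for g in grouped.match(text).groups() if g is not None)
```

Here `start_both` is 1 and `start_suffix` is 0, which is why the combined pattern returns 'AYES!' even though that alternative is listed last.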
