Regex question

How can I match a word or 2 words in a subpattern ()?

How can I match one or two words that are followed either by a specific word like "with" or by the end of the string ($)?

I tried

(\w+\W*\w*\b)(\W*\bwith\b|$)

but it's definitely not working

Edit: I want to match both "go to mall" and "go to" in such a way that I can capture "go to" as a group in Python.

A: 

Perhaps something like this?

>>> import re
>>> r = re.compile(r'(\w+(\W+\w+)?)(\W+with\b|\Z)')
>>> r.search('bar baz baf bag').group(1)
'baf bag'
>>> r.search('bar baz baf with bag').group(1)
'baz baf'
>>> r.search('bar baz baf without bag').group(1)
'without bag'
>>> r.search('bar with bag').group(1)
'bar'
>>> r.search('bar with baz baf with bag').group(1)
'bar'
Jukka Suomela
Although this is not exactly what I was looking for, the \Z trick solved the problem for me. One question: what does the ? do in the first set of ()?
ultimatebuster
(xxx)? means that the part xxx is optional. Therefore (\w+(\W+\w+)?) matches either whatever \w+\W+\w+ matches or whatever \w+ matches.
Jukka Suomela
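To make the optional group concrete, here is a minimal sketch that uses only the first group from the answer's pattern. The names one_or_two, m_two and m_one are illustrative and do not appear in the thread.

```python
import re

# First group from the answer: one word, optionally followed by a second word.
one_or_two = re.compile(r'(\w+(\W+\w+)?)')

m_two = one_or_two.match('go to mall')
m_one = one_or_two.match('go')

print(m_two.group(1))  # 'go to': the optional part matched
print(m_two.group(2))  # ' to': the inner group, separator included
print(m_one.group(1))  # 'go': the optional part was skipped
print(m_one.group(2))  # None: the inner group did not take part in the match
```

When the optional part is skipped, its group is reported as None rather than as an empty string, which is worth checking for before using it.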
@ultimatebuster: **\Z is not a trick** ... it is exactly what you want if you need to match the end of the string and nothing else. (By contrast, $ also matches just before a trailing newline.)
John Machin
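The difference between $ and \Z only shows up when the string ends with a newline. A small sketch, with an illustrative sample string that is not from the thread:

```python
import re

text = 'go to\n'  # note the trailing newline

# $ matches at the end of the string and also just before a trailing newline.
dollar = re.search(r'to$', text)

# \Z matches only at the very end of the string.
strict = re.search(r'to\Z', text)

print(dollar is not None)  # True
print(strict is None)      # True
```

For input without a trailing newline, such as 'go to', both patterns match.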
A: 

Here's what I came up with:

import re


class Bunch(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


match = re.compile(
    flags = re.VERBOSE,
    pattern = r"""
        ( (?!with) (?P<first> [a-zA-Z_]+ ) )
        ( \s+ (?!with) (?P<second> [a-zA-Z_]+ ) )? 
        ( \s+ (?P<awith> with ) )? 
        (?![a-zA-Z_\s]+)
        | (?P<error> .* )
    """
).match

s = 'john doe with'

b = Bunch(**match(s).groupdict())

# print() calls so that the script also runs on Python 3
print('s:', s)

if b.error:
    print('error:', b.error)
else:
    print('first:', b.first)
    print('second:', b.second)
    print('with:', b.awith)

Output:
s: john doe with
first: john
second: doe
with: with

I also tried it with:

s: john
first: john
second: None
with: None

s: john doe
first: john
second: doe
with: None

s: john with
first: john
second: None
with: with

s: john doe width
error: john doe width

s: with
error: with

BTW: re.VERBOSE and re.DEBUG are your friends.
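As a rough illustration of both flags, the sketch below rewrites the pattern from the other answer in re.VERBOSE form and checks that it behaves like the compact version. The sample strings come from that answer; the variable names are illustrative only.

```python
import re

# re.VERBOSE ignores unescaped whitespace and allows # comments in the pattern.
pattern = r"""
    (\w+ (\W+ \w+)?)     # one word, optionally followed by a second word
    (\W+ with \b | \Z)   # then either the word with, or the end of the string
"""
verbose = re.compile(pattern, re.VERBOSE)
compact = re.compile(r'(\w+(\W+\w+)?)(\W+with\b|\Z)')

samples = ['bar baz baf with bag', 'bar with bag', 'bar baz baf bag']
results = [verbose.search(s).group(1) for s in samples]
print(results)

# Adding re.DEBUG prints the parsed pattern to stdout at compile time.
re.compile(pattern, re.VERBOSE | re.DEBUG)
```

The debug dump is handy for spotting a quantifier or alternation that binds differently from what was intended.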

Regards, Mick.

pillmuncher