ansaurus

Question

Answer 1

+2 A:

Try

(RT|retweet|from|via)(?:\b\W*@(\w+))+

Enclosing \b\W*@(\w+) in (?:...) lets you group the terms for repetition without capturing the aggregate.
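A quick way to check what this pattern captures, as a minimal sketch on a made-up tweet. One thing to keep in mind: in Python's re module, a capturing group that sits inside a repetition only retains its last match, so the inner (\w+) ends up holding just the final handle.

```python
import re

# Sample tweet used only for illustration.
tweet = 'foobar RT@one, @two: @three barfoo'

# (?:...) groups the "@handle" term for repetition without capturing it;
# the inner (\w+) is still a capturing group, but it is overwritten on
# every repetition, so only the last handle is kept.
pattern = r'(RT|retweet|from|via)(?:\b\W*@(\w+))+'

m = re.search(pattern, tweet)
groups = m.groups()
print(groups)  # ('RT', 'three')
```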

I'm not sure I'm following the second part of your question, but I think you may be looking for something involving a construct like:

(?:(?!RT|@).)

which will match any character that isn't an "@" or the start of "RT", again without capturing it.
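To see how that construct behaves when repeated, here is a small sketch. The sample strings and the leading_text helper are made up for illustration; the only thing taken from above is the (?:(?!RT|@).) building block, repeated with *.

```python
import re

# Each step consumes one character, but only if the text at that position
# does not start with "RT" or "@" (negative lookahead).
not_rt_or_at = re.compile(r'(?:(?!RT|@).)*')

def leading_text(s):
    """Return the prefix of s that precedes the first 'RT' or '@'."""
    return not_rt_or_at.match(s).group(0)

plain = leading_text('hello RT @one')   # stops before "RT"
mention = leading_text('hi @bob')       # stops before "@"
lone_r = leading_text('R2D2 @x')        # a lone "R" is fine; only "RT" stops it

print(repr(plain), repr(mention), repr(lone_r))
```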

In that case, how about:

(RT|retweet|from|via)((?:\b\W*@\w+)+)

and then post-process with

re.split(r'@(\w+)', m.groups()[1])

to get the individual handles?
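Put together, the two-step approach might look like the sketch below, again on a made-up tweet. Note that re.split with a capturing group returns the text between matches as well as the captured handles, so the handles sit at the odd indices of the result; re.findall is shown as an alternative that returns only the handles.

```python
import re

# Sample tweet used only for illustration.
tweet = 'foobar RT@one, @two: @three barfoo'

# Step 1: capture the marker and the whole run of mentions as one group.
m = re.search(r'(RT|retweet|from|via)((?:\b\W*@\w+)+)', tweet)
marker, mentions = m.groups()

# Step 2a: re.split keeps the captured handles, interleaved with the
# separators between them (and empty strings at the ends).
parts = re.split(r'@(\w+)', mentions)
handles_from_split = parts[1::2]

# Step 2b: re.findall returns just the captured handles.
handles = re.findall(r'@(\w+)', mentions)

print(marker, mentions)
print(parts)
print(handles)
```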

MarkusQ 2009-03-17 20:35:38

thanks for the quick reply! unfortunately that doesn't seem to work, unless i've mistyped something:

tweet = 'foobar RT@one, @two: @three barfoo'
m = re.search(r'(RT|retweet|from|via)(?:\b\W*@(\w+))+', tweet)
m.groups()
('RT', 'three')

but i will read up on (?:...). thanks.

jhofman 2009-03-17 20:40:12

thanks markus. i essentially ended up going with a method similar to this, but was bothered by not being able to come up with a one-regex solution. appreciate it.

jhofman 2009-03-23 19:16:27

python regular expression for retweets