I'm working on a regex that extracts retweet keywords and user names from tweets. Here's an example, with a rather terrible regex that does the job:
import re
tweet = 'foobar RT@one, @two: @three barfoo'
m = re.search(r'(RT|retweet|from|via)\b\W*@(\w+)\b\W*@(\w+)\b\W*@(\w+)\b\W*', tweet)
m.groups()
# ('RT', 'one', 'two', 'three')
What I'd like is to condense the repeated \b\W*@(\w+)\b\W* patterns and allow a variable number of them, so that if @four were added after @three, it would also be extracted. I've tried many permutations of repeating this with a +, without success.
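To illustrate the problem, here is a minimal sketch of one such attempt (not necessarily one of the exact permutations tried). It wraps the repeated unit in a non-capturing group and applies +. The whole run of mentions is matched, but Python's re keeps only the last iteration of a repeated capturing group, so the earlier user names are lost:

```python
import re

tweet = 'foobar RT@one, @two: @three barfoo'

# Illustrative attempt: repeat the "@username" unit with +.
# The inner (\w+) is matched three times, but only the final
# iteration is retained in the match object.
m = re.search(r'(RT|retweet|from|via)\b(?:\W*@(\w+)\b)+', tweet)
groups = m.groups()
print(groups)  # ('RT', 'three')
```

So the repetition itself works; what is missing is a way to get every iteration of the inner group back out.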
I'd also like this to work for something like:
tweet = 'foobar RT@one, RT @two: RT @three barfoo'
This can be achieved with re.finditer if the patterns don't overlap. (I have a version where the patterns do overlap, so only the first RT gets picked up.)
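A rough sketch of the non-overlapping re.finditer idea, assuming a simplified pattern of one keyword followed by one user name (the pattern below is hypothetical, not the overlapping version mentioned above). Each match consumes only its own keyword and user name, so the next search starts cleanly at the following RT:

```python
import re

tweet = 'foobar RT@one, RT @two: RT @three barfoo'

# Assumed simplified pattern: a keyword, optional non-word characters,
# then a single @username. Nothing after the username is consumed.
pattern = re.compile(r'(RT|retweet|from|via)\b\W*@(\w+)')

pairs = [m.groups() for m in pattern.finditer(tweet)]
print(pairs)  # [('RT', 'one'), ('RT', 'two'), ('RT', 'three')]
```

This handles the repeated-keyword case, but not the first example, where a single RT is followed by several user names.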
Any help is greatly appreciated. Thanks.