Hello All,

I would like to use a regular expression to extract only @patrick and @michelle from the following sentence:

@patrick  @michelle we having diner @home tonight do you want to join?

Note: @home should not be included in the result because it is neither at the beginning of the sentence nor followed by another @name.

Any solutions, tips, or comments would be really appreciated.

A: 

Try this regular expression:

/^\s*@(\w+)\s+@(\w+)/

\s denotes whitespace characters and \w word characters.
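
As a quick check of the pattern above, here is a minimal Python sketch (the variable names are illustrative, not from the original post). It shows the two captured names, and that the pattern needs two leading names before it matches at all.

```python
import re

# Same pattern as above, written as a Python raw string.
two_names = re.compile(r"^\s*@(\w+)\s+@(\w+)")

msg = "@patrick  @michelle we having diner @home tonight do you want to join?"
m = two_names.match(msg)
print(m.groups())  # ('patrick', 'michelle')

# With a single leading name there is no second @name, so nothing matches.
print(two_names.match("@patrick we having diner @home tonight"))  # None
```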

Gumbo
@Gumbo: I think Joey may be after something a bit more robust... your regexp wouldn't match if there was only a single name
Zaid
A: 

As long as the line starts with an @name and continues with more of them, this will do it. I tested it in PowerShell, and some regex engines behave a bit differently. This should also catch n names at the beginning of the line:

"^((@\w+)\s)+"

rerun
You'd need to add a trailing space to the string before using that RE, though, to handle the case where the string ends with `@foo`.
Robert Rossney
Yes, if the string ends with an @name, the \s should be \s*.
rerun
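
The end-of-string case raised in the comments above can be sketched in Python as follows. This assumes a small variation on the answer's pattern, with `(?:\s+|$)` in place of `\s`, so that a name at the very end of the string also counts. `leading_names` is a hypothetical helper name.

```python
import re

# Variation (an assumption, not the pattern as posted): each @name must be
# followed by whitespace or by the end of the string.
LEADING = re.compile(r"^(?:@\w+(?:\s+|$))+")

def leading_names(text):
    """Return the run of @names at the start of text, or [] if there is none."""
    m = LEADING.match(text)
    return m.group(0).split() if m else []

print(leading_names("@patrick  @michelle we having diner @home tonight"))
print(leading_names("see you @home"))
print(leading_names("@foo"))
```

Splitting the matched prefix on whitespace sidesteps the fact that a repeated capturing group only remembers its last repetition.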
A: 

Perhaps something like this, though you'll have to split whatever lands in the matching group on whitespace to extract multiple ids.

/^\s*(@\w+\s+)*\s+.*$/
tvanfosson
+4  A: 
/(?:(?:@\S+\s+)+|^)@\S+/g

It first matches either one or more "@" words (an "@" followed by non-space characters, then whitespace) or the start of the line, and then matches one more "@" followed by non-space characters.
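
To see what this pattern returns in practice, here is a small Python sketch (variable names are illustrative). Because the pattern has no capturing groups, `re.findall` returns each whole match as one string, so the run of names still needs splitting. The first alternative is also not tied to the start of the line, so two adjacent @words later in the text match as well.

```python
import re

pattern = re.compile(r"(?:(?:@\S+\s+)+|^)@\S+")

msg = "@patrick  @michelle we having diner @home tonight do you want to join?"
runs = pattern.findall(msg)
print(runs)             # ['@patrick  @michelle']
print(runs[0].split())  # ['@patrick', '@michelle']

# Two adjacent @words in the middle of the text are also matched.
print(pattern.findall("we having diner @home @8pm"))  # ['@home @8pm']

# Using match() instead of findall() only accepts a run at position 0.
print(pattern.match("we having diner @home @8pm"))    # None
```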

Note that on Twitter an @name is commonly preceded by RT, or appears in the middle or at the end of the tweet, e.g. http://twitter.com/ceetee/statuses/9874073403. Basically, you can't tell whether an @name is really a name using just a regex or even a parser. The best bet is to check whether http://twitter.com/name returns a 404 or not.

KennyTM
This seems to work well, but only for 2 names. How can I extend it to match n @names at the beginning of the sentence? Input: @patrick @michelle @john @Ted we having diner @home tonight do you want to join?
Joey
@Joey: See update.
KennyTM
Thanks Kenny, this is exactly what I want. Implementation in Python: `import re; msg = 'comes here'; re.findall('(?:(?:@\S+\s+)+|^)@\S+', msg)`
Joey
Trouble is, testing "@home" *doesn't* return 404, and yet it's also not a Twitter account name.
Rob Kennedy
A: 

You have tagged your post c#, so I assume you can use the .NET Regex implementation. Using .NET, the following Regex will do:

(?<![^@]\w+\s+)(@\w+)

This will match any word starting with @ that does not have a word without @ before it. Note that "dinner @home @8pm" will still break it, though.
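
For readers following along in Python rather than .NET: the built-in `re` module rejects this pattern, since it only supports fixed-width lookbehind. As a rough sketch of the stricter intent (only the @words that open the sentence), one option that avoids lookbehind entirely is to walk the whitespace-separated tokens and stop at the first one that does not look like a name. This is a swapped-in technique, not the regex above, and `leading_mentions` is a hypothetical name.

```python
import re
from itertools import takewhile

# A token counts as a name if it is "@" followed by word characters only.
NAME = re.compile(r"@\w+")

def leading_mentions(text):
    """Collect @names from the start of text, stopping at the first other token."""
    return list(takewhile(NAME.fullmatch, text.split()))

print(leading_mentions("@patrick  @michelle we having diner @home tonight"))
print(leading_mentions("dinner @home @8pm"))
```

Because the walk stops at the first non-name token, an input such as "dinner @home @8pm" yields an empty list.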

See here for more details.

Jens
+1  A: 

Well, at first I thought this failed because I looked at the groups that are returned:

>>> import re
>>> tweet = '@patrick  @michelle we having diner @home tonight do you want to join?'
>>> tw = re.compile(r"^((@\w*)\s+)*")
>>> tw.findall(tweet)
[('@michelle ', '@michelle')]
>>> tw.match(tweet).groups()
('@michelle ', '@michelle')

Note that the groups only keep the last value for any group in the re. But if you just grab group(), then you get the whole matched string:

>>> tw.match(tweet).group()
'@patrick  @michelle '
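
Building on that, one way to get the individual names is to run a second, simpler pattern over the matched prefix. A minimal sketch, reusing the same pattern, with `tweet` assumed to be the sentence from the question:

```python
import re

tweet = "@patrick  @michelle we having diner @home tonight do you want to join?"
tw = re.compile(r"^((@\w*)\s+)*")

# group() is the whole leading run, here '@patrick  @michelle '.
prefix = tw.match(tweet).group()

# A second pass pulls every name out of that run; @home lies outside it.
names = re.findall(r"@\w+", prefix)
print(names)  # ['@patrick', '@michelle']
```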

For grins, I'll try pyparsing:

>>> from pyparsing import Word, printables, OneOrMore
>>> atName = Word("@",printables)
>>> OneOrMore(atName).parseString(tweet).asList()
['@patrick', '@michelle']
Paul McGuire
A: 

For PHP:

/^\s*@(\w+)\s+@(\w+)/

Thanks KennyM

In Python:

import re
msg = '@patrick  @michelle we having diner @home tonight do you want to join?'
re.findall(r'(?:(?:@\S+\s+)+|^)@\S+', msg)

This works with 1 or n @name at the beginning of the sentence.

Thank you all for the quick replies.

Joey
A: 

In Perl, you can exploit the /g match-more-than-once modifier combined with the \G zero-width where-we-left-off assertion and list context, thus:

my $str = '@patrick  @michelle we having diner @home tonight do you want to join?';
my @matches = ($str =~ m/\G(\@\w+)\s*/g);

print join(', ', @matches) . "\n";

This should be robust across any number of initial @-strings.
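
Python's `re` module has no `\G` anchor, but a similar where-we-left-off scan can be sketched by calling a compiled pattern's `match` with an explicit start position. This is an approximation of the Perl idiom rather than a translation, and `scan_mentions` is a hypothetical name.

```python
import re

# One @name plus any whitespace after it, like the Perl pattern body.
STEP = re.compile(r"(@\w+)\s*")

def scan_mentions(text):
    """Collect consecutive @names from the start, resuming where the last match ended."""
    names, pos = [], 0
    while True:
        m = STEP.match(text, pos)  # must match exactly at pos, like \G
        if m is None:
            return names
        names.append(m.group(1))
        pos = m.end()

print(scan_mentions("@patrick  @michelle we having diner @home tonight"))
# ['@patrick', '@michelle']
```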

darch
A: 

For Python check out: http://github.com/BonsaiDen/AtarashiiFormat
It will also give you the links and the tags.

And beware of using a simple regex: you will end up with a big mess, as I did before I converted the Twitter Text Java Library.

Ivo Wetzel