I'm trying to find the correct Regular Expression to match all RT scenarios on Twitter (can't wait for Twitter's new retweet API). The way I see it, RTs can be at the beginning, middle, or end of the string returned from Twitter. So, I need something at the beginning and end of this Regular Expression:

([Rr])([Tt])

No matter what I try, I cannot match all scenarios in one Regular Expression.
I tried

[^|\s+]

to match the scenario where the RT will appear either at the beginning of the string or after one or more whitespace characters, but it didn't work the same for the end of the string or RT.
I tried

[\s+|$]

to match the case where the RT appears either at the end of the string or is followed by one or more whitespace characters. As with the 'pre' pattern, it didn't work.

Can someone please explain what I am doing wrong here? Any help or suggestions will be highly appreciated (as always :) )

+6  A: 

You'll probably be happiest with something like:

/\brt\b/i

This finds isolated instances of RT (that is, surrounded by word boundaries) and uses the /i modifier at the end of the regex to make it case-insensitive.

You want the word boundaries so that you don't end up thinking random tweets containing words like "Art" and "Quartz" are actually retweets. Even then, it's going to have false positives.

By default, a regular expression can (and will) match anywhere inside a string, so you don't need to account for what may precede or follow your match if indeed you don't care what it is or if it is present.
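
To illustrate the point about word boundaries and matching anywhere in the string, here is a minimal sketch in Python rather than PHP, assuming the standard re module, where the IGNORECASE flag stands in for the /i modifier. The sample tweets and the names RT_PATTERN and samples are made up for the example.

```python
import re

# Python rendering of /\brt\b/i: re.IGNORECASE plays the role of the /i modifier.
RT_PATTERN = re.compile(r"\brt\b", re.IGNORECASE)

# Hypothetical sample tweets, invented for illustration only.
samples = [
    "RT @alice: hello world",     # RT at the beginning
    "so true rt @bob: nice one",  # RT in the middle
    "great point! RT",            # RT at the end
    "Art made of Quartz",         # "rt" appears only inside other words
]

# search() scans the whole string, so no leading or trailing pattern is needed.
results = [bool(RT_PATTERN.search(tweet)) for tweet in samples]

for tweet, found in zip(samples, results):
    print(found, tweet)
```

The first three samples match and the last one does not, because \b requires a transition between a word character and a non-word character (or the string edge) on both sides of "rt".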

Adam Bellaire
+2  A: 
// Match "RT", optional whitespace, then @username; the username is captured in $match[1].
if (preg_match('/\brt\s*@(\w+)/i', $tweet, $match)) {
    echo 'Somebody retweeted ' . $match[1] . "\n";
}
chaos
Looks like Adam's answer is the elegant solution I was looking for. Thanks.
Yaniv
Yes, Adam's is the solution you're after but it doesn't hurt to account for the @ sign as well. You might overlap with someone giving directions by posting "turn rt". Watching for @ and/or anchoring at the beginning of the string will help mitigate this.
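
A rough sketch of that mitigation, written in Python with the standard re module rather than PHP. The function names retweeted_user and starts_with_retweet and the sample strings are hypothetical, chosen only for this example.

```python
import re

# Require an @username after RT, as in the PHP pattern /\brt\s*@(\w+)/i.
MENTION = re.compile(r"\brt\s*@(\w+)", re.IGNORECASE)

# Stricter variant: RT @username must open the tweet (leading spaces allowed).
ANCHORED = re.compile(r"^\s*rt\s*@(\w+)", re.IGNORECASE)


def retweeted_user(tweet):
    """Return the username following RT anywhere in the tweet, or None."""
    match = MENTION.search(tweet)
    return match.group(1) if match else None


def starts_with_retweet(tweet):
    """Return True only when the tweet begins with RT @username."""
    return ANCHORED.search(tweet) is not None


print(retweeted_user("RT @alice: hi"))         # a retweet of alice
print(retweeted_user("turn rt at the light"))  # directions, not a retweet
print(starts_with_retweet("lol RT @bob yes"))  # RT is not at the start
```

Requiring the @ sign filters out the "turn rt" case, and the anchored variant trades recall for fewer false positives, since it ignores retweets that are quoted mid-tweet.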
Nerdling