I'm detecting @replies in a Twitter stream with the following PHP code using regexes.

$text = preg_replace('!^@([A-Za-z0-9_]+)!', '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);
$text = preg_replace('! @([A-Za-z0-9_]+)!', ' <a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

How can I best combine these two rules without false flagging [email protected] as a reply?

A: 

Here's how I'd combine them:

$text = preg_replace('!(^| )@([A-Za-z0-9_]+)!', '$1<a href="http://twitter.com/$2" target="_blank">@$2</a>', $text);
Peter Bailey
Read the post, please. "How can I best combine these two rules without false flagging [email protected] as a reply?"
ceejayoz
Man, I'm sorry - I TOTALLY missed that part of the post - I didn't mean to waste a reply.
Peter Bailey
+4  A: 

OK, on second thought: not flagging whatever@email means that the character before the @ has to be a "non-word" character, because any word character directly before the @ could be the end of an email mailbox name. That leads to:

!(^|\W)@([A-Za-z0-9_]+)!

but then you have to use $2 instead of $1.
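
A minimal sketch of the full replacement built on the pattern above. It assumes the captured prefix is written back as $1 so the character before the @ is not lost. The function name linkify_replies and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper, not from the original post.
function linkify_replies(string $text): string
{
    // Group 1: start of string or a single non-word character (re-inserted via $1).
    // Group 2: the username (used via $2).
    return preg_replace(
        '!(^|\W)@([A-Za-z0-9_]+)!',
        '$1<a href="http://twitter.com/$2" target="_blank">@$2</a>',
        $text
    );
}

// carol@example.com is left alone because "l" is a word character.
echo linkify_replies('@alice hi, (@bob) and carol@example.com'), "\n";
```

Because the prefix is captured rather than asserted, leaving $1 out of the replacement would silently drop the space or punctuation before each reply.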

Diego Sevilla
That works nicely, thanks!
ceejayoz
[A-Za-z0-9_] == \w
Mez
also, this will eat the whitespace before the @
hop
This will work for most email addresses, but technically it can break, since mailbox names can contain a much wider range of characters than a Perl word character covers (A-Z, a-z, 0-9, underscore).
Peter Bailey
@BaileyP: The idea was to avoid e-mail addresses. @hop: You're right, so the substituted string should also include the $1 in front of the link.
Diego Sevilla
@diegosevilla - I know. That's my point. The set of characters allowed in email mailbox names is far greater than what \w includes. In other words, it WILL break for something like [email protected] (which is valid) since \W will match the period.
Peter Bailey
I'm okay with the occasional breakage, I think.
ceejayoz
Full line of PHP code, with a small change so it doesn't eat the character before the @: `$str = preg_replace('!(^|\W)@(\w+)!', '$1<a href="http://twitter.com/$2" rel="nofollow">@$2</a>', $str);`
Mathias Bynens
A: 

I think you can use alternation: look for the beginning of the string or a whitespace character.

'!(?:^|\s)@([A-Za-z0-9_]+)!'
meouw
You need to group your alternation. e.g. '!(?:^|\s)@([A-Za-z0-9_]+)!'
Ben Blank
now it eats the whitespace and you can't even get to it in $1
hop
A: 
preg_replace('%(?<!\S)@([A-Za-z0-9_]+)%', '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

(?<!\S) is loosely translated to "no preceding non-whitespace character". Sort of a double-negation, but also works at the start of the string/line.

This won't consume any preceding character, won't use any capturing group, and won't match strings such as "[email protected]", which is a valid e-mail address.

Tested:

Input = 'foo bar [email protected] bee @def goo@doo @woo'
Output = 'foo bar [email protected] bee <a href="http://twitter.com/def" target="_blank">@def</a> goo@doo <a href="http://twitter.com/woo" target="_blank">@woo</a>'
MizardX
this one doesn't even have the correct syntax
hop
You need to use a different sentinel character; your bangs conflict. e.g. '#(?<!\S)@([A-Za-z0-9_]+)#'
Ben Blank
still not working. also, the double negative is not necessary
hop
A: 

Uh, guys, don't push too far... Here it is:

!^\s*@([A-Za-z0-9_]+)!
e-satis
This would not match a reply in a string like "hello, @ceejayoz!".
ceejayoz
A: 
$text = preg_replace('/(^|\W)@(\w+)/', '<a href="http://twitter.com/$2" target="_blank">@$2</a>', $text);
Mez
closest to a nice answer, but this will eat the whitespace!
hop
+2  A: 

Since the ^ does not have to stand at the beginning of the RE, you can use grouping and | to combine those REs.

If you don't want to re-insert the whitespace you captured, you have to use "positive lookbehind":

$text = preg_replace('/(?<=^|\s)@(\w+)/',
    '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

or "negative lookbehind":

$text = preg_replace('/(?<!\S)@(\w+)/',
    '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

...whichever you find easier to understand.
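
As a quick check, here is a small sketch that runs both lookbehind variants on the same input and compares the results. The sample string and variable names are invented for illustration.

```php
<?php
// Sample input is made up for illustration.
$text = "@alice hi @bob, mail carol@example.com";
$link = '<a href="http://twitter.com/$1" target="_blank">@$1</a>';

// Positive lookbehind: the @ must follow the start of the string or whitespace.
$positive = preg_replace('/(?<=^|\s)@(\w+)/', $link, $text);

// Negative lookbehind: the @ must not follow a non-whitespace character.
$negative = preg_replace('/(?<!\S)@(\w+)/', $link, $text);

// Both variants should produce the same string for this input.
var_dump($positive === $negative);
echo $negative, "\n";
```

Neither lookbehind consumes the preceding character, so the username stays in $1 and nothing has to be re-inserted. Note that both variants are stricter than the \W approach: a reply written as "(@bob)" would not be linked, because "(" is a non-whitespace character.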

hop