Related (but slightly different):

http://stackoverflow.com/questions/1216007/surrounding-all-instances-of-and-http-with-a

I would like to surround all instances of @_______, #________, and http://________ with anchor tags in one pass.

For example, consider this Twitter message:

The quick brown fox @Spreadthemovie jumps over the lazy dog #cow http://bit.ly/bC9Dy

Running it with the desired regex pattern would yield:

The quick brown fox <a href="a">@Spreadthemovie</a> jumps over the lazy
dog <a href="b">#cow</a> <a href="c">http://bit.ly/bC9Dy</a>

Only surround words that start with @, # or http:// so that dog@gmail.com would not become dog<b>@gmail.com</b>.

+1  A: 
s/(?<!\w)(@\w*|#\w*|http:\/\/[\w\/\.?=]*\w)/<a>$1<\/a>/g

I think this will do. It won't match @..., #... or http... if there is a word character (a letter, a digit or an underscore) right before it, which keeps e-mail addresses out. Please test it against a set of inputs and report any failures, so I can adjust it.

The URL, in particular, is rather tough. Right now I'm erring on the conservative side. It could be changed to stop only at dot, space and parenthesis, if you prefer.
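To see what the pattern picks up, here is a minimal JavaScript sketch. It assumes an engine with regex look-behind support (ES2018 or later); the names TOKEN and findTokens are made up for illustration and are not part of the answer.

```javascript
// Same pattern as above, written as a JavaScript regex literal.
// Assumption: the engine supports look-behind (ES2018 or later).
var TOKEN = /(?<!\w)(@\w*|#\w*|http:\/\/[\w\/.?=]*\w)/g;

// Hypothetical helper: returns every token the pattern matches, in order.
function findTokens(text) {
  return text.match(TOKEN) || [];
}

var tweet = "The quick brown fox @Spreadthemovie jumps over the lazy dog #cow http://bit.ly/bC9Dy";
console.log(findTokens(tweet));
// [ '@Spreadthemovie', '#cow', 'http://bit.ly/bC9Dy' ]
console.log(findTokens("mail dog@gmail.com now"));
// []
```

Note that \w* also accepts a lone @ or # with nothing after it; \w+ would require at least one word character.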

Now, as for different links for @, # and http... you have to use a function for the replacement, as shown in the other answer.

With three passes, just do:

s/(?<!\w)(http:\/\/[\w\/\.?=]*\w)/<a href="c">$1<\/a>/g
s/(?<!\w)(#\w*)/<a href="b">$1<\/a>/g
s/(?<!\w)(@\w*)/<a href="a">$1<\/a>/g
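In JavaScript, the three substitutions could be sketched roughly as chained replace calls. This assumes look-behind support (ES2018 or later); the function name linkifyThreePasses is made up, and the href values "a", "b" and "c" are the placeholders from the question.

```javascript
// Sketch of the three passes as String.prototype.replace calls.
// Assumption: the engine supports look-behind (ES2018 or later).
function linkifyThreePasses(str) {
  return str
    // Pass 1: URLs first, so the later passes see them already wrapped.
    .replace(/(?<!\w)(http:\/\/[\w\/.?=]*\w)/g, '<a href="c">$1</a>')
    // Pass 2: hashtags.
    .replace(/(?<!\w)(#\w*)/g, '<a href="b">$1</a>')
    // Pass 3: mentions.
    .replace(/(?<!\w)(@\w*)/g, '<a href="a">$1</a>');
}

var str = "The quick brown fox @Spreadthemovie jumps over the lazy dog #cow http://bit.ly/bC9Dy";
console.log(linkifyThreePasses(str));
```

The pattern goes in as the first argument of replace and the replacement string as the second. The order only stays harmless as long as the inserted href values contain no @ or # that a later pass could match.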
Daniel
thanks for the reply -- sorry but i forgot about the href's. how would i make it so @__, #__, links have different href's in their anchor tags?
inktri
you really should use the function replacement. it provides greater control so you can be sure of what you are replacing.
geowa4
How would I use s/(?<!\w)(http:\/\/[\w\/\.?=]*\w)/<a href="c">$1<\/a>/g etc.? Is that the first parameter of replace? Say I've got a variable called str.
inktri
See the other answer. That's Javascript (I assume). Mine is just the pattern. Though, personally, I think my pattern is superior. You can use my pattern with George's code, though.
Daniel
@Daniel: sounds like a good compromise :-)
geowa4
+2  A: 
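var sample = "@sample";
// Group 1: start of the string or one whitespace character. Group 2: the token itself.
sample = sample.replace(/(^|\s)(@\w+|#\w+|http:\/\/[\w.\/]+)/g, "$1<a>$2</a>");

$1 puts back the whitespace before the match (or nothing at the start of the string), and $2 inserts the matched token.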

Using functions (which I recommend for your particular situation):

var sample = "@sample";
// The callback receives the whole match, then each captured group in order.
sample = sample.replace(/(^|\s)(@\w+|#\w+|http:\/\/[\w.\/]+)/g, function(match, lead, token) {
    var href = "";
    if (token.indexOf("#") !== -1)
        href = token;
    else if (token.indexOf("@") !== -1)
      ...
    return lead + '<a href="' + href + '">' + token + "</a>";
});

Using functions is a good idea when you want to have greater or finer control. This way is easier if you want the links to have different href's in their anchor tags.
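As a rough, self-contained sketch of the function approach, the version below needs no look-behind: it captures the character before the token and puts it back. The name linkify is made up for illustration, the href values "a", "b" and "c" are the placeholders from the question, and the URL part borrows the pattern from the other answer.

```javascript
// Hypothetical helper, not the code from this answer.
function linkify(text) {
  // Group 1: start of the string or one non-word character, so "dog@gmail.com" is skipped.
  // Group 2: the @mention, #hashtag or http:// URL.
  var pattern = /(^|\W)(@\w+|#\w+|http:\/\/[\w\/.?=]*\w)/g;
  return text.replace(pattern, function (match, lead, token) {
    var href;
    if (token.charAt(0) === "@") href = "a";      // placeholder href for mentions
    else if (token.charAt(0) === "#") href = "b"; // placeholder href for hashtags
    else href = "c";                              // placeholder href for URLs
    // Put the leading character back so nothing is removed from the text.
    return lead + '<a href="' + href + '">' + token + "</a>";
  });
}

var tweet = "The quick brown fox @Spreadthemovie jumps over the lazy dog #cow http://bit.ly/bC9Dy";
console.log(linkify(tweet));
```

Because the leading character is consumed by the match, it has to be returned as part of the replacement; otherwise the space before each token would disappear.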

See more here at MDC.

geowa4
\w won't get the slashes and dots that can appear in a URL. Also, you forgot $1. You'll remove things with that replacement.
Daniel
i was using $2 before, which would have matched. i modified to make the regexp more efficient.
geowa4
\b won't work, because @ and # _are_ word boundaries. I think the negative look-behind is the best way to go.
Daniel
Yuck! javascript doesn't have look-behind? How awful!
Daniel
won't \b still work since @ and # are specified in the grouping?
geowa4
for this: var sample = "@sample"; sample = sample.replace(/[^\s+-+.+](@\w+|#\w+|http://[\w\./]+)[$\s+-+.+]/g, function(str) { var href=""; if(str.indeoxOf("#") !== -1) href=str; else if(str.indexOf("@") !== -1) ... return "<a href="+href+">"+str+"</a>";} i get an unterminated parenthetical error in firebug?
inktri
yeah, i forgot to close the replace function.
geowa4
hmm i'm still getting unterminated parenthetical error under firebug. is there a typo in the regex pattern?
inktri