Related (but slightly different):

http://stackoverflow.com/questions/1181204/javascript-regex-surround-and-http-with-anchor-tags-in

I would like to surround all instances of @___, #____, and http://________ with anchor tags. Multiple passes are fine with me.

For example, consider this Twitter message:

The quick brown fox @Spreadthemovie jumps over the lazy dog #cow, http://bit.ly/bC9Dy

Running it with the desired regex pattern would yield:

The quick brown fox <a href="blah/Spreadthemovie">@Spreadthemovie</a> jumps over the lazy
dog <a href="blah/cow">#cow</a>, <a href="blah/http://bit.ly/bC9Dy">http://bit.ly/bC9Dy</a>

Only surround words that start with @, # or http:// so that [email protected] would not become [email protected]. Also, note how "#cow," turned into "<a href=urlB>#cow</a>," ... I only want alpha-numeric characters to be on the end of each anchor tagged substring. Also notice the href attribute.

If possible, please include actual JavaScript code with the regex pattern and replace function.

Many thanks! This problem has been plaguing me for a while.

A: 

For matching @ and # tags, I'd suggest using the \w metapattern (it matches word characters, so it'll match letters, digits, and the underscore, but not whitespace or punctuation). Thus, you'd want something like the following patterns to pull out the matched items:

(@\w+)
(#\w+)

For matching URLs, a simple but naive pattern would be to just match http:// followed by any non-whitespace:

(http://\S+)

However, there are certain characters not valid in URLs that would get captured by this. A more sophisticated pattern that only allows characters which are valid in URLs would be the following:

(http://[a-zA-Z0-9+$_.+!*'(),#/-]+)
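As a rough sketch of how these patterns could be used from JavaScript, the snippet below runs lightly adapted versions of them (global flag added, duplicate `+` dropped from the character class) over the example tweet from the question. The helper name `findAll` is made up for illustration. Note that the URL character class includes `.` and `,`, so trailing punctuation directly after a URL would be captured too, and the unanchored @ pattern would also match inside an email address.

```javascript
// Patterns adapted from the answer; the g flag makes match() return every hit.
const mentionPattern = /@\w+/g;
const hashtagPattern = /#\w+/g;
const urlPattern = /http:\/\/[a-zA-Z0-9$_.+!*'(),#\/-]+/g;

// Hypothetical helper: all matches of a pattern, or an empty array if none.
function findAll(pattern, text) {
    return text.match(pattern) || [];
}

const tweet = 'The quick brown fox @Spreadthemovie jumps over the lazy dog #cow, http://bit.ly/bC9Dy';

console.log(findAll(mentionPattern, tweet)); // [ '@Spreadthemovie' ]
console.log(findAll(hashtagPattern, tweet)); // [ '#cow' ]
console.log(findAll(urlPattern, tweet));     // [ 'http://bit.ly/bC9Dy' ]
```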
Amber
+1  A: 
Thinker
+1  A: 
str.replace(
    /(\s|^)([#@])([\w\d]+)|(http:\/\/\S+)/g,
    '$1<a href="$3$4">$2$3$4</a>'
);
J-P
I made a mistake in the URLs. How would I do the following, and what do I need to change in the expression? 1. @blah links have href "http://twitter.com/blah"; 2. #blah links have href "/web#q=blah"; 3. http://blah links have href "blah". Thanks!
inktri
Regular expressions still make my head twitch; three hoorays if it's correct ;)
PoweRoy
A: 

Here is a revised answer based on the revised question. You should actually put the revision/comment on the original question.

It uses 3 patterns for 3 actions and chains them. It uses the word boundary patterns (\b or \B, as appropriate) instead of (^|\s). This picks up patterns separated by punctuation and no space, e.g. @tweet,#tag

<script type=text/javascript>
function addTags(str) {
    return str.replace(/\B(@)(\w+)/g, '<a href="//twitter.com/$2">$1$2</a>')
              .replace(/\B(#)(\w+)/g, '<a href="web#q=$2">$1$2</a>')
              .replace(/\b(http:\S+[^,.])/g, '<a href="$1">$1</a>')
              ;
}
function testTags() {
    document.getElementById('outstr').innerHTML =
    document.getElementById('outtxt').innerHTML =
     addTags(document.getElementById('instr').value);
}
</script>
<input type=text size=100 id="instr" value="@begin [email protected] and then #cow to http://mysite.com and also http://yoursite.com."><br>
<p><textarea id="outtxt" cols=90></textarea>
<p id=outstr></p>
<p><button onclick="testTags();">TEST</button>

I tested it with the above.
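The chained replaces work only as long as no inserted href contains something a later pass would match (an href starting with http: would be picked up again by the third replace). One possible way around that, sketched here and not part of the tested code above, is a single pass with a replacer callback. The function name `linkify` is hypothetical, and the href formats follow the revised specs given in the comments.

```javascript
// Hypothetical single-pass linkifier: inserted markup is never re-scanned.
function linkify(text) {
    // Group 1: @name, group 2: #tag, group 3: http URL whose last
    // character is not whitespace, a comma, or a period.
    const pattern = /\B@(\w+)|\B#(\w+)|\b(http:\/\/\S*[^\s,.])/g;
    return text.replace(pattern, (match, name, tag, url) => {
        if (name !== undefined) {
            return `<a href="http://twitter.com/${name}">${match}</a>`;
        }
        if (tag !== undefined) {
            return `<a href="/web#q=${tag}">${match}</a>`;
        }
        return `<a href="${url}">${match}</a>`;
    });
}

console.log(linkify('@begin and then #cow, to http://mysite.com and also http://yoursite.com.'));
```

Because the URL alternative consumes the whole URL in one match, a # or @ inside a URL is not linked separately.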

Lucky
You could still improve the URL matching if you wanted, by replacing \S with a pattern that only matches the characters you want to allow (for example, to allow or disallow query strings or %xx).
Lucky
I made a mistake in the URLs. How would I do the following, and what do I need to change in the expression? 1. @blah links have href "http://twitter.com/blah"; 2. #blah links have href "/web#q=blah"; 3. http://blah links have href "blah". Thanks!
inktri
Modified to new specs. My solution adds "//" to the twitter url - should probably add "http://", unless you really want it to go to a url on your site called twitter.com.
Lucky
A: 

One important thing!

Make sure you are aware of the possible risks in doing naive replacement on links.

Do not allow users to insert arbitrary HTML on your site. The name of the XSS game is sanitizing user input. If you stick to a whitelist based approach -- only allow input that you know to be good, and immediately discard anything else -- then you're usually well on your way to solving any XSS problems you might have.

Naïve replacement counts as allowing users to insert arbitrary HTML on your site.

At the very least, make sure that the resulting <a href=''> does not start with javascript:, as you'd otherwise be open to cross-site scripting (XSS).
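A minimal sketch of that whitelist idea, assuming the input is plain text that will be written into the page as HTML: escape the text first, then wrap only URLs with an allowed scheme. The names `escapeHtml`, `isSafeHref`, and `safeLinkifyUrls` are made up for this example, and the URL character class is deliberately narrow; it is an illustration, not a complete sanitizer.

```javascript
// Hypothetical helper: escapes characters that are significant in HTML.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Hypothetical whitelist check for hrefs: only http:// and https:// pass.
function isSafeHref(href) {
    return /^https?:\/\//i.test(href);
}

function safeLinkifyUrls(text) {
    // Escape first, so any markup in the user's input is rendered inert.
    const escaped = escapeHtml(text);
    // The class excludes & and ; so entities produced above are not swallowed.
    return escaped.replace(/\bhttps?:\/\/[\w.\/#?=%-]+/gi, (url) =>
        isSafeHref(url) ? `<a href="${url}">${url}</a>` : url
    );
}

console.log(safeLinkifyUrls('<script>alert(1)</script> see http://bit.ly/bC9Dy'));
console.log(isSafeHref('javascript:alert(1)')); // false
```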

voyager