views:

437

answers:

5

I searched high and low but cannot seem to find a definitive answer to this, as is often the case with regexps, so I thought I'd ask here.

I'm trying to put together a regular expression I can use in JavaScript to replace all instances of URLs and email addresses (it doesn't need to be very strict) with anchor tags pointing to them.

Obviously this is usually done very simply on the server side, but in this case I have to work with plain text, so an elegant JavaScript solution that performs the replacements at runtime would be perfect.

The only problem is, as I've stated before, I have a huge regular-expression-shaped gaping hole in my skill set :(

I know that one of you has the answer at the tip of your fingers though :)

+1  A: 

Not a canned solution, but this will point you in the right direction.

I use Regex Coach to build and test my regexes. You can find plenty of example regexes for URLs and email addresses online.

Chris Ballance
A: 

Well, blindly using the regexps from http://www.osix.net/modules/article/?id=586:

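// Backslashes are doubled because the pattern is built from string literals;
// a single "\." inside a JavaScript string collapses to a plain ".".
var emailRegexp =
   new RegExp(
   '([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}' +
   '\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.' +
   ')+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)',
   "gi");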

// The pieces are joined with "+" and the backslashes doubled, as above.
var urlRegexp =
   new RegExp(
   '((https?://)' +
   '?(([0-9a-z_!~*\'().&=+$%-]+: )?[0-9a-z_!~*\'().&=+$%-]+@)?' + // user@
   '(([0-9]{1,3}\\.){3}[0-9]{1,3}' + // IP- 199.194.52.184
   '|' + // allows either IP or domain
   '([0-9a-z_!~*\'()-]+\\.)*' + // tertiary domain(s)- www.
   '([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\\.' + // second level domain
   '[a-z]{2,6})' + // first level domain- .com or .museum
   '(:[0-9]{1,4})?' + // port number- :80
   '((/?)|' + // a slash isn't required if there is no file name
   '(/[0-9a-z_!~*\'().;?:@&=+$,%#-]+)+/?))',
   "gi");

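then

// $& is the whole match; $1 would only be the part before the "@".
// replace() returns a new string, so assign the result.
text = text.replace(emailRegexp, "<a href='mailto:$&'>$&</a>");

and

text = text.replace(urlRegexp, "<a href='$1'>$1</a>");

might work, though the URL pattern also matches bare domains and user@host forms, so running it after the e-mail pass will re-wrap addresses that are already linked.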

Daniel LeCheminant
A: 

As always, this ("this" being "processing HTML with regex") is going to be difficult and error-prone. The following will work on reasonably well-formed input only, but here's what I would do:

  1. find the element you want to process, take its innerHTML property value
  2. iteratively find everything that already is a link (/<a\b.+?<\/a>/ig)
  3. based on that, cut your string into "this isn't a link"- and "this is a link"-bits, appending all of them to a neatly ordered array
  4. process the "non-link" bits only (those that don't begin with "<a "), looking for URL- or e-mail-address patterns
  5. wrap every address you find in <a> tags
  6. join() the array back to a string
  7. set the innerHTML property to your new value

I am sure you will find regular expression examples that match e-mail addresses and URLs. Take the ones that suit you best, and use them in step 4.
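
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the names urlPattern, emailPattern, linkifyText and linkifyHtml are made up for the example, the two patterns are deliberately loose placeholders rather than anything recommended here, and the function works on an HTML string so that steps 1 and 7 (reading and writing innerHTML) stay outside it.

```javascript
// Hypothetical, deliberately loose patterns; swap in whichever ones suit you.
var urlPattern = /\bhttps?:\/\/[^\s<>"']+/;
var emailPattern = /[^\s<>"'@]+@\w[\w.-]*\.[a-zA-Z]+/;

// Steps 4 and 5 for one "non-link" bit: wrap every address in an <a> tag.
// Both patterns are tried in a single pass so that an anchor inserted for
// one of them is never re-scanned by the other.
function linkifyText(text) {
  var combined = new RegExp(urlPattern.source + '|' + emailPattern.source, 'gi');
  return text.replace(combined, function (match) {
    var href = match.indexOf('://') !== -1 ? match : 'mailto:' + match;
    return '<a href="' + href + '">' + match + '</a>';
  });
}

function linkifyHtml(html) {
  // Steps 2 and 3: split() with a capturing group keeps the existing links
  // in the array, in order, between the bits of text that surround them.
  var parts = html.split(/(<a\b.+?<\/a>)/ig);
  for (var i = 0; i < parts.length; i++) {
    // Step 4: only touch the bits that do not begin with "<a".
    if (!/^<a\b/i.test(parts[i])) {
      parts[i] = linkifyText(parts[i]);
    }
  }
  // Step 6: join the array back into a string.
  return parts.join('');
}

// Steps 1 and 7 in a browser, for some element el (hypothetical):
//   el.innerHTML = linkifyHtml(el.innerHTML);
```

Like the approach it illustrates, this only behaves on reasonably well-formed input: an address sitting inside an attribute of some other tag would still be wrapped, and a URL followed directly by punctuation will swallow that punctuation.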

Tomalak
A: 

Here's a good article on URLs:

http://www.codinghorror.com/blog/archives/001181.html

Emails are more straightforward since they have to end in a .tld. You don't need to get fancy with that one since you're not validating, just matching, so off the top of my head...

[^\s]+@\w[\w.-]*\.[a-zA-Z]+
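
As a rough usage sketch, a pattern of this shape (written here with the dot before the TLD escaped and the hyphen placed last in the character class) can be dropped straight into replace(). The names looseEmail and linkEmails are made up for the example.

```javascript
// Loose e-mail matcher meant for linkifying, not for validation.
var looseEmail = /[^\s]+@\w[\w.-]*\.[a-zA-Z]+/g;

function linkEmails(text) {
  // $& stands for the whole matched address.
  return text.replace(looseEmail, "<a href='mailto:$&'>$&</a>");
}

var linked = linkEmails('Write to bob@example.com today');
// linked is "Write to <a href='mailto:bob@example.com'>bob@example.com</a> today"
```

Because the part before the "@" accepts anything that is not whitespace, an address written as (bob@example.com) would pick up the opening parenthesis; tightening that character class is an easy follow-up if it matters.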

jayrdub
A: 

Just adding a bit of information on email regexps: most of them seem to ignore that domain names can have the characters 'åäö' in them. So if you care about that, make sure that the solution you are using has åäöÅÄÖ in the domain part of the regexp.
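
To illustrate, here is a minimal sketch that assumes a loose matching pattern and simply adds those six characters to the domain part. In JavaScript \w only covers ASCII letters, digits and the underscore, so they have to be listed explicitly. The names asciiEmail, intlEmail and findEmails are made up for the example.

```javascript
// ASCII-only domain part: \w does not match å, ä or ö.
var asciiEmail = /[^\s]+@\w[\w.-]*\.[a-zA-Z]+/g;

// Same pattern with åäöÅÄÖ allowed in the domain part.
var intlEmail = /[^\s]+@[\wåäöÅÄÖ][\wåäöÅÄÖ.-]*\.[a-zA-Z]+/g;

function findEmails(text, pattern) {
  // match() returns null when nothing is found; normalise that to [].
  return text.match(pattern) || [];
}

var withAscii = findEmails('kontakt@blåbär.se', asciiEmail); // finds nothing
var withIntl = findEmails('kontakt@blåbär.se', intlEmail);   // finds the address
```

This only covers the six characters mentioned above; other non-ASCII letters would need to be added in the same way.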

ciscoheat