I need a regular expression that converts links in plain text to HTML links.

Here are my test links:

http://www.a-domain.com/something/?something
www.a-domain.com/something/?something

The regular expression should also work under the following assumptions:

Anything attached to the URL that isn't a part of the URL (a comma or period, for example) should be ignored. I found this one, but it does not meet all of my needs.

Does anyone have the right regular expression for my needs?

A: 
(http://|www\.)([^\s()[\]<>]+|\([^\s)]*\)|\[[^\s\]]*])+(?<![.,!?])

This handles most cases, but does not try to handle all of them. (It uses a negative lookbehind assertion at the end; I don't know if your C# or asp.net regex libraries can handle that, but it is an easy way to keep trailing punctuation such as `.,!?` out of the match.)
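To illustrate what the trailing lookbehind does, here is a minimal sketch in Python (chosen only for illustration; the question is about .NET, where the pattern syntax should be largely the same). The helper name `find_urls` and the sample sentence are assumptions made for this example, and the literal brackets inside the character classes are escaped for readability.

```python
import re

# The pattern from the answer above, split across lines for readability.
# Literal '[' and ']' inside character classes are escaped (an editorial choice).
URL_RE = re.compile(
    r"(http://|www\.)"                              # required prefix
    r"([^\s()\[\]<>]+|\([^\s)]*\)|\[[^\s\]]*\])+"   # URL body, balanced () or [] allowed
    r"(?<![.,!?])"                                  # must not end in . , ! or ?
)


def find_urls(text):
    """Return the full text of every URL-like match (hypothetical helper)."""
    return [m.group(0) for m in URL_RE.finditer(text)]


sample = (
    "See http://www.a-domain.com/something/?something, "
    "or www.a-domain.com/something/?something."
)
print(find_urls(sample))
```

The comma and the final period are initially consumed by the greedy body, then given back when the lookbehind rejects them, so neither ends up in the match.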

You haven't been very explicit about your needs or how the linked regex fails to meet them; more examples of what should and shouldn't be matched would clarify things, but I think this will help.

Roger Pate
Actually, I think that will do :) http://regexlib.com/RETester.aspx is a very nice tester that can test .NET, JavaScript, and VBScript. However, I notice that http:// or www. ends up in $1 and the rest (www.something.com or something.com) in $2, which I guess means I have to test whether http:// is already there before turning the text into a link.
lasseespeholt
Yes, I only structured the groups for matching here, figuring you can adapt them as you need and as your regex familiarity allows. I would just extract the complete match and prepend `http://` if it's not there, giving you a consistent form to work with after that.
Roger Pate
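The approach suggested in the comment above (take the complete match and prepend `http://` when it is missing) could be sketched as follows. This is Python for illustration only, not the .NET code the asker would write; the names `linkify` and `_to_anchor` are hypothetical, and the groups are made non-capturing since only the full match is used.

```python
import html
import re

# Same pattern as in the answer, with non-capturing groups (editorial change).
URL_RE = re.compile(
    r"(?:http://|www\.)"
    r"(?:[^\s()\[\]<>]+|\([^\s)]*\)|\[[^\s\]]*\])+"
    r"(?<![.,!?])"
)


def _to_anchor(match):
    """Build an <a> element from one match (hypothetical helper)."""
    url = match.group(0)
    # Prepend the scheme when the match started with "www." only.
    href = url if url.startswith("http://") else "http://" + url
    return '<a href="{}">{}</a>'.format(html.escape(href, quote=True), html.escape(url))


def linkify(text):
    """Replace URL-like substrings with HTML links; other text is left as-is."""
    return URL_RE.sub(_to_anchor, text)


print(linkify("Go to www.a-domain.com/something/?something."))
```

Note that this sketch escapes only the URL itself; if the surrounding plain text may contain `<` or `&`, it would need escaping separately before being emitted as HTML.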
+3  A: 

In this blog post, regex guru Jan Goyvaerts shows a few ways to match URLs in plain text, along with many common pitfalls.

For your case, I'd recommend

\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]

(case-insensitive mode turned on)
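As a rough check of the pattern above, here is a small Python sketch with case-insensitive mode enabled through `re.IGNORECASE` (in .NET the equivalent would be the `RegexOptions.IgnoreCase` option). The helper name `find_urls` and the sample text are assumptions for this example.

```python
import re

URL_RE = re.compile(
    r"\b(?:(?:https?|ftp|file)://|www\.|ftp\.)"  # scheme or well-known host prefix
    r"[-A-Z0-9+&@#/%=~_|$?!:,.]*"                # body: punctuation allowed
    r"[A-Z0-9+&@#/%=~_|$]",                      # last char: no ? ! : , .
    re.IGNORECASE,
)


def find_urls(text):
    """Return every URL-like match in text (hypothetical helper)."""
    return URL_RE.findall(text)


sample = (
    "Visit HTTP://www.a-domain.com/something/?something, "
    "or www.a-domain.com/something/?something."
)
print(find_urls(sample))
```

Here the trailing comma and period are excluded because the final character class omits them, so no lookbehind is needed.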

Tim Pietzcker
Thanks :) It seems to produce somewhat better matches.
lasseespeholt
Hm, `mailto:` is missing. And some other nice things like `gopher:` (jk, though :-)).
Joey
I know; see the above link for a version that also contains `mailto:` (and can be extended for `gopher:` if you like :))
Tim Pietzcker