I'm trying to make a regular expression that will correctly capture URLs, including ones that are wrapped in parenthesis as in (http://example.com) and spoken about on coding horror at http://www.codinghorror.com/blog/archives/001181.html
I'm currently using the following to create HTML A tags in python for links that start with http and www.
r1 = r"(\b(http|https)://([-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]))"
r2 = r"((^|\b)www\.([-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]))"
return re.sub(r2,r'<a rel="nofollow" target="_blank" href="http://\1">\1</a>',re.sub(r1,r'<a rel="nofollow" target="_blank" href="\1">\1</a>',text))
this works well except for the case where someone wraps the url in parens. Does anyone have a better way?