ansaurus

Question

Linkify text with regular expressions in Java

Answer 1

A:

Perhaps html parsing will be more appropriate for you (htmlparser for example). Then you could have html nodes and only "linkify" links in the text and not in the attributes.
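A minimal sketch of the "only linkify text, not attributes" idea, using only the JDK. This is not the htmlparser library's API: the class `TextNodeLinkifier`, the crude `<[^>]*>` tag tokenizer, and the simplified URL pattern are all assumptions made for illustration. A real HTML parser would also cope with comments, scripts, and malformed markup.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextNodeLinkifier {
    // Crude stand-in for a parser: a "tag" is everything from '<' to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Simplified URL pattern (assumption, not the asker's regex).
    private static final Pattern URL = Pattern.compile("https?://[\\w./?=&%#~-]+");

    static String linkify(String html) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int pos = 0;
        int anchorDepth = 0; // > 0 while we are between <a ...> and </a>
        while (tag.find()) {
            // Text between the previous tag and this one.
            appendText(out, html.substring(pos, tag.start()), anchorDepth > 0);
            String t = tag.group();
            out.append(t); // tags (and their attributes) are copied untouched
            String lower = t.toLowerCase(Locale.ROOT);
            if (lower.matches("<a(\\s[^>]*)?>")) {
                anchorDepth++;
            } else if (lower.matches("</a\\s*>") && anchorDepth > 0) {
                anchorDepth--;
            }
            pos = tag.end();
        }
        appendText(out, html.substring(pos), anchorDepth > 0);
        return out.toString();
    }

    private static void appendText(StringBuilder out, String text, boolean insideAnchor) {
        if (insideAnchor) {
            out.append(text); // already link text: leave it alone
        } else {
            out.append(URL.matcher(text).replaceAll("<a href=\"$0\">$0</a>"));
        }
    }

    public static void main(String[] args) {
        System.out.println(linkify(
            "<p>go to http://example.com</p> <a href=\"http://example.com\">http://example.com</a>"));
    }
}
```

Because URLs are only searched for in the text between tags, neither the `href` value nor the existing link text is touched, so no lookaround is needed.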

kgiannakakis 2009-03-10 11:40:52

Answer 2

A:

If you have to roll your own, at least look at the algorithms/patterns used in an Open Source implementation of Markdown, e.g., MarkdownJ.

Hank Gay 2009-03-10 11:42:00

Answer 3

+1 A:

If you want to use regex (though I think parsing to XML/HTML first is more robust), I think a lookahead or lookbehind makes sense. A first stab might be to add this at the end of your regex:

(?!</a>)

Meaning: don't match if there's a closing </a> tag just afterwards. (This could be tweaked forever, of course.) This doesn't work well, though, because given the string

<a href="...">http://example.com/</a>

This regex will try to match "http://example.com/", fail due to the lookahead (as we hope), and then backtrack the greedy quantifier it has on the end and match "http://example.com" instead, which doesn't have a </a> directly after it.

You can fix the latter problem by using a possessive quantifier on your +, * and ? operators: just stick a + after them. This prevents them from backtracking. This is probably good for performance reasons, as well.

This works for me (note the three extra +'s):

String r = "http(s)?://([\\w+?\\.\\w+])++([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*+)?+(?!</a>)";
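To see the backtracking problem in isolation, here is a small sketch comparing a greedy and a possessive quantifier in front of the same lookahead. The class name `PossessiveDemo` and the shortened URL character class are assumptions chosen to keep the example readable; they are not the regex above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // Greedy '+': may give characters back until the lookahead succeeds.
    static final Pattern GREEDY = Pattern.compile("https?://[\\w./-]+(?!</a>)");
    // Possessive '++': keeps everything it matched, so the lookahead is final.
    static final Pattern POSSESSIVE = Pattern.compile("https?://[\\w./-]++(?!</a>)");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String linked = "<a href=\"...\">http://example.com/</a>";
        System.out.println(firstMatch(GREEDY, linked));     // http://example.com
        System.out.println(firstMatch(POSSESSIVE, linked)); // null
        System.out.println(firstMatch(POSSESSIVE, "visit http://example.com/ today"));
        // http://example.com/
    }
}
```

The greedy version drops the trailing "/" to satisfy the lookahead, which is exactly the unwanted partial match described above. The possessive version refuses to backtrack, so the already linked URL is skipped entirely.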

Jesse Rusak 2009-03-10 11:47:27

Thanks, I will consider this improvement.

frank06 2009-03-10 13:09:52

Answer 4

+7 A:

You are close. You can use a "negative lookbehind" like so:

(?<!href=")http:// etc

All matches immediately preceded by href=" will be ignored.
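A minimal sketch of how that lookbehind might be used with `Matcher.replaceAll`. The class name `LookbehindLinkifier` and the simplified URL pattern after the lookbehind are assumptions for illustration, standing in for the "etc" above.

```java
import java.util.regex.Pattern;

class LookbehindLinkifier {
    // (?<!href=") rejects a URL that is immediately preceded by href=".
    // The rest of the pattern is a simplified URL matcher (assumption).
    private static final Pattern URL =
            Pattern.compile("(?<!href=\")https?://[\\w./?=&%#~-]+");

    static String linkify(String text) {
        // $0 is the whole match, used both as the href value and as the link text.
        return URL.matcher(text).replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("see http://example.com/page for details"));
        // see <a href="http://example.com/page">http://example.com/page</a> for details
        System.out.println(linkify("<a href=\"http://example.com\">home</a>"));
        // <a href="http://example.com">home</a>
    }
}
```

Note that this only protects the attribute value. If the link text of an existing anchor is itself a URL, it is preceded by ">" rather than href=" and would still be wrapped again.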

Kees de Kooter 2009-03-10 11:49:44

Thanks, this was exactly what I needed... I was very close indeed!

frank06 2009-03-10 13:02:39

I always carry the "Regular Expression Pocket Reference" with me ;-)

Kees de Kooter 2009-03-10 13:30:46

Answer 5

+1 A:

If you really want to do it with regex, then:

   String r = "(?<![=\"\\/>])http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";

i.e. check that the URL is not immediately preceded by one of the characters =, ", / or > (so it does not follow a =" or />)
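A short sketch showing which URLs survive that one-character lookbehind. The lookbehind is the one from the regex above; the class name `CharClassLookbehind` and the shortened URL body are assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassLookbehind {
    // Skip any URL whose preceding character is one of: =  "  /  >
    static final Pattern URL = Pattern.compile("(?<![=\"/>])https?://[\\w./-]+");

    // Collects every URL the pattern accepts, in order of appearance.
    static List<String> findUrls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://a.example/x.png\"/> "
                + "<a href=\"http://b.example\">http://b.example</a> "
                + "and http://c.example";
        System.out.println(findUrls(html)); // [http://c.example]
    }
}
```

The attribute values are skipped because of the preceding quote, and the existing link text is skipped because of the preceding ">". The trade-off is that a plain-text URL directly after one of these characters, such as "url=http://c.example", is skipped too.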

siddhadev 2009-03-10 11:58:42
