I have a WYSIWYG text area in a Java webapp. Users can type text and style it, or paste text that is already HTML-formatted.
What I am trying to do is linkify the text, that is, convert every URL in the text into a working link by wrapping it in <a href="...">...</a>.
This solution works when all I have is plain text:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// http:// or https://, then host characters, then optional path/query characters
String r = "http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
Pattern pattern = Pattern.compile(r, Pattern.DOTALL | Pattern.UNIX_LINES | Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(comment);
comment = matcher.replaceAll("<a href=\"$0\">$0</a>"); // $0 (group 0) is the whole match
The problem arises when the text is already formatted, i.e. it already contains <a href="...">...</a> tags: the URLs inside those tags are matched too and get wrapped a second time.
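To make the failure concrete, here is a small reproduction. The class name NaiveLinkifyDemo and the sample input are made up for illustration; the pattern and the replacement string are the ones from the snippet above.

```java
import java.util.regex.Pattern;

final class NaiveLinkifyDemo {
    // Same regex as in the snippet above, unchanged.
    static final String R =
            "http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";

    static String linkify(String comment) {
        Pattern pattern = Pattern.compile(
                R, Pattern.DOTALL | Pattern.UNIX_LINES | Pattern.CASE_INSENSITIVE);
        // Wraps every match, including URLs that already sit inside an href attribute.
        return pattern.matcher(comment).replaceAll("<a href=\"$0\">$0</a>");
    }
}
```

Calling NaiveLinkifyDemo.linkify on the input <a href="http://example.com">my site</a> returns <a href="<a href="http://example.com">http://example.com</a>">my site</a>, which is broken markup.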
So I am looking for a way to keep the pattern from matching URLs that are already part of an <a> element, whether inside the href attribute or between <a ...> and </a>. I have read that this can be achieved with lookahead or lookbehind, but I can't get it to work: the regex still matches. And yes, I have tried debugging the groups, changing $0 to $1, etc.
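One direction that might work without lookarounds is to let the regex consume existing markup first and only rewrite what is left. The following is a rough sketch, not a tested solution: the class name Linkifier, the group name url, and the simplified URL pattern (anything up to whitespace, a quote, or an angle bracket) are my own assumptions, and the sketch assumes reasonably well-formed HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Linkifier {
    // Alternatives are tried in order at each position:
    //  1. a whole existing <a ...>...</a> element (copied through unchanged)
    //  2. any other single tag, so URLs in attributes like src="..." are skipped
    //  3. a bare URL in text, captured in the named group "url"
    private static final Pattern TOKEN = Pattern.compile(
            "<a\\b[^>]*>.*?</a\\s*>"
            + "|<[^>]+>"
            + "|(?<url>https?://[^\\s<>\"']+)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String linkify(String html) {
        Matcher m = TOKEN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group("url");
            // Markup matches are re-emitted as-is; only bare URLs get wrapped.
            String replacement = (url == null)
                    ? m.group()
                    : "<a href=\"" + url + "\">" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the text from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, an input such as <a href="http://example.com">site</a> and http://other.org should keep the first link as it is and wrap only the second URL. Known gaps: trailing punctuation (a full stop right after a URL) is swallowed into the link, and malformed or nested markup can still confuse it, so an HTML parser such as jsoup is probably the more robust route.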
Any ideas?