Shameless plug: You can look here (regular expression replace a word by a link) for inspiration.
The question asked to replace some word with a certain link, unless there already was a link. So the problem you have is more or less the same thing.
All you need is a regex that matches a URL (in place of the word). The simplest assumption would be this: a URL (optionally) starts with "http://", "ftp://" or "mailto:" and lasts as long as there are no white-space characters, line breaks, tag brackets or quotes.
Beware, long regex ahead. Apply case-insensitively.
(href\s*=\s*['"]?)?((?:http://|ftp://|mailto:)?[^.,<>"'\s\r\n\t]+(?:\.(?![.<>"'\s\r\n])[^.,!<>"'\s\r\n\t]+)+)
Be warned: this will also match URLs that are technically invalid, and it will recognize things.formatted.like.this as a URL. Whether it is too lenient depends on your data. I can fine-tune the regex if you have examples where it returns false positives.
The regex produces two match groups. Group 2 contains the matched thing, which is most likely a URL. Group 1 is either empty or contains 'href="'. You can use it as an indicator that the match occurred inside the href attribute of an existing link, so you don't have to touch that one.
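To make the two groups concrete, here is a minimal Python sketch. The question does not say which language is in use, so Python is an assumption, and `find_candidates` is a hypothetical helper name. Note that Python's `re` reports an unmatched optional group as `None` rather than an empty string, so the sketch normalizes it.

```python
import re

# The regex from above, compiled case-insensitively.
URL_RE = re.compile(
    r'''(href\s*=\s*['"]?)?((?:http://|ftp://|mailto:)?[^.,<>"'\s\r\n\t]+(?:\.(?![.<>"'\s\r\n])[^.,!<>"'\s\r\n\t]+)+)''',
    re.IGNORECASE,
)

def find_candidates(text):
    """Return (group 1, group 2) pairs for every match in text.

    Group 1 is normalized to '' when the match is not preceded by href=.
    """
    return [(m.group(1) or "", m.group(2)) for m in URL_RE.finditer(text)]

if __name__ == "__main__":
    sample = 'see http://example.com/page or <a href="http://example.com">x</a>'
    for prefix, url in find_candidates(sample):
        status = "already linked" if prefix else "plain text"
        print(url, "->", status)
```

A non-empty first element means the URL sits inside an existing href and should be left alone.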
Once you confirm that this does the right thing for you most of the time (with user supplied data, you can never be sure), you can do the rest in two steps, as I proposed it in the other question:
- Make a link around every URL there is (unless there is something in match group 1!). This will produce double-nested <a> tags for things that have a link already.
- Scan for incorrectly nested <a> tags, removing the innermost one.
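The two steps could be sketched roughly like this in Python. Treat it as an illustration under assumptions, not a finished implementation: `wrap_urls`, `drop_nested_links` and `A_TAG_RE` are hypothetical names, the URL is inserted without any escaping, and the second step simply keeps the outermost <a> of each nested group by counting tag depth (which for double nesting means removing the innermost one).

```python
import re

# The URL regex from above, compiled case-insensitively.
URL_RE = re.compile(
    r'''(href\s*=\s*['"]?)?((?:http://|ftp://|mailto:)?[^.,<>"'\s\r\n\t]+(?:\.(?![.<>"'\s\r\n])[^.,!<>"'\s\r\n\t]+)+)''',
    re.IGNORECASE,
)

# Assumption: a simple pattern for opening and closing <a> tags is enough here.
A_TAG_RE = re.compile(r'<a\b[^>]*>|</a\s*>', re.IGNORECASE)

def wrap_urls(text):
    """Step 1: wrap every matched URL in a link, unless group 1 matched."""
    def repl(m):
        if m.group(1):
            # Match sits inside an existing href attribute: leave it untouched.
            return m.group(0)
        url = m.group(2)
        return '<a href="{0}">{0}</a>'.format(url)
    return URL_RE.sub(repl, text)

def drop_nested_links(html):
    """Step 2: remove <a> tags nested inside another <a>, keeping the outer one."""
    depth = 0
    def repl(m):
        nonlocal depth
        tag = m.group(0)
        if tag.startswith('</'):
            if depth == 0:
                return tag  # stray closing tag, leave as-is
            depth -= 1
            return tag if depth == 0 else ''
        depth += 1
        return tag if depth == 1 else ''
    return A_TAG_RE.sub(repl, html)

if __name__ == "__main__":
    text = 'Go to http://example.com or <a href="http://foo.org">http://foo.org</a>'
    print(drop_nested_links(wrap_urls(text)))
```

With user-supplied HTML a real parser would be more robust for the second step; the depth counter only works as long as the <a> tags are balanced.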