Firstly, www.domain.com isn't a URL, it's a hostname, and <a href="www.domain.com"> won't work: it'll look for a .com file called www.domain relative to the current page.
It's not possible to highlight hostnames in the general case because almost anything can be a hostname. You could try to highlight ‘www.something.dot.separated.words’, but it's not really that reliable and there are many sites that don't use the www. hostname prefix. I'd try to avoid that.
/\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/;
This is a very liberal pattern you could use as a starting point for detecting HTTP URLs. Depending on what sort of input you've got you may want to narrow down what it allows, and it may be worth detecting trailing characters like . or ! that would be valid parts of the URL but in practice generally aren't.
(You could use a | to allow either the URL syntax or the www.hostname syntax, if you like.)
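As a rough sketch of those two refinements, the following combines the URL pattern with a www. alternative and trims trailing punctuation from each match. The helper name findUrls, the set of trimmed characters, and the choice to prefix http:// onto bare www. hostnames are all assumptions for illustration; adjust them to your input.

```javascript
// Hypothetical helper: find URL-like substrings in a plain string.
// The alternation accepts either an explicit http(s):// URL or a bare
// www.hostname (an assumed, deliberately loose syntax).
var urlOrWww = /\b(?:https?:\/\/|www\.)[^\s<>"`{}|\^\[\]\\]+/g;

function findUrls(text) {
    var results = [];
    var match;
    urlOrWww.lastIndex = 0; // the pattern is global, so reset it between calls
    while ((match = urlOrWww.exec(text)) !== null) {
        // Trailing . , ! ? ; : are legal in a URL but usually belong to
        // the surrounding sentence, so strip them (assumed character set).
        var matched = match[0].replace(/[.,!?;:]+$/, '');
        results.push({
            index: match.index,
            text: matched,
            // A bare hostname is not a URL, so give it a scheme for the href.
            href: /^www\./.test(matched) ? 'http://' + matched : matched
        });
    }
    return results;
}
```

Calling findUrls('See http://example.com/page. Also www.example.org!') gives two entries, with the final . and ! left out of the matched text.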
Anyhow, once you've settled on your preferred pattern you'll need to find that pattern in text nodes on the page. Don't run the regexp over innerHTML markup. You'll end up completely ruining the page by trying to mark up every href="http://something" that's already inside markup. You'll also destroy any existing JavaScript references, events or form field values when you replace the innerHTML content.
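To see the first failure concretely, here is a small sketch that applies the pattern to a markup string, the way a replace over innerHTML would. The function name naiveLinkify is hypothetical and exists only to show the problem; it is not something to use.

```javascript
// Hypothetical anti-example: linkify by running the regexp over markup.
var urlpattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;

function naiveLinkify(html) {
    // $& is the matched text, so every URL gets wrapped in a new <a>,
    // including URLs that sit inside an existing href attribute.
    return html.replace(urlpattern, '<a href="$&">$&</a>');
}

var markup = '<a href="http://example.com/">example</a>';
var broken = naiveLinkify(markup);
// broken is now:
// <a href="<a href="http://example.com/">http://example.com/</a>">example</a>
```

The URL inside the existing href attribute is wrapped in a second link, leaving markup the browser can no longer parse as intended.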
In general regexp simply cannot process HTML in any reliable way. So take advantage of the fact that the browser has already parsed the HTML into elements and text nodes, and just look at the text nodes. You'll also want to avoid looking inside <a> elements, since marking up a URL as a link when it's already in a link is silly (and invalid).
// Mark up `http://...` text in an element and its descendants as links.
//
function addLinks(element) {
    var urlpattern= /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;
    findTextExceptInLinks(element, urlpattern, function(node, match) {
        // Split off the text after the match, then the match itself,
        // and move the matched text node into a new link.
        node.splitText(match.index+match[0].length);
        var a= document.createElement('a');
        a.href= match[0];
        a.appendChild(node.splitText(match.index));
        node.parentNode.insertBefore(a, node.nextSibling);
    });
}
// Find text in descendants of an element, in reverse document order
// pattern must be a regexp with global flag
//
function findTextExceptInLinks(element, pattern, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            // Element node: recurse, but skip existing links.
            if (child.tagName.toLowerCase()!=='a')
                findTextExceptInLinks(child, pattern, callback);
        } else if (child.nodeType===3) {
            // Text node: collect all matches first, then call back in
            // reverse so splitting the node doesn't shift earlier indexes.
            var matches= [];
            var match;
            while (match= pattern.exec(child.data))
                matches.push(match);
            for (var i= matches.length; i-->0;)
                callback.call(window, child, matches[i]);
        }
    }
}