ansaurus

Question

Regular Expression to match <a> tags without http://

Answer 1

A:

var html = 'Some text with a <a href="http://example.com/">link</a> and an <a href="#anchor">anchor</a>.';
var re = /<a href="(?!http:\/\/)[^"]*">/i;
var match = html.match(re);
// match[0] is '<a href="#anchor">'

Note: this won't work if the tag has additional attributes.
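A minimal sketch of a variant that tolerates other attributes before or after `href` and accepts either quote style. It is still a regex over HTML, so it assumes reasonably well-formed markup with a quoted `href` and no `>` inside the attributes that precede it; the function name `hrefsWithoutHttp` is hypothetical.

```javascript
// Hypothetical helper: returns the href values of <a> tags whose href does
// not start with "http://". A sketch under the assumptions above, not an
// HTML parser.
function hrefsWithoutHttp(html) {
  // <a\b           the tag name (so <abbr> is not matched)
  // [^>]*?\s       any other attributes before href, staying inside the tag
  // href\s*=\s*    the attribute name and "="
  // (["'])         the opening quote, reused as \1 for the closing quote
  // (?!http:\/\/)  reject values that start with http://
  // (.*?)\1        the href value itself (capture group 2)
  // [^>]*>         any remaining attributes up to the end of the tag
  const re = /<a\b[^>]*?\shref\s*=\s*(["'])(?!http:\/\/)(.*?)\1[^>]*>/gi;
  return Array.from(html.matchAll(re), (m) => m[2]);
}

const html =
  'Some text with a <a href="http://example.com/">link</a>, ' +
  'an <a class="x" href="#anchor">anchor</a> and ' +
  "a <a href='page.html' title=\"t\">page</a>.";

console.log(hrefsWithoutHttp(html)); // [ '#anchor', 'page.html' ]
```

`matchAll` needs a global (`g`) regex and a reasonably modern engine (ES2020).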

Lekensteyn 2010-09-18 19:36:04

Won't work for `<a href="http.html">` or `<a href="http:foo.html">` (yes, `http:...` does not necessarily imply the HTTP protocol: all browsers will ignore the "http:" part if it isn't followed by two slashes, so `http:/` is equivalent to `/`)

Eli Grey 2010-09-18 20:37:21

Updated it to match `http://` literally. Note that browsers (at least Firefox) expand `//example.com/` to `http://example.com/` (or `https://`, depending on the current protocol).

Lekensteyn 2010-09-19 08:53:39
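The resolution rules discussed in these comments can be checked with the WHATWG `URL` constructor, available in modern browsers and Node.js. The base URL below is an arbitrary example; note that `http:foo.html` is only treated as relative because the base also uses `http:`.

```javascript
// Example page URL (an assumption for illustration).
const base = "http://example.com/dir/page.html";
// Resolve an href the way a browser would for a link on that page.
const resolve = (href) => new URL(href, base).href;

console.log(resolve("http.html"));        // http://example.com/dir/http.html
console.log(resolve("http:foo.html"));    // http://example.com/dir/foo.html
console.log(resolve("http:/foo.html"));   // http://example.com/foo.html
console.log(resolve("//other.example/")); // http://other.example/

// A protocol-relative URL picks up the scheme of the page it appears on.
console.log(new URL("//other.example/", "https://example.com/").href); // https://other.example/
```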

Answer 2

+6 A:

It's easier to use DOMParser and XPath than a regex.

See my response in jsfiddle.

HTML

<body>
    <div>
        <a href='index.php'>1. index</a>
        <a href='http://www.bar.com'>2. bar</a>
        <a href='http://www.foo.com'>3. foo</a>
        <a href='hello.php'>4. hello</a>        
    </div>
</body>

JS

$(document).ready(function() {
    var type = XPathResult.ANY_TYPE;
    // Serialize the body's markup and re-parse it as a standalone XML document
    var page = $("body").html();
    var doc = new DOMParser().parseFromString(page, "text/xml");
    // Select every <a> whose href does not start with "http://"
    var xpath = "//a[not(starts-with(@href,'http://'))]";
    var result = doc.evaluate(xpath, doc, null, type, null);

    // A node-set result comes back as an iterator
    var node = result.iterateNext();
    while (node) {
        console.log(node); // logs links 1 and 4
        node = result.iterateNext();
    }
});

NOTES

I'm using jQuery to keep the code short, but you can do this without jQuery.
This code must be adapted to work in IE (I've tested it in Firefox).

Topera 2010-09-18 19:36:22

If you use jQuery, then you might as well use `$('a:not([href^="http://"])')`, which works in IE.

Peter Ajtai 2010-09-19 21:27:54
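As a jQuery-free alternative, here is a sketch that skips DOMParser entirely (the page is already parsed) and filters anchors on their raw `href` attribute. The helper names `hrefLacksHttp` and `linksWithoutHttp` are hypothetical. It reads `getAttribute("href")` rather than the `href` property, because the property returns the resolved absolute URL, so a relative link such as `index.php` would look like `http://...` on a page served over HTTP. Like the XPath expression, anchors with no `href` at all are included.

```javascript
// Same test as the XPath //a[not(starts-with(@href,'http://'))]:
// a missing href counts as "", which does not start with "http://".
function hrefLacksHttp(href) {
  return !(href || "").startsWith("http://");
}

// Hypothetical helper: collect matching <a> elements under any node that
// supports querySelectorAll (document, an element, a parsed document).
function linksWithoutHttp(root) {
  return Array.from(root.querySelectorAll("a")).filter((a) =>
    hrefLacksHttp(a.getAttribute("href"))
  );
}

// Only runs in a browser; on a page containing just the sample markup,
// this logs links 1 and 4.
if (typeof document !== "undefined") {
  linksWithoutHttp(document).forEach((a) => console.log(a));
}
```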

Answer 3

+4 A:

You should use an XML parser instead of regexes.

On the same topic:

RegEx match open tags except XHTML self-contained tags

Colin Hebert 2010-09-18 19:36:52

Answer 4

+2 A:

With jQuery, you can do something very simple:

var links_without_http = $('a:not([href^="http://"])');

edit: Added the ://

Nicolas Viennot 2010-09-18 20:21:39

+1 for an alternative that may do what the OP wants (they were quite vague as to the purpose).

Blair McMillan 2010-09-18 20:33:06

`<a href="http.html">Nope.</a>`

Eli Grey 2010-09-18 20:34:44

@Eli The `://` part can be added in easily - the technique is essentially correct.

Yi Jiang 2010-09-19 05:35:30
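The disagreement in these comments comes down to which prefix is tested. A small sketch with plain string checks (standing in for the `^=` attribute selector, which is a literal prefix match) shows why the pre-edit `http` prefix wrongly excludes `http.html`, while `http://` keeps it. The sample href values are illustrative.

```javascript
// Illustrative href values from this thread.
const hrefs = ["index.php", "http://www.bar.com", "http.html", "hello.php"];

// [href^=http] before the edit: excludes anything starting with "http".
const keptBeforeEdit = hrefs.filter((h) => !h.startsWith("http"));
// [href^="http://"] after the edit: excludes only "http://" prefixes.
const keptAfterEdit = hrefs.filter((h) => !h.startsWith("http://"));

console.log(keptBeforeEdit); // [ 'index.php', 'hello.php' ]
console.log(keptAfterEdit);  // [ 'index.php', 'http.html', 'hello.php' ]
```

In a browser without jQuery, `document.querySelectorAll('a:not([href^="http://"])')` applies the same selector natively; the value should be quoted because `http://` is not a valid CSS identifier.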

Answer 5

A:

I'm interpreting your question to mean any (mostly) absolute URI with a protocol, not just HTTP. To add to everyone else's incorrect solutions, you should be doing this check on the href:

if (href.slice(0, 2) !== "//" && !/^[\w-]+:\/\//.test(href)) {
    // href is a relative URI without http://
}

Eli Grey 2010-09-18 20:43:07
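Wrapped in a function (the name `isRelativeHref` is hypothetical), the check above can be exercised on a few inputs. As the "(mostly)" hints, schemes that are not followed by `//`, such as `mailto:` or `javascript:`, still pass as relative.

```javascript
// Hypothetical wrapper around the check from this answer.
function isRelativeHref(href) {
  // Reject protocol-relative URLs ("//host/...") and anything that starts
  // with a scheme followed by "://", e.g. "http://", "https://", "ftp://".
  return href.slice(0, 2) !== "//" && !/^[\w-]+:\/\//.test(href);
}

console.log(isRelativeHref("#anchor"));               // true
console.log(isRelativeHref("http.html"));             // true
console.log(isRelativeHref("//example.com/"));        // false
console.log(isRelativeHref("https://example.com"));   // false
console.log(isRelativeHref("mailto:me@example.com")); // true (no "//" after the scheme)
```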
