var matches = [];
// input_content is the string of HTML to search; each match pushes [anchor, href, text]
input_content.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function () {
    matches.push(Array.prototype.slice.call(arguments, 1, 4));
});
This assumes that your anchors will always be in the form <a href="...">...</a>, i.e. it won't work if there are any other attributes (for example, target), if the href value is single-quoted, or if the link text contains other tags. The regular expression can be improved to accommodate some of these cases.
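As one possible improvement, here is a minimal sketch of a pattern that also tolerates extra attributes before or after href (such as target). The function name extractLinks and the variable sample are hypothetical. It still assumes a double-quoted href, link text without nested tags, and no ">" inside attribute values, so it is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper: returns [anchor, href, text] triples.
// Assumptions: href is double-quoted, link text has no nested tags,
// and no attribute value contains a ">" character.
function extractLinks(html) {
  // (?:[^>]*?\s)? optionally skips other attributes that come before href;
  // requiring whitespace right before "href=" avoids matching e.g. data-href.
  var re = /<a\s+(?:[^>]*?\s)?href="([^"]+)"[^>]*>([^<]+)<\/a>/g;
  var matches = [];
  html.replace(re, function (anchor, href, text) {
    matches.push([anchor, href, text]);
    return anchor; // return value is unused; replace() is called for its side effect
  });
  return matches;
}

var sample = '<a target="_blank" href="http://yahoo.com" class="x">Yahoo</a>';
console.log(extractLinks(sample));
```

Because there is no leading [^<]* here, the first callback argument is exactly the anchor tag.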
To break down the regular expression:
/ -> start regular expression
[^<]* -> skip all characters until the first <
( -> start capturing first token
<a href=" -> capture first bit of anchor
( -> start capturing second token
[^"]+ -> capture all characters until a "
) -> end capturing second token
"> -> capture more of the anchor
( -> start capturing third token
[^<]+ -> capture all characters until a <
) -> end capturing third token
<\/a> -> capture last bit of anchor
) -> end capturing first token
/g -> end regular expression, add global flag to match all anchors in string
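To see what each token holds, a small sketch can run the same pattern with RegExp.prototype.exec in a loop and keep the whole match plus the three groups. The variable names anchorRe, html and groups are illustrative only.

```javascript
// Same pattern as the answer; exec() with the g flag advances lastIndex
// after each match and returns null when there are no more matches.
var anchorRe = /[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g;
var html = 'blah <a href="http://yahoo.com">Yahoo</a> blah';
var groups = [];
var m;
while ((m = anchorRe.exec(html)) !== null) {
  // m[0]: whole match, including the leading text consumed by [^<]*
  // m[1]: first token (entire anchor), m[2]: href, m[3]: link text
  groups.push(m.slice(0, 4));
}
console.log(groups);
```

The trailing " blah" is never part of a match, because [^<]* must be followed by an anchor.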
Each call to our anonymous function will receive the three tokens as its second, third and fourth arguments, namely arguments[1], arguments[2] and arguments[3] (arguments[0] is the whole match, including any text skipped by [^<]*):
- arguments[1] is the entire anchor
- arguments[2] is the href part
- arguments[3] is the text inside
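If you prefer not to index into arguments, the same callback can name its parameters directly. This is a sketch with hypothetical names (page, links, label); replace() passes the whole match first, then one argument per capture group, then the match offset and the input string.

```javascript
var page = 'blah <a href="http://yahoo.com">Yahoo</a> blah ' +
           '<a href="http://google.com">Google</a> blah';
var links = [];
page.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g,
  function (match, anchor, href, label) {
    // match: whole match, including the leading text skipped by [^<]*
    // anchor, href, label: capture groups 1, 2 and 3
    // (offset and input string follow, but are not needed here)
    links.push([anchor, href, label]);
    return match; // return value is ignored; we only want the side effect
  });
console.log(links.join("\n"));
```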
We'll use a hack to push these three arguments as a new array into our main matches array. The built-in arguments object is not a true JavaScript Array, so we'll have to apply the Array slice method to it to extract the items we want:
Array.prototype.slice.call(arguments, 1, 4)
This will extract the items of arguments starting at index 1 and ending at index 4 (not inclusive), i.e. arguments[1] through arguments[3].
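A short sketch of why the slice call is needed: arguments has a length and numeric indexes but does not inherit from Array.prototype, so borrowing slice produces a real Array. The function names pickTokens and pickTokensRest are hypothetical, and the rest-parameter version assumes an ES2015+ runtime.

```javascript
function pickTokens() {
  // `arguments` is array-like, but it has no slice() or push() of its own
  var argsIsArray = Array.isArray(arguments);
  // Borrow Array.prototype.slice: start at index 1, stop before index 4
  var picked = Array.prototype.slice.call(arguments, 1, 4);
  return { argsIsArray: argsIsArray, picked: picked };
}

// ES2015+ alternative: rest parameters are already a real Array
function pickTokensRest(...args) {
  return args.slice(1, 4);
}

var viaSlice = pickTokens("match", "anchor", "href", "text", 5, "input");
console.log(viaSlice.argsIsArray, viaSlice.picked);
console.log(pickTokensRest("match", "anchor", "href", "text", 5, "input"));
```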
Putting it all together:
var input_content = "blah \
<a href=\"http://yahoo.com\">Yahoo</a> \
blah \
<a href=\"http://google.com\">Google</a> \
blah";
var matches = [];
input_content.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function () {
matches.push(Array.prototype.slice.call(arguments, 1, 4));
});
alert(matches.join("\n"));
Gives:
<a href="http://yahoo.com">Yahoo</a>,http://yahoo.com,Yahoo
<a href="http://google.com">Google</a>,http://google.com,Google
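If your runtime supports String.prototype.matchAll (ES2020), a minimal sketch can collect the same triples without calling replace() purely for its side effect. The names content and found are hypothetical; the pattern is unchanged, and matchAll requires the g flag. Joined the same way, it produces the two lines shown above.

```javascript
var content = 'blah <a href="http://yahoo.com">Yahoo</a> blah ' +
              '<a href="http://google.com">Google</a> blah';
// matchAll returns an iterator of match arrays; keep groups 1 to 3 of each
var found = Array.from(
  content.matchAll(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g),
  function (m) { return m.slice(1, 4); }
);
console.log(found.join("\n"));
```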