ansaurus

Question

Answer 1

A:

Assuming you wish to select the link text of an anchor with that href, something like this should work:

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

if (preg_match($pattern, $input, $out)) {
    echo $out[1];
}

This is not technically perfect (in theory, a > can appear inside one of the tag's attribute values, which would cut the tag match short), but it will work in 99% of cases. As several of the comments have mentioned, though, you should be using a DOM parser.
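
To make the caveat concrete, here is a minimal sketch of the failure case. The input string is hypothetical: it has a title attribute containing a literal >, which is where the pattern stops treating the text as part of the opening tag.

```php
<?php
// Pattern from this answer: [^>]* assumes no '>' occurs inside the opening tag.
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

// Hypothetical input: the title attribute contains a literal '>'.
$input = '<a href="http://www.test.com" title="a > b">link text</a>';

if (preg_match($pattern, $input, $out)) {
    // The capture starts right after the '>' inside the attribute,
    // so it holds ' b">link text' rather than 'link text'.
    echo $out[1];
}
```

The match still succeeds, which is what makes this kind of bug easy to miss: the result is simply wrong rather than empty.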

Cags 2010-07-29 10:09:14

Answer 2

A:

If you have already obtained the anchor tag you can extract the href attribute via a regex easily enough:

<a [^>]*href="([^"]*)"[^>]*>
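
As a rough sketch of how that pattern could be used from PHP: the `$tag` string and variable names below are made up for illustration, and the capture group uses a `*` quantifier so that the whole attribute value is captured rather than a single character. It also assumes the href is double-quoted.

```php
<?php
// Hypothetical anchor tag that has already been isolated from the page.
$tag = '<a href="http://www.test.com" title="test">';

// Capture everything between the double quotes of the href attribute.
$pattern = '#<a [^>]*href="([^"]*)"[^>]*>#';

$href = null;
if (preg_match($pattern, $tag, $m)) {
    $href = $m[1];
    echo $href; // prints http://www.test.com
}
```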

If you instead want to extract the contents of the tag and you know what you are doing, it isn't too hard to write a simple recursive descent parser, using cascading regexes, that will parse all but the most pathological cases. Unfortunately, PHP isn't a good language in which to learn that technique, so I wouldn't recommend using this project to learn it.

So if it is the contents you are after, not the attribute, then @katrielalex is right: don't parse HTML with regex. You will run into a world of hurt with nested formatting tags and other legal HTML that isn't compatible with regular expressions.
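
A small illustration of the nested-tag problem, assuming a hypothetical link whose text is partly wrapped in a `<b>` element. The regex here is a deliberately simple one written for this example, not taken from any of the answers; it is compared against PHP's DOM extension.

```php
<?php
// Hypothetical link containing a nested formatting tag.
$html = '<a href="http://www.test.com"><b>bold</b> text</a>';

// Regex approach: the capture is the raw inner markup, tags included.
preg_match('#<a [^>]*>(.*?)</a>#', $html, $m);
$viaRegex = $m[1];

// DOM approach: textContent concatenates the text nodes and drops the tags.
$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$viaDom = $dom->getElementsByTagName('a')->item(0)->textContent;

echo $viaRegex, "\n"; // <b>bold</b> text
echo $viaDom, "\n";   // bold text
```

Stripping the leftover tags from the regex result would need yet another pattern, which is how these solutions tend to grow.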

Recurse 2010-07-29 10:09:41

Answer 3

+4 A:

Use DOM:

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->textContent;
}

DOM is specifically designed to parse XML and HTML. It will be more robust than any regex solution you can come up with.
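
If only the link with one particular href is wanted, the loop can filter on the attribute. The following is a sketch along those lines, not part of the original answer; the `$wanted` and `$matches` names are introduced here for illustration, and the `libxml_use_internal_errors()` call is an optional extra to keep parser warnings quiet on sloppy markup.

```php
<?php
$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$wanted = 'http://www.test.com'; // href to look for (illustrative)

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($text);
libxml_clear_errors();

$matches = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    // getAttribute() returns an empty string when the attribute is absent.
    if ($a->getAttribute('href') === $wanted) {
        $matches[] = $a->textContent;
    }
}

print_r($matches);
```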

Daniel Egeberg 2010-07-29 10:10:07

Not that there's anything "wrong" with how you did it, but why didn't you just use `DomElement::getElementsByTagName()` instead of the XPath query? It should be more efficient for that simple path...

ircmaxell 2010-07-29 10:18:12

@ircmaxell: Not sure. I've updated it to do that instead.

Daniel Egeberg 2010-07-29 10:21:21
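
The XPath version discussed in these comments is not shown on this page. For comparison only, here is a guess at what an XPath-based lookup could look like using `DOMXPath`; the query string and variable names are assumptions, not the answer's original code.

```php
<?php
$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($text);
libxml_clear_errors();

// Select every <a> element whose href attribute equals the given URL.
$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//a[@href="http://www.test.com"]') as $a) {
    $texts[] = $a->textContent;
}

echo implode("\n", $texts);
```

For a plain "all anchors" lookup, `getElementsByTagName('a')` is the simpler choice; XPath becomes useful once the selection depends on attributes or document structure.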


regular expression anchor tag
