Hi all,

I am using PHP and I am having trouble parsing the href from an anchor tag, along with its text.

Example: an anchor tag with the title "test" that links to http://www.test.com,

like this: <a href="http://www.test.com" title="test">http://www.test.com</a>

I want to match all the text inside the anchor tag.

Thanks in advance.

A: 

Assuming you wish to select the link text of an anchor link with that href, then something like this should work...

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

if (preg_match($pattern, $input, $out)) {
    echo $out[1];
}

This is technically not perfect (in theory, a > can appear inside one of the tag's attribute values, which would end the match too early), but it will work in the vast majority of cases. As several of the comments have mentioned, though, you should be using a DOM parser.
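
To make that caveat concrete, here is a small sketch of the failure case. The input string is made up for illustration (a title attribute whose value contains a >), and the pattern is the same one as above:

```php
<?php
// Hypothetical input: the title attribute value contains a literal ">".
$input = '<a href="http://www.test.com" title="a > b">link text</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

$captured = null;
if (preg_match($pattern, $input, $out)) {
    // [^>]* stops at the ">" inside the attribute value, so the capture
    // group starts in the middle of the tag instead of after it.
    $captured = $out[1];
}

var_dump($captured); // $captured is ' b">link text', not 'link text'
```

The regex still matches, but the captured text includes the tail of the attribute, which is the kind of edge case a DOM parser handles for you.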

Cags
A: 

If you have already obtained the anchor tag you can extract the href attribute via a regex easily enough:

<a [^>]*href="([^"]*)"[^>]*>
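
As a rough sketch of how that could be used from PHP, assuming double-quoted href values and using a made-up sample string (the second link and the variable names are only for illustration):

```php
<?php
// Hypothetical sample input containing two anchor tags.
$html = '<a href="http://www.test.com" title="test">http://www.test.com</a> '
      . '<a title="x" href="http://example.org/page">Example</a>';

// Note the * after [^"]: without it the group captures only one character.
$pattern = '#<a [^>]*href="([^"]*)"[^>]*>#i';

$hrefs = [];
if (preg_match_all($pattern, $html, $matches)) {
    // $matches[1] holds the first capture group of every match.
    $hrefs = $matches[1];
}

print_r($hrefs); // http://www.test.com and http://example.org/page
```

This only covers the simple case of double-quoted attributes; single-quoted or unquoted href values would need a different pattern.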

If you instead want to extract the contents of the tag and you know what you are doing, it isn't too hard to write a simple recursive descent parser, using cascading regexes, that will parse all but the most pathological cases. Unfortunately, PHP isn't a good language for learning how to do this, so I wouldn't recommend using this project as your first attempt.

So if it is the contents you are after, not the attribute, then @katrielalex is right: don't parse HTML with regex. You will run into a world of hurt with nested formatting tags and other legal HTML that isn't compatible with regular expressions.

Recurse
A: 

Use DOM:

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->textContent;
}

DOM is specifically designed to parse XML and HTML. It will be more robust than any regex solution you can come up with.
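
Since the question mentions both the href and the text, here is a minimal sketch that collects both with the same approach. The nested <b> tag in the sample input, the $links array, and the libxml error handling calls are my additions for illustration, not part of the original question:

```php
<?php
// Sample input with a nested formatting tag inside the link (an assumption,
// added to show that textContent still returns just the text).
$text = '<a href="http://www.test.com" title="test"><b>http://www.test.com</b></a> something else hello world';

// Collect parser warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($text);
libxml_clear_errors();

$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'), // attribute value
        'text' => $a->textContent,          // text of the element and its descendants
    ];
}

print_r($links);
```

Here getAttribute('href') returns the attribute value and textContent returns the link text with any nested tags stripped, so each entry in $links holds one href/text pair.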

Daniel Egeberg
Not that there's anything "wrong" with how you did it, but why didn't you just use `DomElement::getElementsByTagName()` instead of the XPath query? It should be more efficient for that simple path...
ircmaxell
@ircmaxell: Not sure. I've updated it to do that instead.
Daniel Egeberg