I'm using this example to fetch links from a website:

http://www.merchantos.com/makebeta/php/scraping-links-with-php/

$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    var_dump($href);
    $url = $href->getAttribute('href');
    echo "<br />Link stored: $url";
}

It works well and returns all the links, but I cannot get the actual 'title' (the visible text) of each link. For example, if I have:

<a href="www.google.com">Google</a>

I want to be able to fetch the text 'Google' too.

I'm a little lost and quite new to XPath.

+1  A: 

Try this:

$link_title = $href->nodeValue;
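
To illustrate why this works on the element itself: in PHP's DOM extension, nodeValue on an element returns the combined text of all its descendants, so it also covers links whose text is wrapped in other tags. A minimal sketch, assuming made-up sample markup (the $html string and file names are not from the question):

```php
<?php
// Hypothetical sample markup: the second link wraps part of its text in <b>.
$html = '<html><body><a href="a.html">Plain</a><a href="b.html"><b>Bold</b> text</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$titles = array();
foreach ($xpath->query('/html/body//a') as $href) {
    // For an element node, nodeValue is the text of all descendant text nodes.
    $titles[] = trim($href->nodeValue);
}

print_r($titles); // contains "Plain" and "Bold text"
```

The trim() call is optional; it only strips whitespace that surrounds the link text in the source HTML.
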
antyrat
+1  A: 

You are looking for the "nodeValue" of the text node inside the "a" node. Assuming the link's first child is that text node, you can get the value with:

$title = $href->firstChild->nodeValue;

Full working example:

<?php
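// loadHTML() is an instance method; the static call form is deprecated and is an error on PHP 8
$dom = new DOMDocument();
$dom->loadHTML("<html><body><a href='www.test.de'>DONE</a></body></html>");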

$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    $title = $href->firstChild->nodeValue;
    echo "<br />Link stored: $url $title";
}

Prints:


Link stored: www.test.de DONE
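
For pages you do not control, the same loop could be wrapped in a small helper that tolerates malformed HTML and skips anchors without an href. This is only a sketch under assumptions: extractLinks() is a hypothetical name, and the choice of the '//a[@href]' query and textContent is one option among several, not the method from the linked tutorial.

```php
<?php
// extractLinks() is a hypothetical helper, not part of the original example.
function extractLinks($html)
{
    $dom = new DOMDocument();

    // Real-world HTML is often invalid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = array();

    // Select only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = array(
            'url'   => $a->getAttribute('href'),
            // textContent works even when the first child is not a text node.
            'title' => trim($a->textContent),
        );
    }

    return $links;
}

$links = extractLinks("<html><body><a href='www.test.de'>DONE</a><a name='top'>no href</a></body></html>");
foreach ($links as $link) {
    echo "Link stored: {$link['url']} {$link['title']}\n";
}
```

Returning an array of url/title pairs keeps the extraction separate from the output, so the caller can store or filter the links instead of echoing them directly.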

edorian