ansaurus

Question

PHP: Fetch content from a html page using xpath()

Answer 1

+1 A:

This XPath expression:

//div[@id='content']/p

results in the wanted node set (five p elements).

EDIT: Now it's clear what your problem is. You need to iterate over the NodeList:

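private function GetHTMLFromDom($domNodeList){
   // Collect the matched nodes in a fresh document so they can be serialized together.
   $domDocument = new DOMDocument();
   foreach ($domNodeList as $node) {
      // importNode() with true deep-copies the node into the new document.
      $domDocument->appendChild($domDocument->importNode($node, true));
   }
   return $domDocument->saveHTML();
}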
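
For reference, a minimal standalone sketch of the whole round trip. The sample `$html` string and the function name `getHtmlFromNodeList` are made up for illustration; the string stands in for the page fetched with curl, which is not shown in this thread:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body><div id="content"><p>one</p><p>two</p><p>three</p></div></body></html>';

$source = new DOMDocument();
$source->loadHTML($html);

$xpath = new DOMXPath($source);
$nodes = $xpath->query("//div[@id='content']/p");

// Copy every matched node into a fresh document and serialize the lot.
function getHtmlFromNodeList(DOMNodeList $nodeList)
{
    $target = new DOMDocument();
    foreach ($nodeList as $node) {
        // Deep-copy the node so its children (text, inline tags) come along.
        $target->appendChild($target->importNode($node, true));
    }
    return (string) $target->saveHTML();
}

echo $nodes->length, "\n";        // how many p elements matched
echo getHtmlFromNodeList($nodes); // markup of all matched p elements
```

If the length is right but the output holds only one element, the loop that builds the output is the place to look, not the XPath expression.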

Alejandro 2010-10-14 18:30:37

@Alejandro: thanks for the answer, but //div[@id='content']/p doesn't work for me, I get only the first p.

Luciano 2010-10-14 18:49:13

@Luciano: Then the problem lies somewhere else in your code. Try this after the query: `echo $domNodeList->length`

Alejandro 2010-10-14 19:15:13

@Alejandro: the number of nodes is right, but I still get only the first p. Could it be an error caused by the tidy() function? I get the content of the page with curl, then parse it with `$tidy->parseString($curl_res); $tidy->cleanRepair(); return $tidy;`. Finally I pass this value as $page to DOMDocument.

Luciano 2010-10-15 11:24:42

@Alejandro: I've tried leaving out tidy() and passing the content I get with curl straight to DOMDocument, but the result seems the same... Is this the right way to use DOMDocument? (I've updated my question...)

Luciano 2010-10-15 11:47:10

@Luciano: Now, with the rest of your code, it's clear what your problem is. Check my edit.

Alejandro 2010-10-15 13:07:37

@Alejandro: thank you so much, this was really helpful! I had settled on the idea that there was an error with the XPath... instead the problem was a messed-up loop. Thanks again!

Luciano 2010-10-15 16:48:34

@Luciano: I'm glad it was helpful. I'm not a PHP expert, but I think it would be better to get the document reference with `$domDocument = $domNodeList->item(0)->ownerDocument`, do `$strResult .= $domDocument->saveHTML($node)` inside the iteration block (PHP concatenates strings with `.=`, not `+=`), and then return `$strResult`.

Alejandro 2010-10-15 17:06:18
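
A rough sketch of that last suggestion, for anyone landing here later. The function name `nodeListToHtml` and the sample markup are illustrative assumptions, and passing a node to `saveHTML()` needs PHP 5.3.6 or later:

```php
<?php
// Serialize each matched node straight from its owner document,
// without copying anything into a second DOMDocument.
function nodeListToHtml(DOMNodeList $nodeList)
{
    if ($nodeList->length === 0) {
        return ''; // nothing matched, so there is no owner document to ask
    }
    $doc = $nodeList->item(0)->ownerDocument;
    $result = '';
    foreach ($nodeList as $node) {
        $result .= $doc->saveHTML($node); // .= is string concatenation in PHP
    }
    return $result;
}

// Hypothetical sample markup, not the page from the question.
$doc = new DOMDocument();
$doc->loadHTML('<div id="content"><p>one</p><p>two</p></div>');
$xpath = new DOMXPath($doc);
$html = nodeListToHtml($xpath->query("//div[@id='content']/p"));
echo $html;
```

This avoids importNode() entirely, at the cost of requiring a PHP version where saveHTML() accepts a node argument.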
