domxpath

Can DOMXPath::query be limited to a certain depth?

Is there a way to limit the depth DOMXPath::query will look at? Consider the following document: <div> <span> <div> </div> </span> </div> How could I limit the query //div so it only matches the first level and not the descendants? ...
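A minimal sketch of the difference between an absolute child step and the descendant shorthand, assuming the snippet from the question is loaded as XML (so no html/body wrappers are added). The variable names are illustrative only.

```php
<?php
// Assumption: the question's snippet, loaded as XML.
$doc = new DOMDocument();
$doc->loadXML('<div><span><div></div></span></div>');
$xpath = new DOMXPath($doc);

// "//div" is short for /descendant-or-self::node()/div,
// so it matches a div at any depth.
$all = $xpath->query('//div');

// "/div" only matches a div that is the document element.
$top = $xpath->query('/div');

echo $all->length, "\n"; // 2
echo $top->length, "\n"; // 1
```

For a document loaded with loadHTML, the equivalent first-level query would likely be /html/body/div, since the HTML parser wraps fragments in html and body elements.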

Why doesn't DOMXPath work?

I've been trying to write a PHP script to parse an XML document using DOMXPath; however it seems I'm missing something because none of my XPath queries are returning anything. So I've tried to water down my script to try and parse a very rudimentary XML document, and that's not working either. I've based this script off of this XPath exa...

Limit to number of items returned from DOMXPath query?

I'm having an issue where I am trying to pull 1700+ anchors off a webpage using a DOMXPath query. However, the DOMNodeList length is returned as 1400. I have done the same (with added tbody elements) in XPather in Firefox and it returns 1700+ anchors, so I know the query is right. Is there a limit to how many nodes xpather can return? Or how muc...

XPath to get one level of childnodes

Using DOMXPath::query, is it possible to get just one level deep of childNodes? For example, if I had a document like: <div> <span> <cite> </cite> </span> <span> <cite> </cite> </span> </div> I would want the NodeList to contain just the spans and not the cites. Should also mention that ...
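One possible approach, sketched under the assumption that the outer div is available as a context node: a relative query of * passed together with that node selects only its element children. The rest of the question is cut off, so this may not cover every constraint the asker had in mind.

```php
<?php
// Assumption: the question's sample markup, loaded as XML.
$doc = new DOMDocument();
$doc->loadXML('<div><span><cite></cite></span><span><cite></cite></span></div>');
$xpath = new DOMXPath($doc);

// The second argument is the context node; "*" is the child axis,
// so only direct element children of the outer div are returned.
$children = $xpath->query('*', $doc->documentElement);

$names = [];
foreach ($children as $child) {
    $names[] = $child->nodeName;
}

echo implode(', ', $names), "\n"; // span, span
```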

PHP Xpath following-siblings

I'm trying to use xpath to get the contents of a table. The table looks like this <div> <table> <tr class="tableheader"> <td> Stuff </td> </tr> <tr class="indent1"> <td> Contents </td> </tr> <tr class="indent1"> <td> Contents </td> </tr> <tr class="tableheader"> <td> Stuff </td> </tr> <tr class="indent1"> <td> C...

PHP's DOMXPath is stripping out my tags inside the matched text.

I asked this question yesterday, and at the time it was just what I needed, but while working with some live data I discovered that it wasn't quite doing what I expected. http://stackoverflow.com/questions/2571232/parse-html-with-phps-html-domdocument It gets the data from the HTML page, but then it also strips out all the HTML tags ins...

PHP DomDocument, DomXPath encoding issue

Hi, I'm having a problem with encoding from a WordPress feed that I just can't seem to figure out. I was loading my feed with DOMDocument->load but then did a file_get_contents and am now using ->loadXML with the same results. I did the loadXML so I could manipulate the feed if needed. The correct output that I'm looking for is - ‘ £....

Xpath php fetch links

I'm using this example to fetch links from a website : http://www.merchantos.com/makebeta/php/scraping-links-with-php/ $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//a"); for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); var_dump($href); $url = $href->getAttribute('href'); echo "<...

PHP DOMXPATH & Array

I'm trying to extract all relevant URLs and images out of a page and put them into an array, the code below works fine except it outputs the first pair over and over for the numerically-correct number of times. I thought maybe I was making mistakes when specifying XPATHs but I've tested it on 3 different sites with the same result every ...

DomXML xpath what do I do next?

I have this code: $reader = new DOMDocument(); $reader->loadHTML($shell); $xpath = new DomXPath($reader); $xpath->registerNamespace('html','http://www.w3.org/1999/xhtml'); $res = $xpath->query('descendant-or-self::*[contains(@class,"content")]'); print_r($res); $shell is just a variable containing the following html code: <html xmln...

PHP problem with DOM parsing

Hi all. The code pasted below works on my PC, but not on my hosting (which has PHP 5.2.13 installed). $source = file_get_contents('http://example.com/example', 0); $dom = new DOMDocument; @$dom->loadHTML($source); $dom->preserveWhiteSpace = false; $xpath = new DOMXPath($dom); $tags = $xpath->query('//div[@class="item"]'); $xml = '<...

Display one XML entry based on id

Trying to make a quick and dirty news system. Have a basic XML file. <?xml version="1.0" encoding="ISO-8859-1"?> <articles> <article id="1"> <title>Article title 001</title> <short>Short text</short> <long>Long text</long> </article> <article id="2"> <title>Article title 002</title> <short>Short text</short> <lo...

DOMXpath | Select the innermost divs

I'm looking for a way to select the innermost div with PHP, for example: <div> <div> <div> - </div> </div> <div> <div> <div> - </div> </div> </div> </div> The divs containing the - would be selected in the NodeList. I'm using DOMDocument...
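A hedged sketch of one way to express "innermost": select every div that has no div descendant. It assumes the sample markup is loaded as XML, and the predicate is an illustration rather than the accepted answer to the question.

```php
<?php
// Assumption: the question's sample markup, loaded as XML.
$doc = new DOMDocument();
$doc->loadXML('<div><div><div>-</div></div><div><div><div>-</div></div></div></div>');
$xpath = new DOMXPath($doc);

// not(.//div) is true only for a div with no div anywhere below it.
$innermost = $xpath->query('//div[not(.//div)]');

$texts = [];
foreach ($innermost as $div) {
    $texts[] = $div->textContent;
}

echo $innermost->length, "\n"; // 2
```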

Debugging of object of DomXPath and DomDocument.

We use echo or print_r to get the values of variables while debugging PHP code. But DOMXPath and DOMDocument objects are not captured by echo or print_r. How can we get values from these objects while debugging PHP code? ...
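One common way to inspect DOM objects, sketched here with a made-up document: DOMDocument::saveXML accepts an optional node and returns that node's markup as a string, which can then be echoed or logged like any other value. The sample XML and variable names are assumptions for illustration.

```php
<?php
// Hypothetical sample document, for illustration only.
$doc = new DOMDocument();
$doc->loadXML('<root><item id="1">first</item><item id="2">second</item></root>');
$xpath = new DOMXPath($doc);

// Serialize each matched node to a string so it can be printed.
$lines = [];
foreach ($xpath->query('//item') as $node) {
    $lines[] = $doc->saveXML($node);
}

echo implode("\n", $lines), "\n";
// <item id="1">first</item>
// <item id="2">second</item>
```

For documents loaded with loadHTML, saveHTML can be used in the same way to dump the whole document.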

PHP DOMDocument and DOMXPath seem to be broken. Help Please (FIXED)

Hello all. So here is my problem in a nutshell. I'm working on a web scraping app for work and I've run into a snag. I'm trying to load the HTML markup of a site using cURL, then use DOMDocument and XPath to find specific node values from that HTML. Initially, the user plugs in a URL which displays information they want to pull out o...

Get data from specific HTML table cells using PHP

I need to get the data out of all of the table cells in the 4th row of the 4th table on an HTML page. After researching for a while, it seems that using DOMXPath is the best way to parse the HTML file. However, no IDs or classes are used anywhere in the file. What would be the best way to get the data out of these cells? Thanks in advan...
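A minimal sketch of a purely positional query, assuming a generated test page with four tables of four rows each (the cell labels such as t4r4c1 are invented for the example). The parentheses matter: (//table)[4] is the fourth table in document order, and ((...)//tr)[4] is the fourth row inside it whether or not the source contains tbody elements.

```php
<?php
// Build a hypothetical page: 4 tables, 4 rows each, 2 cells per row.
$html = '<html><body>';
for ($t = 1; $t <= 4; $t++) {
    $html .= '<table>';
    for ($r = 1; $r <= 4; $r++) {
        $html .= "<tr><td>t{$t}r{$r}c1</td><td>t{$t}r{$r}c2</td></tr>";
    }
    $html .= '</table>';
}
$html .= '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Fourth table in the document, then the fourth row within it, then its cells.
$cells = [];
foreach ($xpath->query('((//table)[4]//tr)[4]/td') as $td) {
    $cells[] = trim($td->textContent);
}

echo implode(', ', $cells), "\n"; // t4r4c1, t4r4c2
```

If the page nests tables inside tables, document order counts the nested ones too, so the indexes would need adjusting.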