
I am trying to extract one specific table during a web scrape. I used cURL to fetch the page into $html, and that part works fine.

I used Firebug to get the exact XPath to the table I need.

Here is the code:

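// The constructor takes an XML version string, not markup; the HTML is passed to loadHTML()
$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Absolute path copied from Firebug
$summary = $xpath->evaluate('/html/body/table[5]/tbody/tr/td[3]/table/tbody/tr[8]/td/table');
echo "Summary Length: " . $summary->length;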

When I run this, $summary->length is always zero: the table node is never matched.

Any ideas?

A:

Firefox is liable to insert "virtual" tbody elements into tables that don't have them; do those elements exist in the original file?
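To illustrate the point: PHP's DOMDocument (libxml2's HTML parser) does not add the implied tbody that Firefox displays, so a path copied from Firebug only matches if the source really contains tbody. A minimal sketch, assuming a hypothetical one-cell table written without tbody:

```php
<?php
// Hypothetical HTML: the table is written without explicit <tbody> tags.
$html = '<html><body><table><tr><td>cell</td></tr></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Path as a browser shows it, with the implied <tbody>.
$withTbody = $xpath->query('/html/body/table/tbody/tr/td');
// Path matching the tree libxml actually built from the source.
$withoutTbody = $xpath->query('/html/body/table/tr/td');
// `//` between table and tr matches whether or not a tbody is present.
$either = $xpath->query('/html/body/table//tr/td');

echo $withTbody->length, "\n";    // 0
echo $withoutTbody->length, "\n"; // 1
echo $either->length, "\n";       // 1
```

Using `table//tr` instead of `table/tbody/tr` is one way to write a path that survives the difference between the browser's DOM and the raw source.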

Rob Kennedy
No, they don't, but I do see them in Firefox. I have also used XPath Checker and can see the data I need, but the same path in my PHP $xpath->evaluate() call never returns anything.
<tr> is not allowed directly inside <table>; there has to be a <tbody>, <thead>, or <tfoot>. If you don't write one, a <tbody> is implied. HTML is weird like that: both the start and end tags can be optional!
Greg
If the tbody elements don't exist in the original file, they shouldn't be in your PHP XPath query.
Frank Farmer
I apologize. The TBODY tags are there. I overlooked them when first looking at the source.
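Since the tbody tags turn out to be present, one way to find the step where a copied path stops matching is to evaluate each prefix of it in turn and watch where the count drops to zero. A minimal sketch: tracePath() is a hypothetical helper, the sample HTML stands in for the scraped page, and splitting on '/' assumes a simple absolute path with no '/' inside predicates and no '//' steps, like the one in the question.

```php
<?php
// Hypothetical helper: map each successively longer prefix of an absolute
// XPath to the number of nodes it matches.
function tracePath(DOMXPath $xpath, string $path): array
{
    $steps = explode('/', trim($path, '/'));
    $prefix = '';
    $counts = [];
    foreach ($steps as $step) {
        $prefix .= '/' . $step;
        $counts[$prefix] = $xpath->query($prefix)->length;
    }
    return $counts;
}

// Hypothetical sample page standing in for the scraped $html.
$html = '<html><body><table><tbody><tr><td>a</td><td>b</td></tr></tbody></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$counts = tracePath(new DOMXPath($dom), '/html/body/table/tbody/tr/td[3]');
foreach ($counts as $prefix => $n) {
    echo str_pad((string) $n, 3), $prefix, "\n";
}
```

With this sample, every prefix matches one node except the final td[3] step, which matches zero because the row has only two cells. On a real page, the first zero points at the step to inspect in the raw source.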