+1  Q: 

PHP Scraping Page

I'm trying to scrape a page where the information I'm looking for lies within:

 <tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>

I'm trying to get the label and info out of the page. Before I was doing something like:

$hrefs = $xpath->evaluate("/html/body//a");

That is how I'm grabbing the URLs. Is there a way to grab that tr information? Would it be better to use regex or DOMXPath? I'm very unfamiliar with DOMXPath, so any information would be more than helpful. Thank you!

+1  A: 

I'm not familiar with XPath, but using SimpleHtmlDom you can do this:

// $html is assumed to come from file_get_html() or str_get_html()
foreach($html->find('tr.defRowEven') as $row) {

    //get the 'label' (first cell); the property name is lowercase
    echo $row->find('td', 0)->innertext;

    //get the 'info' (second cell)
    echo $row->find('td', 1)->innertext;
}
karim79
Tried that, no luck. Just a blank screen. I'll keep working with the class though. Thank you!
Frederico
@Frederico - maybe try echo $row->find('td', 0)->plaintext; instead
karim79
A: 

Someone here on SO recently linked to phpQuery, a kind of server-side jQuery for PHP, which should make this kind of thing easy. I haven't tried it myself, so I can't comment first hand.

Scott Evernden
+4  A: 

XPath can select based on attributes. To find your row, then, use:

$rows = $xpath->query("//tr[@class='defRowEven']");

This should return a list of rows, so you can select the label and info for each without mixing them up:

foreach ($rows as $row) {
    // string(...) makes evaluate() return the cell text rather than a DOMNodeList
    $label = $xpath->evaluate("string(td[@align='right'])", $row);
    $info = $xpath->evaluate("string(td[2])", $row);
}
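
For anyone new to DOMXPath, here is a minimal end-to-end sketch of the same idea. The sample markup, the `$pairs` array, and the extra `defRowOdd` row are assumptions made up for illustration; the real page will differ, and the loading step would normally use the fetched HTML instead of an inline string.

```php
<?php
// Sample markup standing in for the scraped page (an assumption for illustration).
$html = <<<HTML
<html><body><table>
 <tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>
 <tr class="defRowOdd">
   <td align="right">other</td>
   <td>skipped</td>
 </tr>
</table></body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally; real-world HTML is often malformed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$pairs = [];
foreach ($xpath->query("//tr[@class='defRowEven']") as $row) {
    // Paths are relative to $row; string(...) yields text, not a node list.
    $label = trim($xpath->evaluate("string(td[1])", $row));
    $info  = trim($xpath->evaluate("string(td[2])", $row));
    $pairs[$label] = $info;
}
print_r($pairs);
```

Note that `[@class='defRowEven']` only matches when the class attribute is exactly that string; a row with several classes would need something like `contains(@class, 'defRowEven')`.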

In case that doesn't work out, you can try the regex route:

preg_match_all('/<tr class="defRowEven">\s*<td align="right">(.*?)<\/td>\s*<td>(.*?)<\/td>/',
    $html, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    list($full, $label, $info) = $match;
}
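
A self-contained version of the regex route might look like the sketch below. The sample markup and the `$results` array are assumptions for illustration, and the added `s` and `i` modifiers are my own tweak so that `.` can span newlines and tag case is ignored. It still depends on the attributes appearing exactly as written, so it is more fragile than the XPath approach.

```php
<?php
// Sample markup (an assumption for illustration).
$html = <<<'HTML'
<tr class="defRowEven">
  <td align="right">label</td>
  <td>info</td>
</tr>
HTML;

// "s" lets "." match newlines inside a cell; "i" ignores tag case.
$pattern = '/<tr class="defRowEven">\s*<td align="right">(.*?)<\/td>\s*<td>(.*?)<\/td>/si';

$results = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Skip the full match at index 0; keep the two captured cells.
        list(, $label, $info) = $match;
        $results[] = [trim(strip_tags($label)), trim(strip_tags($info))];
    }
}
print_r($results);
```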
eswald
Tried your 2nd example, and couldn't get that working. I'll keep trying though. Thank you!
Frederico