I recently read about the DOM module in PHP and am now trying to use it to parse an HTML document. The page said this was a much better solution than using preg, but I'm having a hard time figuring out how to use it.

The page contains a table of dates, each followed by some number of events for that date.

First I need to get the text (a date) from a tr with valign="bottom", and then I need all the column values from every tr with valign="top" that follows it, up until the next tr with valign="bottom" (the next date). The number of trs with column data is unknown; it can be zero or many.

This is what the HTML on the page looks like:

<table>
    <tr valign="bottom">
        <td colspan="4">2009-02-26</td>
    </tr>
    <tr valign="top">
        <td>21:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="top">
        <td>23:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="bottom">
        <td colspan="4">2009-02-27</td>
    </tr>
    <tr valign="top">
        <td>06:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="top">
        <td>10:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="top">
        <td>13:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
</table>

So far I've been able to get the first two dates (I'm only interested in the first two) but I don't know how to go from here.

The xpath query I use to get the date trs is

$result = $xpath->query('//tr[@valign="bottom"][position()<3]');

Now I need a way to connect all the events for that day to the date, i.e. select all the tds and all the column values up until the next date tr.

A: 

Use the following-sibling axis (following-sibling::).

vartec
Thanks, but how do you tell xpath to only select siblings up to a node with [valign="bottom"]? If I use following-sibling::tr[@valign="top"] on my selected date it'll return all the following trs when I only want the ones up until the next date tr?
Daniel Johansson
Select all nodes that are following siblings of the current tr[@valign="bottom"], but are not following siblings of the next one. For example, for the first one: following-sibling::tr[@valign="bottom"][1] and not(following-sibling::tr[@valign="bottom"][2])
vartec
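
One way to express "siblings up to the next date row" in XPath 1.0 is to count date rows: an event row belongs to the n-th date if exactly n rows with valign="bottom" precede it. The following is only a sketch of that idea, not necessarily what the commenter had in mind. The inline markup is a shortened copy of the table from the question, and the variable names ($schedule, $dateRows, $eventRows) are made up for illustration.

```php
<?php
// Shortened copy of the table from the question (two cells per event row).
$html = <<<HTML
<table>
    <tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
    <tr valign="top"><td>21:00</td><td>Column data</td></tr>
    <tr valign="top"><td>23:00</td><td>Column data</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
    <tr valign="top"><td>06:00</td><td>Column data</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$schedule = [];

// Same query as in the question: the first two date rows.
$dateRows = $xpath->query('//tr[@valign="bottom"][position()<3]');

foreach ($dateRows as $dateRow) {
    // How many date rows there are up to and including this one.
    $n = (int) $xpath->evaluate('count(preceding-sibling::tr[@valign="bottom"])', $dateRow) + 1;

    // Event rows after this date row that have exactly $n date rows before
    // them, which means they sit before the next date row.
    $eventRows = $xpath->query(
        'following-sibling::tr[@valign="top"]'
        . '[count(preceding-sibling::tr[@valign="bottom"]) = ' . $n . ']',
        $dateRow
    );

    $date = trim($dateRow->textContent);
    $schedule[$date] = [];

    foreach ($eventRows as $eventRow) {
        $cells = [];
        foreach ($xpath->query('td', $eventRow) as $td) {
            $cells[] = trim($td->textContent);
        }
        $schedule[$date][] = $cells;
    }
}

print_r($schedule);
```

A date with no events simply ends up with an empty array, since the inner query returns an empty node list in that case.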
A: 
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is rarely valid.
$oldSetting = libxml_use_internal_errors(true);
libxml_clear_errors();

$html = new DOMDocument();
$html->loadHTMLFile('http://url/table.html');

$xpath = new DOMXPath($html);
$rows = $xpath->query('//table/tr');

foreach ($rows as $row) {
  // Anything that is not an event row (valign="top") is treated as a
  // date row, so start it on a new line.
  if ($row->getAttribute('valign') !== 'top') {
    print("<br>");
  }

  // Print the text of every cell in the row.
  foreach ($xpath->query('td', $row) as $cell) {
    print($cell->nodeValue . " ");
  }
  print("<br>");
}

// Restore the previous libxml error handling.
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);
A: 

Awesome work!!

Thanks a lot!

Moiz
A: 

How do you save each row into an array?

jon
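
A possible way to do this, sketched under the assumption that the table looks like the one in the question: walk the rows once in document order, remember the most recent date row, and append each event row's cells to that date's entry. The inline markup and the names $rows and $currentDate are illustrative only.

```php
<?php
// Shortened copy of the table from the question.
$html = <<<HTML
<table>
    <tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
    <tr valign="top"><td>21:00</td><td>Column data</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
    <tr valign="top"><td>06:00</td><td>Column data</td></tr>
    <tr valign="top"><td>10:00</td><td>Column data</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$rows = [];          // date => list of rows, each row a list of cell strings
$currentDate = null; // text of the most recent date row seen so far

foreach ($doc->getElementsByTagName('tr') as $tr) {
    // Collect the text of every cell in this row.
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }

    if ($tr->getAttribute('valign') === 'bottom') {
        // A date row starts a new group.
        $currentDate = $cells[0] ?? null;
        if ($currentDate !== null) {
            $rows[$currentDate] = [];
        }
    } elseif ($currentDate !== null) {
        // An event row belongs to the last date row seen.
        $rows[$currentDate][] = $cells;
    }
}

print_r($rows);
```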
A: 

This XPath expression

/table/tr/td[@colspan=4]

or

/table/tr[@valign='bottom']/td

results in a node set containing the date cells.

How do you get the cells between two date markers? Filter on the nearest preceding date cell:

/table/tr/td[not(@colspan=4)][preceding::td[@colspan=4][1]='2009-02-26']
Alejandro
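
For reference, a rough sketch of how the last expression might be run from PHP. One assumption to note: DOMDocument::loadHTML() wraps the markup in html and body elements, so the path below starts with //table rather than /table. The inline markup and the $date and $values names are illustrative.

```php
<?php
// Shortened copy of the table from the question.
$html = <<<HTML
<table>
    <tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
    <tr valign="top"><td>21:00</td><td>Column data</td></tr>
    <tr valign="top"><td>23:00</td><td>Column data</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
    <tr valign="top"><td>06:00</td><td>Column data</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The date whose cells we want; it must match the date cell's text exactly.
$date = '2009-02-26';

// Non-date cells whose nearest preceding date cell holds $date.
$cells = $xpath->query(
    '//table/tr/td[not(@colspan=4)]'
    . '[preceding::td[@colspan=4][1]="' . $date . '"]'
);

$values = [];
foreach ($cells as $cell) {
    $values[] = trim($cell->textContent);
}

print(implode(', ', $values) . "\n");
```

This returns the cells as one flat list; to keep them grouped per row, the same predicate could be applied to the tr elements instead.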