Hey guys, I have the following HTML structure that I am trying to pull information from:

// Product 1
<div class="productName">
 <span id="product-name-1">Product Name 1</span>
</div>

<div class="productDetail">            
 <span class="warehouse">Warehouse 1, ACT</span>                
 <span class="quantityInStock">25</span>
</div>

// Product 2
<div class="productName">
 <span id="product-name-2">Product Name 2</span>
</div>

<div class="productDetail">            
 <span class="warehouse">Warehouse 2, ACT</span>                
 <span class="quantityInStock">25</span>
</div>

…

// Product X
<div class="productName">
 <span id="product-name-X">Product Name X</span>
</div>

<div class="productDetail">            
 <span class="warehouse">Warehouse X, ACT</span>                
 <span class="quantityInStock">25</span>
</div>

I don't have control of the source HTML and, as you'll see, productName and its accompanying productDetail are not contained within a common element.

I am using the following PHP code to try to parse the page:

$html = new DOMDocument();
$html->loadHTMLFile('product_test.html');

$xPath = new DOMXPath($html);

$domQuery = '//div[@class="productName"]|//div[@class="productDetail"]';

$entries = $xPath->query($domQuery);

foreach ($entries as $entry) { 
 echo "Detail: " . $entry->nodeValue . "<br />\n";
}

Which prints the following:

Detail: Product Name 1
Detail: Warehouse 1, ACT
Detail: 25
Detail: Product Name 2
Detail: Warehouse 2, ACT
Detail: 25
Detail: Product Name X
Detail: Warehouse X, ACT
Detail: 25

Now, this is close to what I want. But I need to do some processing on each product's name, warehouse and stock quantity, and I can't figure out how to parse the results into separate product groups. The final output I am after is something like:

Product 1:
Name: Product Name 1
Warehouse: Warehouse 1, ACT
Stock: 25

Product 2:
Name: Product Name 2
Warehouse: Warehouse 2, ACT
Stock: 25 

I just can't figure it out, and I can't wrap my head around this DOM stuff, as the elements don't quite work the same way as a standard array.

If anyone can assist or point me in the right direction, I will be ever appreciative.

A: 

Maybe not the most efficient way, but:

$html = new DOMDocument();
$html->loadHTMLFile('test2.php');

$xPath = new DOMXPath($html);

foreach( $xPath->query('//div[@class="productName"]') as $prodName ) { 
  $prodDetail = $xPath->query('following-sibling::div[@class="productDetail"][1]', $prodName);
  // Skip this product if no productDetail sibling follows it.
  if ($prodDetail->length === 0) {
    continue;
  }
  $prodDetail = $prodDetail->item(0);
  echo "Name: " . $prodName->nodeValue . "<br />\n";
  echo "Detail: " . $prodDetail->nodeValue . "<br />\n";
  echo "----\n";
}

prints

Name: 
 Product Name 1
<br />
Detail:             
 Warehouse 1, ACT                
 25
<br />
----
Name: 
 Product Name 2
<br />
Detail:             
 Warehouse 2, ACT                
 25
<br />
----
Name: 
 Product Name X
<br />
Detail:             
 Warehouse X, ACT                
 25
<br />
----
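
The loop above prints the whole productDetail block as one string. To get the per-product grouping asked for in the question, one option is to run further relative queries against each productDetail node. The following is only a minimal sketch under some assumptions: the markup is inlined with loadHTML so the snippet is self-contained, and the $markup and $products names and the array keys are made up for illustration.

```php
<?php
// Inline sample mirroring the question's markup (an assumption; the real page is not shown).
$markup = <<<HTML
<html><body>
<div class="productName"><span id="product-name-1">Product Name 1</span></div>
<div class="productDetail">
 <span class="warehouse">Warehouse 1, ACT</span>
 <span class="quantityInStock">25</span>
</div>
<div class="productName"><span id="product-name-2">Product Name 2</span></div>
<div class="productDetail">
 <span class="warehouse">Warehouse 2, ACT</span>
 <span class="quantityInStock">25</span>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xPath = new DOMXPath($doc);

$products = [];
foreach ($xPath->query('//div[@class="productName"]') as $prodName) {
    // Nearest following sibling div with class productDetail.
    $detail = $xPath->query('following-sibling::div[@class="productDetail"][1]', $prodName)->item(0);
    if ($detail === null) {
        continue; // no detail block found for this product
    }
    $products[] = [
        'name'      => trim($prodName->textContent),
        // string(...) yields the text of the first matching span, or '' if there is none.
        'warehouse' => trim($xPath->evaluate('string(span[@class="warehouse"])', $detail)),
        'stock'     => (int) trim($xPath->evaluate('string(span[@class="quantityInStock"])', $detail)),
    ];
}

// Print each product as its own group.
foreach ($products as $i => $p) {
    echo 'Product ' . ($i + 1) . ":\n";
    echo "Name: {$p['name']}\n";
    echo "Warehouse: {$p['warehouse']}\n";
    echo "Stock: {$p['stock']}\n\n";
}
```

Once the values are in a plain array, any further processing can work on $products instead of on DOM nodes.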
VolkerK
Thanks, this is a start in the right direction. With my example code things work correctly. However, my production code has some additional divs and spans between productName and productDetail, which this code seems to break on. Am I right in assuming that following-sibling simply looks at the very next element after the initial match and expects "productDetail" to be next and nothing else? And that it will break when the next element is garbage?
Michael Pasqualone
That's strange, because I've tested the script with additional divs. The `following-sibling` axis contains all nodes that are siblings of the context node and follow it in document order. Maybe you should provide a more detailed, real-world example of your document.
VolkerK
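To illustrate the point about the axis, here is a small sketch with unrelated elements placed between productName and productDetail. The "banner" and "note" elements are invented for the example and do not come from the asker's page.

```php
<?php
// Hypothetical markup: two unrelated elements sit between the name and the detail.
$markup = <<<HTML
<html><body>
<div class="productName"><span>Product Name 1</span></div>
<div class="banner">unrelated</div>
<span class="note">also unrelated</span>
<div class="productDetail"><span class="warehouse">Warehouse 1, ACT</span></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xPath = new DOMXPath($doc);

$prodName = $xPath->query('//div[@class="productName"]')->item(0);

// Every element sibling after productName: the banner div, the note span, the productDetail div.
$all = $xPath->query('following-sibling::*', $prodName);

// Filter by name and class first, then take the nearest match with [1].
$detail = $xPath->query('following-sibling::div[@class="productDetail"][1]', $prodName)->item(0);

echo $all->length . "\n";                // 3
echo trim($detail->textContent) . "\n";  // Warehouse 1, ACT
```

The intervening elements are skipped because the predicate filters the siblings before [1] picks the first one. The query only fails to match when productDetail is not a sibling at all, for example when it is nested inside a different parent element.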
All good, mate! It was my lazy/sleepy mistake. Anyway, your code is working beautifully! Thanks a lot!!
Michael Pasqualone