
I need to extract this particular piece of HTML using PHP. Since it has no class or unique ID, I tried to select it by its bgcolor attribute, but without success:

<td bgcolor="#F5EC97" width="154" valign="top" align="left" height="55">
             <font face="Verdana, Arial, Helvetica, sans-serif" size="1"><b><font color="#CC6633">CITY</font></b><br>
              <b>xyz</b><br>
              xyz<br>
              Tel. 555/22327<br>
              &nbsp;

    </td>

This is the code I've tried:

$res = $html->find('td[bgcolor=#F5EC97]');

Any suggestion?

A:

Parse into a DOMDocument:

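libxml_use_internal_errors(true); // tolerate sloppy legacy markup without emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html); // $html is the raw HTML string here, not a simple_html_dom object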

Then pick the element(s), either with plain DOM getElementsByTagName:

foreach ($doc->getElementsByTagName('td') as $td) {
    if ($td->getAttribute('bgcolor')=='#F5EC97') {
        // do something with $td
    }
}
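DOM keeps attribute values exactly as written, so the `==` comparison above only matches the uppercase spelling `#F5EC97`. If the pages you scrape mix `#F5EC97` and `#f5ec97` (the letter-case problem that comes up later in this thread), a case-insensitive comparison is a small change. A minimal sketch, assuming a trimmed-down copy of the question's markup as input:

```php
<?php
// Minimal sketch: match bgcolor case-insensitively with plain DOM.
// Assumption: $html below is a trimmed copy of the question's markup.
$html = '<table><tr><td bgcolor="#F5EC97" width="154"><b>CITY</b><br>xyz</td>'
      . '<td bgcolor="#ffffff">other</td></tr></table>';

libxml_use_internal_errors(true); // silence warnings about malformed legacy HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$matches = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // strcasecmp() returns 0 when the strings are equal ignoring ASCII case,
    // so both "#f5ec97" and "#F5EC97" are accepted.
    if (strcasecmp($td->getAttribute('bgcolor'), '#f5ec97') === 0) {
        $matches[] = $td->textContent; // text of the cell, tags stripped
    }
}

print_r($matches);
```

`textContent` concatenates all descendant text, so the `<br>` boundaries disappear; iterate over `$td->childNodes` instead if you need the individual lines.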

Or with XPath:

$xpath = new DOMXPath($doc);
foreach ($xpath->query("//td[@bgcolor='#F5EC97']") as $td) {
   // do something with $td
}
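The XPath predicate above is also an exact, case-sensitive string comparison. DOMXPath implements XPath 1.0, which has no `lower-case()` function, but `translate()` can map the hex letters `A`-`F` to lowercase before comparing. A minimal sketch, again assuming a small sample of markup rather than the real page:

```php
<?php
// Minimal sketch: case-insensitive bgcolor match in XPath 1.0.
// translate() only needs to cover A-F because hex color codes use no other letters.
// Assumption: $html is a small hypothetical sample, not the original page.
$html = '<table><tr><td bgcolor="#F5EC97">CITY</td>'
      . '<td bgcolor="#f5ec97">lower</td><td bgcolor="#FFFFFF">other</td></tr></table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$query = "//td[translate(@bgcolor, 'ABCDEF', 'abcdef') = '#f5ec97']";

$texts = [];
foreach ($xpath->query($query) as $td) {
    $texts[] = $td->textContent;
}

print_r($texts);
```

Both spellings of the color match, and results come back in document order.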
bobince
Thanks, bobince. Using DOMDocument I discovered the lowercase issue.
cesko80
A: 

Finally got it.

It also works with simple_html_dom; just always write the color code in lowercase in the selector, e.g. #f5ec97. It does NOT work with uppercase, even though the color code is uppercase in the original document.

<?php

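    require_once("simple_html_dom.php");

    $html = file_get_html('pharma/w_43.htm');
    // The selector only matched with the lowercase color code,
    // even though the page itself uses #F5EC97
    foreach ($html->find('td[bgcolor=#f5ec97]') as $article) {
        echo $article->innertext;
    }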

?>
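If you would rather drop the simple_html_dom dependency, a rough plain-DOM counterpart of `->innertext` is to serialize each child node of the matched cell with `DOMDocument::saveHTML($node)` and concatenate the results. A minimal sketch; `innerHtml()` is a hypothetical helper name, not part of PHP's DOM API, and the input is a trimmed sample of the question's markup:

```php
<?php
// Hypothetical helper: approximate simple_html_dom's ->innertext by
// serializing every child node of $node and joining the pieces.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

// Assumption: trimmed sample of the question's <td>, not the real file.
$html = '<table><tr><td bgcolor="#F5EC97"><b>CITY</b><br>xyz</td></tr></table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];
foreach ($xpath->query("//td[@bgcolor='#F5EC97']") as $td) {
    $results[] = innerHtml($td);
}

echo implode("\n", $results), "\n";
```

Because DOM preserves the attribute value as written, this version matches the uppercase `#F5EC97` directly.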

cesko80
Oh! So it's simple_html_dom... I *did* wonder where you got `find()` from. This seems like a bug to me.
bobince