So let's say the HTML looks something like this:

<select name="some_name">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="3" selected="selected">3</option>
    <option value="4">4</option>
</select>

I need to extract the option tag with attribute selected="selected" from there. How can I do that? So far I have this:

$string = file_get_contents('test.html');

include 'htmlpurifier-4.0.0-standalone/HTMLPurifier.standalone.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Doctype', 'XHTML 1.0 Strict');
$purifier = new HTMLPurifier($config);
$string = $purifier->purify($string);

$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="UTF-8">' . $string);
$dom->preserveWhiteSpace = false;

$num = 0;

$optionTags = $dom->getElementsByTagName('option');
foreach ($optionTags as $o) {
    if ($o->hasAttribute('selected')
        && 'selected' === $o->getAttribute('selected')) {
        $num = $o->nodeValue;
    }
}

echo $num;

But that doesn't work: $num is still zero afterwards.

+2  A: 

How about using SimpleXML with an XPath selector?

$xml = new SimpleXMLElement($htmlString);
$result = $xml->xpath('//option[@selected="selected"]');

$option = array_pop($result);
var_dump($option);

(tested, working on PHP 5.3.0)
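
A DOM-based variant of the same XPath idea, as a minimal sketch: SimpleXMLElement only accepts well-formed XML, whereas DOMDocument::loadHTML() also tolerates ordinary HTML. The inline markup is the snippet from the question. The `[@selected]` predicate is an assumption that the minimized form `<option selected>` should match too; use `[@selected="selected"]` to match only the exact attribute value.

```php
<?php
// Sketch: pick the selected <option> with DOMXPath instead of SimpleXML.
$html = <<<HTML
<select name="some_name">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="3" selected="selected">3</option>
    <option value="4">4</option>
</select>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Restrict the search to the <select> named in the question.
$nodes = $xpath->query('//select[@name="some_name"]/option[@selected]');

// Take the text of the first match, or null when nothing is selected.
$selected = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
echo $selected, "\n";
```

If `$nodes->length` is 0 here, the `<select>` is most likely missing from the input rather than the query being wrong.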

Juraj Blahunka
+1  A: 

I believe it does not work because you are not accessing the DOMNodeList's entries through its item() method.

Try this approach: loop over the full length of the returned DOMNodeList and check whether the DOMNode at the current index has an attribute named "selected" whose value is "selected".

$num = 0;
$optionTags = $dom->getElementsByTagName('option');
for ($i = 0; $i < $optionTags->length; $i++ ) {
 if ($optionTags->item($i)->hasAttribute('selected') 
           && $optionTags->item($i)->getAttribute('selected') === "selected") {
     $num = $optionTags->item($i)->nodeValue;
 }
}

EDIT:

My exact code:

$dom = new DOMDocument();
$dom->load("C:\\test.htm");
$num = 0;
$optionTags = $dom->getElementsByTagName('option');
for ($i = 0; $i < $optionTags->length; $i++ ) {
  if ($optionTags->item($i)->hasAttribute('selected') 
         && $optionTags->item($i)->getAttribute('selected') === "selected") {
       $num = $optionTags->item($i)->nodeValue;
  }
}
echo "Num is " . $num;

Output:

Num is 3

Anthony Forloney
Still doesn't work, even if I use your code.
Richard Knop
Try `==` instead of `===` and see if that works. If not, I'll run a test and see if I can get it to work.
Anthony Forloney
I tried that; still nothing. By the way, the exact HTML is from this page: http://www.futbalvsfz.sk/sutaze/sezona-2009-2010/dospeli/5.liga-jz
Richard Knop
When I do echo $optionTags->length; it prints 0... so that's why the loop doesn't run even once.
Richard Knop
I would add a `var_dump` to inspect the contents of `$string` and `$dom`; something is not working. I haven't used `HTMLPurifier`. What is the reason for using it?
Anthony Forloney
OK, I figured this out. HTMLPurifier is stripping the whole <select> for some reason, even though it seems to be valid XHTML.
Richard Knop
+1  A: 

Your next step in debugging is to verify that $string contains the value you expect. The original code posted is correct.
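
One way to run that check, as a rough sketch: wrap the question's foreach loop in a function and feed it the markup from each stage of the pipeline. The function name `selectedOptionText` and the two sample strings are hypothetical. `$stripped` only stands in for sanitizer output that has lost its `<select>`; it is not real HTMLPurifier output. The type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper wrapping the question's foreach loop.
function selectedOptionText(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    // Same logic as the original loop: foreach works directly on a DOMNodeList.
    foreach ($dom->getElementsByTagName('option') as $o) {
        if ($o->getAttribute('selected') === 'selected') {
            return $o->nodeValue;
        }
    }
    return null; // no selected <option> in this markup
}

$raw = '<div><select name="some_name"><option value="2">2</option>'
     . '<option value="3" selected="selected">3</option></select></div>';
// Stand-in for sanitizer output in which the <select> was dropped.
$stripped = '<div></div>';

var_dump(selectedOptionText($raw));
var_dump(selectedOptionText($stripped));
```

The first call should yield "3" and the second null. If the real `$string` behaves like `$stripped`, the loop is fine and the markup was lost before it reached DOMDocument.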

chris