Say I have data like this:

<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>

Using PHP, how would I go through the HTML tags and return the text inside each option element? For instance, given the code above, I'd like to return 'Test - 123', 'Test - 456', 'Test - 789'.

Thanks for the help!

UPDATE: So that I'm more clear: I'm using file_get_contents() to get the HTML from a site. For my purposes, I'd like to be able to go through the HTML, find the option elements, and output their text. In this case, return 'Test - 123', 'Test - 456', etc.

A: 

Use strip_tags, unless I'm misunderstanding the question.

    $string = '<option value="abc" >Test - 123</option>
    <option value="def" >Test - 456</option>
    <option value="ghi" >Test - 789</option>';

    $string = strip_tags($string);

Update: Missed that you loosely specify an array in your question. In this case, and I'm sure there's a cleaner method, I'd do something like:

$teststring = '<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>';

// split() was removed in PHP 7; explode() does the same job for a literal delimiter.
$stringarray = explode("\n", strip_tags($teststring));
print_r($stringarray);

Update 2: And just to top and tail it, to present it as you originally asked (not as an array, as we may have been misled to believe), try the following:

$teststring = '<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>';

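// split() was removed in PHP 7; explode() does the same job for a literal delimiter.
$stringarray = explode("\n", strip_tags($teststring));

// implode() takes the separator first; the reversed order no longer works in PHP 8.
$newstring = implode("','", $stringarray);
echo "'" . $newstring . "'\n";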
Gav
+1  A: 

This code would load the values into an array, assuming you have line breaks in between the option tags like you showed:

// Load your HTML into a string.
$html = <<<EOF
<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>
EOF;

// Break the values into an array.
$vals = explode("\n", strip_tags($html));
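
If the markup might use Windows line endings or contain blank lines, a slightly more defensive variant could look like the sketch below. The sample input and variable names are made up for illustration, and it still assumes one option per line.

```php
<?php
// Sample input with mixed line endings and a blank line (illustrative only).
// In practice this string might come from file_get_contents().
$html = "<option value=\"abc\" >Test - 123</option>\r\n"
      . "\r\n"
      . "<option value=\"def\" >Test - 456</option>\n"
      . "<option value=\"ghi\" >Test - 789</option>\n";

// Strip the tags, split on any line-ending style, and trim each piece.
$lines = array_map('trim', preg_split('/\R/', strip_tags($html)));

// Drop empty entries and reindex so the keys run 0, 1, 2.
$vals = array_values(array_filter($lines, function ($line) {
    return $line !== '';
}));

print_r($vals);
```

Here $vals ends up holding the three option texts without stray whitespace or empty elements.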
James Skidmore
+3  A: 

There are many ways; which one is best depends on more details than you've provided in your question.
One possibility: DOMDocument and DOMXPath

<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>???</title></head><body>
  <form method="post" action="?" id="form1">
      <div>
        <select name="foo">
        <option value="abc" >Test - 123</option>
        <option value="def" >Test - 456</option>
        <option value="ghi" >Test - 789</option>
      </select>
    </div>
  </form>
</body></html>');

$xpath = new DOMXPath($doc);
foreach( $xpath->query('//form[@id="form1"]//option') as $o) {
    echo 'option text: ', $o->nodeValue, "  \n";
}

prints

option text: Test - 123  
option text: Test - 456  
option text: Test - 789
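
Since the question mentions fetching the page with file_get_contents(), the same idea could be wrapped in a small helper that returns an array. This is only a sketch: the function name extractOptionTexts is hypothetical, and the libxml error handling is an assumption about how you might want to deal with the warnings that real-world, not-quite-valid HTML tends to trigger.

```php
<?php
// Hypothetical helper: returns the text of every <option> in an HTML string.
function extractOptionTexts(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//option') as $option) {
        $texts[] = trim($option->textContent);
    }
    return $texts;
}

// In practice $html might come from file_get_contents() on the page's URL.
$html = '<select name="foo">
  <option value="abc" >Test - 123</option>
  <option value="def" >Test - 456</option>
  <option value="ghi" >Test - 789</option>
</select>';

$texts = extractOptionTexts($html);
print_r($texts);
```

The XPath expression here selects every option in the document; narrowing it to a particular form or select, as in the example above, is a matter of changing the query string.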
VolkerK
+1  A: 

If you have more than just a fragment like the one shown, use a real parser like DOMDocument that you can walk through with DOMXPath.

Otherwise try this regular expression together with preg_match_all:

<option(?:[^>"']+|"[^"]*"|'[^']*')*>([^<]+)</option>
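
As a rough sketch of how that pattern could be plugged into preg_match_all: the `~` delimiter and the `i` (case-insensitive) flag are choices made for this example, not part of the pattern above, and the single quotes inside the pattern are escaped because the PHP string itself is single-quoted.

```php
<?php
$html = '<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>';

// The pattern from above, wrapped in ~ delimiters.
$pattern = '~<option(?:[^>"\']+|"[^"]*"|\'[^\']*\')*>([^<]+)</option>~i';

// $matches[0] holds the full matches, $matches[1] the captured option texts.
preg_match_all($pattern, $html, $matches);

print_r($matches[1]);
```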
Gumbo
A: 

http://networking.ringofsaturn.com/Web/removetags.php

preg_match_all("s/<[a-zA-Z\/][^>]*>//g", $data, $out);
Bassel Safadi
This may be a valid pattern for sed but not for php's preg_match_all.
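
A rough PHP equivalent of that sed-style substitution would probably use preg_replace rather than preg_match_all. In this sketch the `~` delimiter and the sample input are assumptions made for illustration:

```php
<?php
$data = '<option value="abc" >Test - 123</option>';

// Remove anything that looks like an opening or closing tag.
// The sed "g" flag is not needed: preg_replace replaces all matches by default.
$text = preg_replace('~<[a-zA-Z/][^>]*>~', '', $data);

echo $text, "\n";
```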
VolkerK
A: 

If we're doing regex stuff, I like this perl-like syntax:

$test = "<option value=\"abc\" >Test - 123</option>\n" .
    "<option value=\"abc\" >Test - 456</option>\n" .
    "<option value=\"abc\" >Test - 789</option>\n"; 

// Each match restarts the search from the offset of the previously captured text.
for ($offset=0; preg_match("/<option[^>]*>([^<]+)/",$test, $matches, 
                        PREG_OFFSET_CAPTURE, $offset); $offset=$matches[1][1])
   print($matches[1][0] . "\n");
Guss
The value attribute of an option element is defined as CDATA. If I'm not mistaken, that allows <option value=">abc " in HTML 4.01 (validator.w3.org agrees). Your code then prints 'abc" >Test - 123'.
VolkerK
Yes, it does :-) With regular expressions it's easy to write something simple that handles common use cases (and is also easy to read), but it's very hard to write something that parses a structured language like XML correctly. If you need a strict "handles anything you throw at it" parser, use something that understands the language, like DOM or SAX. The downside is that for simple cases DOM and SAX are harder to write and harder to read.
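
The trade-off discussed in these comments can be seen side by side in a small sketch. The input below is an assumption: it takes the edge case from the comment to mean an option whose value attribute is ">abc", and the variable names are made up for the example.

```php
<?php
// Assumed edge case: a ">" inside a quoted attribute value.
$html = '<select><option value=">abc" >Test - 123</option></select>';

// Simple pattern: [^>]* stops at the first ">", which sits inside the attribute.
preg_match('/<option[^>]*>([^<]+)/', $html, $m);
$regexResult = $m[1];

// DOM parser: understands quoted attribute values.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$domResult = $doc->getElementsByTagName('option')->item(0)->textContent;

echo 'regex: ', $regexResult, "\n";
echo 'DOM:   ', $domResult, "\n";
```

The regex version is shorter but captures part of the attribute, while the DOM version needs more setup and returns only the option text.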
Guss