Hi,
What about something like this :
$html = <<<HTML
<option value="TTO">1031</option><option value="187">187</option>
<option value="TWO">2SK8</option><option value="411">411</option>
<option value="AEL">Abec 11</option><option value="ABE">Abec11</option>
<option value="ACE">Ace</option><option value="ADD">Addikt</option>
<option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option>
<option value="ALG">Alligator</option><option value="ALM">Almost</option>
HTML;
$matches = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
$list = array();
$num_matches = count($matches[0]);
for ($i=0 ; $i<$num_matches ; $i++) {
$list[$matches[1][$i]] = $matches[2][$i];
}
var_dump($list);
}
The output ($list
) would be :
array
'TTO' => string '1031' (length=4)
187 => string '187' (length=3)
'TWO' => string '2SK8' (length=4)
411 => string '411' (length=3)
'AEL' => string 'Abec 11' (length=7)
'ABE' => string 'Abec11' (length=6)
'ACE' => string 'Ace' (length=3)
'ADD' => string 'Addikt' (length=6)
'AFF' => string 'Affiliate' (length=9)
'ALI' => string 'Alien Workshop' (length=14)
'ALG' => string 'Alligator' (length=9)
'ALM' => string 'Almost' (length=6)
A few explainations :
- I'm using
preg_match_all
to match as many times as possible
([^"]+)
means "everything that is not a double-quote (as that one would mark the end of the value
), at least one time, and as many times as possible (+
)
([^<]+)
means about the same thing, but with <
instead of "
as end marker
preg_match_all
will get me an array containing in $matches[1]
the list of all stuff that matched the first set of ()
, and in $matches[2]
what matched the second set of ()
- so I need to iterate over the results to re-construct the list that inetrestes you :-)
Hope this helps -- and that you understood what it does and how, so you can help yourself, the next time ;-)
As a sidenote : using regex to "parse" HTML is generally not such a good idea... If you have a full HTML page, you might want to take a look at DOMDocument::loadHTML
.
If you don't and the format of the options is not well-defined... Well, maybe it might prove useful to add some stuff to the regex, as a precaution... (Like accepting spaces here and there, accepting other attributes, ...)