




I have a source file with a select form with some options, like this:

<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>

I would like to read this file using PHP and a regex, but I don't really know how. Does anybody have an idea? It would be nice to have an array with the 3-character code as the key and the longer string as the value (so, for example, $arr['TWO'] == '2SK8').

+2  A: 


What about something like this :

$html = <<<HTML
<option value="TTO">1031</option><option value="187">187</option>
<option value="TWO">2SK8</option><option value="411">411</option>
<option value="AEL">Abec 11</option><option value="ABE">Abec11</option>
<option value="ACE">Ace</option><option value="ADD">Addikt</option>
<option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option>
<option value="ALG">Alligator</option><option value="ALM">Almost</option>
HTML;

$matches = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
    $list = array();

    // $matches[1] holds the value attributes, $matches[2] the option texts.
    $num_matches = count($matches[0]);
    for ($i=0 ; $i<$num_matches ; $i++) {
        $list[$matches[1][$i]] = $matches[2][$i];
    }

    var_dump($list);
}

The output ($list) would be :

  'TTO' => string '1031' (length=4)
  187 => string '187' (length=3)
  'TWO' => string '2SK8' (length=4)
  411 => string '411' (length=3)
  'AEL' => string 'Abec 11' (length=7)
  'ABE' => string 'Abec11' (length=6)
  'ACE' => string 'Ace' (length=3)
  'ADD' => string 'Addikt' (length=6)
  'AFF' => string 'Affiliate' (length=9)
  'ALI' => string 'Alien Workshop' (length=14)
  'ALG' => string 'Alligator' (length=9)
  'ALM' => string 'Almost' (length=6)

A few explanations:

  • I'm using preg_match_all to find every match in the string, not just the first one
  • ([^"]+) means "any character that is not a double quote (since that one marks the end of the value), at least once and as many times as possible (+)"
  • ([^<]+) means about the same thing, but with < instead of " as the end marker
  • preg_match_all fills $matches[1] with everything that matched the first set of (), and $matches[2] with everything that matched the second set of ()
    • so I need to iterate over the results to rebuild the list you are interested in :-)
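
To make the shape of $matches concrete, here is a minimal sketch using a shortened, two-option input (the shortened $html string is only an assumption for illustration; the pattern is the one from the answer):

```php
<?php
// Shortened sample input, for illustration only.
$html = '<option value="TTO">1031</option><option value="TWO">2SK8</option>';

// Default flag is PREG_PATTERN_ORDER: results are grouped per capture group.
preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches);

print_r($matches[1]); // everything captured by the first set of ()
print_r($matches[2]); // everything captured by the second set of ()
```

$matches[0] holds the full matched strings, which is why count($matches[0]) gives the number of options found.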

Hope this helps -- and that you understood what it does and how, so you can help yourself, the next time ;-)

As a side note: using regex to "parse" HTML is generally not such a good idea. If you have a full HTML page, you might want to take a look at DOMDocument::loadHTML.
If you don't, and the format of the options is not well-defined, it might prove useful to make the regex more tolerant as a precaution (for example, accepting extra whitespace here and there, or other attributes).
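
For the DOMDocument route, a rough sketch could look like the following. It assumes the options sit inside a <select> element and uses a shortened sample string rather than the real file; libxml warnings are suppressed because real-world HTML is rarely perfectly valid.

```php
<?php
// Shortened sample input, assumed for illustration; in practice this
// would be the contents of the source file.
$html = '<select><option value="TTO">1031</option><option value="TWO">2SK8</option></select>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$list = array();
foreach ($doc->getElementsByTagName('option') as $option) {
    // value attribute becomes the key, the option text becomes the value
    $list[$option->getAttribute('value')] = trim($option->textContent);
}

var_dump($list);
```

This version does not depend on attribute order or spacing inside the tags, which is the main weakness of the regex approach.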

That sounds about right, except you get a much sweeter deal out of array_combine(): $list = array_combine($matches[1], $matches[2]);
Josh Davis
Ergh, I never think about those ones :-( Thanks for the tip !
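
The array_combine() suggestion from the comment above can be sketched as follows, again with a shortened sample string assumed for illustration:

```php
<?php
// Shortened sample input, for illustration only.
$html = '<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option>';

$list = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
    // First capture group supplies the keys, second the values.
    $list = array_combine($matches[1], $matches[2]);
}

var_dump($list);
```

Note that PHP converts purely numeric string keys such as "187" to integer keys, which is why the output above shows 187 without quotes.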
+2  A: 

Try this out. Just load the file's contents into $raw_html and use this regex to collect the matches. The 3-character code from the $i-th option is $out[$i][1], and the longer string is $out[$i][2]. You can convert that to an associative array as needed.

$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);

This does not seem to work... Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '[' in ...
Fixed, thanks. I forgot the regex delimiters ;)
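
Converting the PREG_SET_ORDER result into an associative array might look like this. The sample $raw_html is a shortened stand-in for the file contents, and $list is a name chosen here for illustration:

```php
<?php
// Shortened stand-in for the file contents.
$raw_html = '<option value="TTO">1031</option><option value="TWO">2SK8</option>';

$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $out is one complete match:
// [0] => full match, [1] => code, [2] => label.
$list = array();
foreach ($out as $match) {
    $list[$match[1]] = $match[2];
}

var_dump($list);
```

On PHP 5.5 or later, array_column($out, 2, 1) should produce the same array in one call.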
+1  A: 
$options= '
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
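';

// Group 1 is the whole option tag, group 2 the value attribute, group 3 the option text.
preg_match_all( '@(<option value="([^"]+)">([^<]+)<\/option>)@', $options, $arr);

$result = array();
foreach ($arr[0] as $i => $value)
    $result[$arr[2][$i]] = $arr[3][$i];

print_r($result);
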
    [TTO] => 1031
    [187] => 187
    [TWO] => 2SK8
    [411] => 411
    [AEL] => Abec 11
    [ABE] => Abec11
    [ACE] => Ace
    [ADD] => Addikt
    [AFF] => Affiliate
    [ALI] => Alien Workshop
    [ALG] => Alligator
    [ALM] => Almost