Ok this is really difficult to explain in English, so I'll just give an example.

I am going to have strings in the following format:

key-value;key1-value;key2-...

and I need to extract the data into an array:

array('key'=>'value','key1'=>'value1', ... )

I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:

/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/

to work with preg_match and this code:

for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
    $parameters[$matches[$i]] = $matches[$i+1];
}

However, the regexp obviously returns only 4 backreferences: the first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.

In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.

A: 

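No. When a capturing group sits inside a repeated group, each new repetition overwrites what the group captured before, so only the last repetition survives. Perhaps the limit argument of explode() would be helpful when exploding.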
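
To see the overwriting in practice, here is a minimal sketch that runs the question's pattern through preg_match(). The three-pair input string is an assumed sample, not taken from the question:

```php
<?php
// Assumed sample input with three pairs.
$input = 'key-value;key1-value1;key2-value2';

preg_match('/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/', $input, $matches);

// Groups 3 and 4 live inside the repeated (?:...)* group, so every
// repetition overwrites them; the middle pair (key1-value1) is lost.
print_r(array_slice($matches, 1));
```

The printed array should hold only key, value, key2 and value2.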

Ignacio Vazquez-Abrams
+2  A: 

Regex is a powerful tool, but sometimes it's not the best approach.

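$string = "key-value;key1-value";
$array = array();
// Split the string into "key-value" pairs.
$s = explode(";", $string);
foreach ($s as $k) {
    // Split each pair into its key and its value.
    $e = explode("-", $k);
    $array[$e[0]] = $e[1];
}
print_r($array);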
ghostdog74
Thanks, but as I said in OP: `I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results`
Raveren
So... you know this works but you don't want to use it because you would rather use a regex even though regex isn't the right tool for the job. Is the regex a real requirement, or are you just trying to do it that way because it should work?
David
Read the OP, or should I quote it once more? It's only a couple of paragraphs long, if it is too hard for you to read it, please refrain from smart ass comments.
Raveren
+2  A: 

Use preg_match_all() instead. Maybe something like:

$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';

preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
   $parameters[$match[1]] = $match[2];
}

print_r($parameters);

EDIT:

To first validate that the input string conforms to the pattern, use:

if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
    /* do the preg_match_all stuff */
}       

EDIT2: the final semicolon is optional

if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/", $input) > 0) {
    /* do the preg_match_all stuff */
}       
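
Putting the two steps together, a minimal sketch might look like the following. The function name extract_pairs and the choice to return null for invalid input are illustrative assumptions, and the validation pattern here carries a `;?` before `$` so that a trailing semicolon is accepted:

```php
<?php
// Hypothetical helper: the name and the null return value are assumptions.
function extract_pairs($input) {
    // Step 1: validate the whole string; the final semicolon is optional.
    if (!preg_match('/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/', $input)) {
        return null;
    }

    // Step 2: collect every key-value pair.
    $parameters = array();
    preg_match_all('/(\w+)-([^-;]+)/', $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $parameters[$match[1]] = $match[2];
    }
    return $parameters;
}

print_r(extract_pairs('key-value;key1-value1;'));
var_dump(extract_pairs('foo-bar-baz'));
```

With this sketch, `'foo-bar-baz'` is rejected outright instead of being partially matched.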
Lukman
As this is the only answer in the same vein as my question, I give you a `+` and will accept it if no one proposes a better solution, but the regexp does not verify the given string (`'foo-bar-baz'` would be treated as valid)
Raveren
so for `'foo-bar-baz'` you want `'foo' => 'bar-baz'` or `'foo-bar' => 'baz'` ? i can easily give you the alternate regex ;)
Lukman
for `'foo' => 'bar-baz'` use regex `/(\w+)-([^;]+)/` instead
Lukman
No, I'd rather it failed altogether and gave no results. As I understand it, the only regex that does so is mine (even though it can only be used for validating data, not fetching it). I'm sorry I did not mention validation of input in my OP, but the main answer seems clear. I'm going to wait one day before accepting your answer though.
Raveren
answer updated.
Lukman
Again, the trailing semicolon `;` is (highly) optional, so still my regex stands :)
Raveren
answer updated again, o your fussiness ... >_____<
Lukman
That's really mature of you to bad mouth me in your public answer whilst bringing NOTHING new to the discussion - only proving me right.
Raveren
feeling a bit touchy, aren't you? fine, i've removed the 'bad-mouthing' part so that you can be prouder of your case .. thanks for the very clear question.
Lukman
A: 

what about this solution:

$samples = array(
    "good" => "key-value;key1-value;key2-value;key5-value;key-value;",
    "bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
    "bad2" => "key;key1-value;key2-value;key5-value;key-value;",
    "bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);

foreach($samples as $name => $value) {
    if (preg_match("/^(\w+-\w+;)+$/", $value)) {
        printf("'%s' matches\n", $name);
    } else {
        printf("'%s' does not match\n", $name);
    }
}
KARASZI István
The final semicolon `;` is not required. Also this only verifies the input, I'd like the regex to verify AND create an array.
Raveren
Yeah, I hadn't realized that you need the data too.
KARASZI István
A: 

I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.
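
A small sketch of that last point, using an assumed sample input and an anchored pattern written only for illustration:

```php
<?php
// Assumed sample input; not from the original thread.
$input = 'key-value;key1-value1;key2-value2';

// Anchored pattern: the whole string is consumed by a single match, so
// the repeated capture groups keep only their final repetition.
$count = preg_match_all('/^(?:(\w+)-([^-;]+);?)+$/', $input, $matches, PREG_SET_ORDER);

echo $count, "\n";     // number of full matches found
print_r($matches[0]);  // the groups of that single match
```

This should report one match whose groups hold only key2 and value2.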

kemp
+2  A: 

You can use a lookahead to validate the input while you extract the matches:

/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/

(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.

This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.

If the lookahead fails, preg_match_all() will return zero, which evaluates as false in a condition (a strict false is returned only on error). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
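
As a rough usage sketch, the pattern can be wrapped in a small function. The name parse_pairs and the null return for invalid input are assumptions made for illustration:

```php
<?php
// Hypothetical helper: the name and the null return value are assumptions.
function parse_pairs($input) {
    $pattern = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';

    // Zero matches means the lookahead rejected the input.
    if (!preg_match_all($pattern, $input, $m)) {
        return null;
    }

    // $m[1] holds the keys and $m[2] the values, in matching order.
    return array_combine($m[1], $m[2]);
}

print_r(parse_pairs('key-value;key1-value1;key2-value2;'));
var_dump(parse_pairs('foo-bar-baz'));
```

Note that array_combine() keeps only the last value when a key appears more than once.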

Alan Moore