From an external source I'm getting strings like

array(1,2,3)

but also larger arrays like

array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")

I need to turn them into actual PHP arrays. I know I could use eval, but since these are untrusted sources I'd rather not do that. I also have no control over the external sources.

Should I use some regular expressions for this (if so, what) or is there some other way?

+1  A: 

I think you should use the Tokenizer for this. Maybe I'll write a script later on that actually does it.
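A rough sketch of what a Tokenizer-based approach could look like follows. It is not the script mentioned above, and the function names parseArrayLiteral(), parseValue() and parseArrayBody() are hypothetical. It runs the input through token_get_all() and feeds the tokens to a small recursive-descent parser, so eval() is never called. It assumes PHP 7+ (type hints, `??`) and an input grammar limited to nested array(...) syntax, quoted strings, int/float literals with an optional leading '-', true/false/null and "key => value" pairs. Short [...] syntax, heredoc/nowdoc and constants are rejected. Double-quoted strings are decoded with stripcslashes(), which is close to but not identical with PHP's own escape rules.

```php
<?php
// Hypothetical sketch: parse a PHP array literal without eval().
// Assumed input grammar: array(...) (nested), quoted strings, int/float
// literals with an optional leading '-', true/false/null, "key => value".

function parseArrayLiteral(string $code): array
{
    $tokens = [];
    foreach (token_get_all('<?php ' . $code) as $t) {
        // drop the opening tag we prepended and all whitespace
        if (is_array($t) && ($t[0] === T_OPEN_TAG || $t[0] === T_WHITESPACE)) {
            continue;
        }
        $tokens[] = $t;
    }
    $pos = 0;
    $value = parseValue($tokens, $pos);
    if (!is_array($value) || $pos !== count($tokens)) {
        throw new InvalidArgumentException('Input is not a single array literal');
    }
    return $value;
}

// Parses one value starting at $tokens[$pos] and advances $pos past it.
function parseValue(array $tokens, int &$pos)
{
    $t = $tokens[$pos] ?? null;
    if ($t === null) {
        throw new InvalidArgumentException('Unexpected end of input');
    }
    $pos++;
    if ($t === '-') { // unary minus, accepted only in front of a number
        $n = $tokens[$pos] ?? null;
        if (!is_array($n) || ($n[0] !== T_LNUMBER && $n[0] !== T_DNUMBER)) {
            throw new InvalidArgumentException("Expected a number after '-'");
        }
        return -parseValue($tokens, $pos);
    }
    if (!is_array($t)) {
        throw new InvalidArgumentException("Unexpected '$t'");
    }
    switch ($t[0]) {
        case T_ARRAY:
            return parseArrayBody($tokens, $pos);
        case T_LNUMBER:
            return intval($t[1], 0); // base 0 also handles 0x.. and 0.. prefixes
        case T_DNUMBER:
            return (float) $t[1];
        case T_CONSTANT_ENCAPSED_STRING:
            $raw = ltrim($t[1], 'bB'); // strip an optional b"..." prefix
            $inner = substr($raw, 1, -1);
            if ($raw[0] === "'") { // single quotes: only \\ and \' are escapes
                return strtr($inner, ['\\\\' => '\\', "\\'" => "'"]);
            }
            return stripcslashes($inner); // approximates PHP's "..." escape rules
        case T_STRING:
            $consts = ['true' => true, 'false' => false, 'null' => null];
            $name = strtolower($t[1]);
            if (array_key_exists($name, $consts)) {
                return $consts[$name];
            }
    }
    throw new InvalidArgumentException('Unexpected ' . token_name($t[0]));
}

// Parses "( item, item, key => item, ... )" right after the array keyword.
function parseArrayBody(array $tokens, int &$pos): array
{
    if (($tokens[$pos] ?? null) !== '(') {
        throw new InvalidArgumentException("Expected '(' after array");
    }
    $pos++;
    $result = [];
    while (($tokens[$pos] ?? null) !== ')') {
        $value = parseValue($tokens, $pos);
        $next = $tokens[$pos] ?? null;
        if (is_array($next) && $next[0] === T_DOUBLE_ARROW) { // key => value
            if (!is_int($value) && !is_string($value)) {
                throw new InvalidArgumentException('Array keys must be int or string');
            }
            $pos++;
            $result[$value] = parseValue($tokens, $pos);
        } else {
            $result[] = $value;
        }
        if (($tokens[$pos] ?? null) === ',') {
            $pos++; // a trailing comma before ')' is fine, as in PHP
        } elseif (($tokens[$pos] ?? null) !== ')') {
            throw new InvalidArgumentException("Expected ',' or ')'");
        }
    }
    $pos++; // consume ')'
    return $result;
}
```

For example, parseArrayLiteral('array("a", array(1, 2))') should give array("a", array(1, 2)). Anything outside the accepted grammar, such as exec("..."), raises an InvalidArgumentException. The design choice here is a whitelist grammar: instead of checking tokens and then trusting eval, the parser only ever builds values it understands.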

nikic
Let me look into the Tokenizer.
Nin
+3  A: 

You could do:

json_decode(str_replace(array('array(', ')'), array('[', ']'), $string));

This replaces every "array(" with "[" and every ")" with "]", then runs json_decode on the result. If the string is just a multidimensional array of scalar values, the str_replace won't break anything and you can json_decode it. If it contains any code, the parentheses of function calls get replaced as well, so the result isn't valid JSON and json_decode returns NULL.
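Spelled out as a small runnable sketch, the idea looks like this. The helper name toArray() is hypothetical, and the string type hint needs PHP 7+.

```php
<?php
// Sketch of the str_replace + json_decode idea; toArray() is a hypothetical name.
function toArray(string $string)
{
    // every "array(" becomes "[" and every ")" becomes "]"
    $json = str_replace(array('array(', ')'), array('[', ']'), $string);
    return json_decode($json); // NULL when the result is not valid JSON
}

var_dump(toArray('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
var_dump(toArray('exec("haha")')); // becomes exec("haha"] which is not JSON, so NULL
```

json_decode() turns JSON arrays into PHP arrays even without its second argument, so the nested lists come back as nested PHP arrays.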

Granted, that's a rather, umm, creative approach, but it might work for you.

Edit: Also see the comments for further limitations pointed out by other users.
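To make those limitations concrete, this small sketch runs the same replacement on a few sample inputs. The inputs are illustrative; the first one is the counterexample from KennyTM's comment.

```php
<?php
// Sample inputs showing where the str_replace + json_decode trick breaks down.
$samples = array(
    'array("array(", ")")', // brackets inside string values get rewritten too
    'array("k" => 1)',      // "=>" has no JSON equivalent
    "array('a', 'b')",      // JSON only allows double-quoted strings
);
$results = array();
foreach ($samples as $string) {
    $results[$string] = json_decode(str_replace(array('array(', ')'), array('[', ']'), $string));
}
var_dump($results);
```

The first input comes back as array("[", "]") without any error, which is arguably worse than getting NULL; the other two return NULL.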

Gordon
`array("array(", ")")`
KennyTM
I can't test this right now, but that is an elegant solution if it returns properly. +1
Tim
@KennyTM yeah, that wouldn't work. I'll leave it up there nonetheless, so the OP can decide if it's of any use.
Gordon
Also, this only works for plain lists; it won't work on associative arrays.
nikic
It's creative, but I like it and it might just do the trick. They will be pretty simple arrays anyway.
Nin
+4  A: 

While writing a parser using the Tokenizer, which turned out to be harder than I expected, I came up with another idea: why not parse the array using eval, but first validate that it contains nothing harmful?

Here is what the code does: it checks every token of the input against a whitelist of allowed tokens and characters, and only then runs eval. I hope I included all the harmless tokens; if not, simply add them. (I intentionally didn't include HEREDOC and NOWDOC, because I think they are unlikely to be used.)

function parseArray($code) {
    $allowedTokens = array(
        T_ARRAY                    => true,
        T_CONSTANT_ENCAPSED_STRING => true,
        T_LNUMBER                  => true,
        T_DNUMBER                  => true,
        T_DOUBLE_ARROW             => true,
        T_WHITESPACE               => true,
    );
    $allowedChars = array(
        '('                        => true,
        ')'                        => true,
        ','                        => true,
    );

    $tokens = token_get_all('<?php '.$code);
    array_shift($tokens); // remove opening php tag

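    $prev = null; // previous non-whitespace token

    foreach ($tokens as $token) {
        // char token
        if (is_string($token)) {
            if (!isset($allowedChars[$token])) {
                throw new Exception('Disallowed token \''.$token.'\' encountered.');
            }
            // only allow '(' directly after "array": since PHP 7, input such as
            // "exec"("...") would otherwise be executed as a function call
            if ($token === '(' && !(is_array($prev) && $prev[0] === T_ARRAY)) {
                throw new Exception('Disallowed token \'(\' encountered.');
            }
            $prev = $token;
            continue;
        }

        // array token

        // true, false and null are okay, too
        if ($token[0] == T_STRING && ($token[1] == 'true' || $token[1] == 'false' || $token[1] == 'null')) {
            $prev = $token;
            continue;
        }

        if (!isset($allowedTokens[$token[0]])) {
            throw new Exception('Disallowed token \''.token_name($token[0]).'\' encountered.');
        }
        if ($token[0] !== T_WHITESPACE) {
            $prev = $token;
        }
    }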

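    // fetch error messages (PHP 5 prints them and makes eval() return false;
    // PHP 7+ throws a ParseError instead)
    ob_start();
    try {
        $evalResult = eval('$returnArray = '.$code.';');
    } catch (ParseError $e) {
        ob_end_clean();
        throw new Exception('Array couldn\'t be eval()\'d: '.$e->getMessage());
    }
    $errors = ob_get_clean();
    if (false === $evalResult) {
        throw new Exception('Array couldn\'t be eval()\'d: '.$errors);
    }
    return $returnArray;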
}

var_dump(parseArray('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));

I think this is a good compromise between security and convenience: there's no need to write a parser yourself.

For example,

parseArray('exec("haha -i -thought -i -was -smart")');

would throw this exception:

Disallowed token 'T_STRING' encountered.
nikic
I was having the same thought :) I haven't given up on the idea of doing it entirely with the Tokenizer, but I'll explore your script first. Thanks!
Nin