views: 182
answers: 5

I have never really thought about this until today, but after searching the web I didn't really find anything. Maybe I wasn't wording it right in the search.

Given an array (of multiple dimensions or not):

$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

When var_dumped:

array(2) { ["this"]=> array(1) { ["is"]=> string(3) "the" } ["challenge"]=> array(1) { ["for"]=> array(1) { [0]=> string(3) "you" } } }

The challenge is this: what is the most optimized way to rebuild that output into a usable PHP array, like an undump_var() function? It should work whether the data is all on one line, as output in a browser, or contains line breaks, as output to a terminal.

Is it just a matter of regex? Or is there some other way? I am looking for creativity.

UPDATE: Note, folks: I am familiar with serialize and unserialize. I am not looking for alternative solutions. This is a code challenge to see if it can be done in an optimized and creative way, so serialize and var_export are not solutions here, nor are they the best answers.

+1  A: 

Use a regexp to change array(.) { (.*) } to array($1) and eval the code. This is not as easy as it sounds, because you have to deal with matching brackets etc.; it's just a clue on how to find a solution ;)

  • This will be helpful if you can't change var_dump to var_export or serialize
canni
A regexp solution is going to be very difficult because you can have nested braces... So it's more likely to involve a string parser than a regexp (considering you have state to worry about due to the nesting)...
ircmaxell
No, you don't need a string parser: regexps have some superb features such as ungreedy/global flags, so it can be done with one single regexp with the correct flags set :)
canni
BBCode parsers are built on top of regexps and work well without a state machine ;) Just consider 'array(.) {' and '}' as open/close tags :)
canni
Then show me a single regex that will convert all valid var_dumped data back into native, parsable PHP... I'll admit I'm wrong if you can show me an example of a regex that can deal with: `array(1) { ["foo}[bar]"] => string(4) "baz{" }`
ircmaxell
You're probably right that it can't be done with just one regexp, but you can still use one regexp per "tag", where a tag is one of array(.), string(.), int(.), etc., and parse the output in the correct order (simple types -> arrays). Still, it is not possible to "reparse" var_dumped objects and other non-standard structures; for that we have serialize and other stuff.
canni
Note that cdburgess is looking for a code challenge, so I'm giving some clues on how it can be achieved :)
canni
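ircmaxell's point that nesting calls for a stateful parser rather than one regexp can be illustrated with a small recursive-descent reader. This is a sketch, not any poster's method: undump_var(), undump_read() and undump_skip_ws() are hypothetical names, it assumes PHP 7.1+, and it handles only arrays, strings, int, float, bool and NULL (no objects). String values are read by the byte count in string(N), so braces and quotes inside values are harmless; string keys carry no length in var_dump output, so they are still matched heuristically up to the first "]=>.

```php
<?php
// Hypothetical undump_var(): minimal recursive-descent reader for plain var_dump() text.
// Supports arrays, strings, int, float, bool and NULL only (objects are NOT handled).

function undump_var(string $dump)
{
    $pos = 0;
    return undump_read($dump, $pos);
}

// Skip spaces and line breaks, so one-line and multi-line dumps both work.
function undump_skip_ws(string $s, int &$pos): void
{
    $pos += strspn($s, " \t\r\n", $pos);
}

// Read one value starting at $pos and advance $pos past it.
function undump_read(string $s, int &$pos)
{
    undump_skip_ws($s, $pos);

    // \G anchors each pattern at the current offset.
    if (preg_match('/\GNULL/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return null;
    }
    if (preg_match('/\Gbool\((true|false)\)/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return $m[1] === 'true';
    }
    if (preg_match('/\Gint\((-?\d+)\)/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return (int) $m[1];
    }
    if (preg_match('/\Gfloat\(([^)]*)\)/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return (float) $m[1]; // INF/NAN are not handled
    }
    if (preg_match('/\Gstring\((\d+)\) "/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        $value = substr($s, $pos, (int) $m[1]); // trust the length, not the quotes
        $pos += (int) $m[1] + 1;                 // skip the bytes and the closing quote
        return $value;
    }
    if (preg_match('/\Garray\((\d+)\) \{/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        $result = [];
        for ($i = 0, $n = (int) $m[1]; $i < $n; $i++) {
            undump_skip_ws($s, $pos);
            if (preg_match('/\G\[(-?\d+)\]\s*=>/', $s, $k, 0, $pos)) {
                $key = (int) $k[1];                   // integer key: [0]=>
            } elseif (preg_match('/\G\["(.*?)"\]\s*=>/s', $s, $k, 0, $pos)) {
                $key = $k[1];                         // string key: ["name"]=> (heuristic)
            } else {
                throw new UnexpectedValueException("Expected a key at offset $pos");
            }
            $pos += strlen($k[0]);
            $result[$key] = undump_read($s, $pos); // recurse for the value
        }
        undump_skip_ws($s, $pos);
        if (($s[$pos] ?? '') !== '}') {
            throw new UnexpectedValueException("Expected } at offset $pos");
        }
        $pos++;
        return $result;
    }
    throw new UnexpectedValueException("Unrecognised token at offset $pos");
}

// ircmaxell's tricky example (the key regex also tolerates the space before =>).
var_dump(undump_var('array(1) { ["foo}[bar]"] => string(4) "baz{" }'));
```

Because each call consumes exactly one value, nesting is handled by the call stack instead of by brace matching.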
+1  A: 

I think you are looking for the serialize function:

serialize — Generates a storable representation of a value

It allows you to save the contents of an array as a string and later read the array back with the unserialize function.

Using these functions, you can store and retrieve arrays in text/flat files as well as in a database.
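A minimal round trip of the question's array through serialize() and unserialize(), for reference. The allowed_classes option (PHP 7+) is an extra precaution not mentioned in the answer; it stops unserialize() from instantiating objects when the stored string might have been tampered with.

```php
<?php
// Round trip of the question's array through serialize()/unserialize().
$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

$stored = serialize($data); // a plain string, safe to write to a file or database column
echo $stored, "\n";
// prints a:2:{s:4:"this";a:1:{s:2:"is";s:3:"the";}s:9:"challenge";a:1:{s:3:"for";a:1:{i:0;s:3:"you";}}}

// allowed_classes => false: arrays and scalars only, never objects.
$restored = unserialize($stored, ['allowed_classes' => false]);
var_dump($restored === $data);
```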

Sarfraz
+10  A: 

var_export or serialize is what you're looking for. var_export will render PHP-parsable array syntax, and serialize will render a non-human-readable but reversible "array to string" conversion...
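A short sketch of the var_export route: with its second argument set to true, var_export() returns the PHP source as a string, and eval() turns that source back into a value. eval() executes arbitrary code, so this is only reasonable for text you produced yourself.

```php
<?php
// var_export(..., true) returns a PHP array literal as a string.
$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

$code = var_export($data, true);
echo $code, "\n";

// eval() runs the code; the "return" makes it hand back the array.
$restored = eval('return ' . $code . ';');
var_dump($restored === $data);
```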

Edit: Alright, for the challenge:

Basically, I convert the output into a serialized string (and then unserialize it). I don't claim this to be perfect, but it appears to work on some pretty complex structures that I've tried...

function unvar_dump($str) {
    if (strpos($str, "\n") === false) {
        // Single-line dump (e.g. as shown in a browser): put each token on its own line
        $regex = array(
            '#(\\[.*?\\]=>)#',
            '#(string\\(|int\\(|float\\(|array\\(|NULL|object\\(|})#',
        );
        $str = preg_replace($regex, "\n\\1", $str);
        $str = trim($str);
    }
    // Line-by-line translation of var_dump tokens into serialize() tokens
    $regex = array(
        '#^\\040*NULL\\040*$#m',
        '#^\\s*array\\((.*?)\\)\\s*{\\s*$#m',
        '#^\\s*string\\((.*?)\\)\\s*(.*?)$#m',
        '#^\\s*int\\((.*?)\\)\\s*$#m',
        '#^\\s*float\\((.*?)\\)\\s*$#m',
        '#^\\s*\\[(\\d+)\\]\\s*=>\\s*$#m',
        '#\\s*?\\r?\\n\\s*#m',
    );
    $replace = array(
        'N',          // NULL
        'a:\\1:{',    // array(n) {
        's:\\1:\\2',  // string(n) "..." (the quotes are kept)
        'i:\\1',      // int(n)
        'd:\\1',      // float(x)
        'i:\\1',      // [n]=> (integer key)
        ';'           // every line break becomes a ; separator
    );
    $serialized = preg_replace($regex, $replace, $str);
    // ["key"]=> becomes s:len:"key" (the length needs strlen, hence a callback)
    $serialized = preg_replace_callback(
        '#\\s*\\["(.*?)"\\]\\s*=>#',
        function ($match) {
            return 's:' . strlen($match[1]) . ':"' . $match[1] . '"';
        },
        $serialized
    );
    // object(Class)#id (n) { becomes O:len:"Class":n:{
    $serialized = preg_replace_callback(
        '#object\\((.*?)\\).*?\\((\\d+)\\)\\s*{\\s*;#',
        function ($match) {
            return 'O:' . strlen($match[1]) . ':"' . $match[1] . '":' . $match[2] . ':{';
        },
        $serialized
    );
    // Drop the separators that the line-break pass left after { and }
    $serialized = preg_replace(
        array('#};#', '#{;#'),
        array('}', '{'),
        $serialized
    );

    return unserialize($serialized);
}
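To see why routing through serialize() copes with quotes inside strings, here is a minimal sketch of two of the conversions unvar_dump() performs, applied by hand to a one-entry array. The sample key and value are made up for illustration, and the callback is written as a closure.

```php
<?php
// Step 1: a string key such as ["foo"]=> becomes s:3:"foo". The length has
// to be computed, which is why unvar_dump() uses a callback for this step.
$key = preg_replace_callback(
    '#\s*\["(.*?)"\]\s*=>#',
    function (array $match): string {
        return 's:' . strlen($match[1]) . ':"' . $match[1] . '"';
    },
    '["foo"]=>'
);

// Step 2: the value string(8) "Foo"bar"" maps to s:8:"Foo"bar"". unserialize()
// reads exactly 8 bytes after s:8:", so the quotes inside the value are harmless.
$serialized = 'a:1:{' . $key . ';s:8:"Foo"bar"";}';
$value = unserialize($serialized);

var_dump($key, $value);
```

The length prefix is what makes this robust for values; keys are still located by the regex alone.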

I tested it on a complex structure such as:

array(4) {
  ["foo"]=>
  string(8) "Foo"bar""
  [0]=>
  int(4)
  [5]=>
  float(43.2)
  ["af"]=>
  array(3) {
    [0]=>
    string(3) "123"
    [1]=>
    object(stdClass)#2 (2) {
      ["bar"]=>
      string(4) "bart"
      ["foo"]=>
      array(1) {
        [0]=>
        string(2) "re"
      }
    }
    [2]=>
    NULL
  }
}
ircmaxell
@Gordon you beat me to it. I was just going back to edit those links in. Thanks!
ircmaxell
I think you misunderstood the question. The challenge is to reverse the var_dump into an array. I am familiar with serialize() and unserialize()... and yes, they are by far better options. This is a code challenge. Maybe it's not worth the effort, but I wanted to see if it could be done in an optimized and creative way. I am not looking for an alternative solution.
cdburgess
@cdburgess: That's strange; what exactly do you want to do?
Sarfraz
The challenge is to take the output of var_dump and print out the rebuilt array. So going from `array(2) { ["this"]=> array(1) {...` back to `array('this' => array(`
cdburgess
@cdburgess: So the title of your question should be **Code Challenge - Convert var_dump back to array/variable**
Sarfraz
@cdburgess There you go, there's an attempt at a function to do just that...
ircmaxell
Looks great. However, when I paste your code into a file, it will not execute.
cdburgess
Are you on PHP 5.2? That code is written for 5.3+ (if you want to change it back, you'll need to change the `$foo = function` calls to create_function). I'll whip up the quick change and edit it back in...
ircmaxell
And I just edited back in a far more robust version of the regexps that should account for strings with serialized tokens inside of them...
ircmaxell
PHP Notice: unserialize(): Error at offset 0 of 208 bytes in /home/y/share/htdocs/test.php on line 51 ... however, I am using a slightly different version of the var_dump. `$export = 'array(2) { ["this"]=> array(2) { ["is"]=> string(3) "the" [0]=> array(2) { [0]=> string(3) "one" [1]=> string(4) "only" } } ["challenge"]=> array(1) { ["for"]=> array(2) { [0]=> string(3) "you" [1]=> int(2) } } }';`
cdburgess
Are there new lines (like `var_dump` provides)? Or did you just make it into a single line string (which makes the parsing a lot harder to do as robust)...
ircmaxell
A single line... just as it shows in the question.
cdburgess
@cdburgess: But that's not how PHP outputs a var_dump. There are linebreaks in it. And my solution depends upon those linebreaks. try doing:`ob_start(); var_dump($var); $data = ob_get_clean();` and then calling my function with `$data`...
ircmaxell
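A minimal sketch of the capture trick from the comment above, assuming plain PHP CLI output (extensions such as Xdebug can change what var_dump() prints):

```php
<?php
// Capture what var_dump() prints into a string instead of sending it to output.
$var = array('this' => array('is' => 'the'));

ob_start();         // start buffering output
var_dump($var);
$data = ob_get_clean(); // fetch the buffer and stop buffering

// Plain var_dump() output is multi-line; unvar_dump() handles this form most reliably.
echo $data;
```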
If it is output to a webpage it does. But thanks for the clarification. I will update the question so it is more clear.
cdburgess
@cdburgess: Wrap it in `<pre>` tags. You'll see that there are new lines... Otherwise, it's not truly output of `var_dump` (Since its output includes new lines, and removing them changes the output)...
ircmaxell
@cdburgess: Ok, I added some support for the dump on a single line. Be aware that this may wind up changing the strings if they have any of the "tokens" inside of them (and hence break the serialized output)... It'll be robust if there are new lines, but if there are not it may stumble more...
ircmaxell
Very well done! You have definitely gone the extra mile. Congrats!
cdburgess
+2  A: 

If you want to encode/decode an array like this, you could use var_export(), which generates output in PHP's array syntax. For instance:

array (
  1 => 'foo',
  2 => 'bar',
)

could be the result. You would have to use eval() to get the array back, though, and that is potentially dangerous: eval() really executes PHP code, so a simple code injection could let an attacker take control of your PHP script.

Even better solutions are serialize(), which creates a serialized version of an array or object, and json_encode(), which encodes an array or object in JSON format (preferred for data exchange between different languages).
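A short sketch of the JSON route for the question's array. Passing true as the second argument of json_decode() returns associative arrays instead of stdClass objects, so the original array comes back.

```php
<?php
// JSON round trip of the question's array.
$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

$json = json_encode($data);
echo $json, "\n"; // prints {"this":{"is":"the"},"challenge":{"for":["you"]}}

$restored = json_decode($json, true); // true: decode objects as associative arrays
var_dump($restored === $data);
```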

Frxstrem
+1  A: 

The trick is to match the input in chunks of code and "strings": on strings, only swap the quotes; on everything else, do the replacements:

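$out = preg_replace_callback('/"[^"]*"|[^"]+/', 'repl', $in);

// Called once per chunk: either a "quoted string" or a run of non-quote characters
function repl($m)
{
    return $m[0][0] == '"' ?
        // string literal: only swap the double quotes for single quotes
        str_replace('"', "'", $m[0])
    :
        // everything else: { } [ ] become ( ) , and a space; int(n) becomes n;
        // whitespace and string(n) are dropped, array(n) loses its (n);
        // finally the comma left by the first key's [ is removed after "("
        str_replace("(,", "(",
            preg_replace("/(int\((-?\d+)\)|\s*|(string|)\(\d+\))/", "\\2",
                strtr($m[0], "{}[]", "(), ")
            )
        );
}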

outputs:

array('this'=>array('is'=>'the'),'challenge'=>array('for'=>array(0=>'you')))

(removing ascending numeric keys starting at 0 takes a little extra accounting, which can be done in the repl function.)

P.S. This doesn't solve the problem of strings containing ", but since var_dump doesn't seem to escape string contents, there is no way to solve that reliably. (You could match \["[^"]*"\], but a string may contain "] as well.)
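The function above produces PHP source text rather than an array. A minimal sketch of the last step, using the output string shown in the answer: eval() turns the text into a real array. As with var_export, eval() runs arbitrary PHP, so this is only acceptable for dumps you trust.

```php
<?php
// Output of repl() as shown in the answer, copied verbatim.
$code = "array('this'=>array('is'=>'the'),'challenge'=>array('for'=>array(0=>'you')))";

// eval() executes the code; "return" hands back the resulting array.
$array = eval('return ' . $code . ';');
var_dump($array);
```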

mvds
This is great! You are one of the few who actually read and understood the question. Thanks for taking the challenge and providing a working solution. Now what if there is an int as the value (e.g. `array('you', 5)`)? It will be displayed as int(5) but should come back from your function as 5.
cdburgess
I just took your example to make it work. Replacing `int\(\d+\)` with the number doesn't sound like much of a challenge. See updated answer.
mvds
Superb! Very well done and in small optimized code! FYI: There is a missing comma after "\\2".
cdburgess