I have a question. I have a file with code like this, and I want to read it with PHP:

 NAME
 {
    title
    (
     A_STRING
    );

    settings
    {
     SetA( 15, 15 );
     SetB( "test" );
    }

    desc
    {
     Desc
     (
      A_STRING
     );

     Cond
     (
      A_STRING
     ); 

    }
 }

I want to end up with this array:

$arr['NAME']['title'] = "A_STRING";
$arr['NAME']['settings']['SetA'] = "15, 15";
$arr['NAME']['settings']['SetB'] = "test";
$arr['NAME']['desc']['Desc'] = "A_STRING";
$arr['NAME']['desc']['Cond'] = "A_STRING";

I don't know where to start :/. The variables aren't always the same. Can someone give me a hint on how to parse such a file?

Thanks

+5  A: 

This looks like a real grammar, so you should use a parser generator. This discussion should get you started.

There are already a few options for PHP: a lexer generator module and a parser generator module.

Shane C. Mason
Since the grammar doesn't seem very complex, a hand-written recursive-descent parser might suffice as well, which would avoid the need for a parser generator.
Joey
Meh, perhaps, but I think every 'programmer' ought to get their hands dirty with a generator at least once.
Shane C. Mason
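The hand-written approach Joey mentions can be sketched as below. This is only a minimal example under an assumed grammar inferred from the sample, not a definitive implementation: a block is a name followed by "{ ... }", an entry is a name followed by "( value )" and an optional ";", and values never contain ")". The function names rd_tokenize, rd_parse_items and rd_parse are hypothetical. Each grammar rule maps to one branch of rd_parse_items, which calls itself for nested blocks.

```php
<?php

// Hypothetical tokenizer: returns [type, text] pairs. Types are 'id'
// (identifier), 'val' (trimmed contents of "( ... )") and '{', '}', ';'.
// Characters that match none of these (e.g. stray whitespace) are skipped.
function rd_tokenize(string $src): array {
    preg_match_all('/[A-Za-z_]\w*|\(([^)]*)\)|[{};]/', $src, $m);
    $tokens = [];
    foreach ($m[0] as $k => $text) {
        if ($text[0] === '(') {
            $tokens[] = ['val', trim($m[1][$k])];
        } elseif (in_array($text, ['{', '}', ';'], true)) {
            $tokens[] = [$text, $text];
        } else {
            $tokens[] = ['id', $text];
        }
    }
    return $tokens;
}

// Assumed grammar:
//   items := ( block | entry )*
//   block := id '{' items '}'
//   entry := id val ';'?
// $i is the shared cursor into $tokens, advanced as tokens are consumed.
function rd_parse_items(array $tokens, int &$i): array {
    $result = [];
    while ($i < count($tokens) && $tokens[$i][0] !== '}') {
        if ($tokens[$i][0] !== 'id') {
            throw new RuntimeException("Expected a name at token $i");
        }
        $name = $tokens[$i++][1];
        $type = $tokens[$i][0] ?? 'EOF';
        if ($type === '{') {
            $i++;                                   // consume '{'
            $result[$name] = rd_parse_items($tokens, $i);
            if (($tokens[$i][0] ?? 'EOF') !== '}') {
                throw new RuntimeException("Missing '}' after block $name");
            }
            $i++;                                   // consume '}'
        } elseif ($type === 'val') {
            $result[$name] = $tokens[$i++][1];
            if (($tokens[$i][0] ?? 'EOF') === ';') {
                $i++;                               // optional ';'
            }
        } else {
            throw new RuntimeException("Unexpected token after $name");
        }
    }
    return $result;
}

function rd_parse(string $src): array {
    $tokens = rd_tokenize($src);
    $i = 0;
    $tree = rd_parse_items($tokens, $i);
    if ($i !== count($tokens)) {
        throw new RuntimeException("Unexpected '}' at token $i");
    }
    return $tree;
}

$src = <<<'SRC'
NAME
{
   title ( A_STRING );
   settings { SetA( 15, 15 ); SetB( "test" ); }
}
SRC;
print_r(rd_parse($src));
```

Unlike the regex-and-stack hacks, unbalanced braces raise an exception here instead of silently producing a wrong tree. Values are trimmed but quotes are kept, so SetB comes out as "test" including its quotes.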
+2  A: 

It's not an answer, but a suggestion:

Maybe you can change your input format to be compatible with JSON, which has a similar nested syntax. JSON parsers and generators are available for PHP.

http://www.json.org/

http://www.php.net/json
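As an illustration, assuming you control the file format, the sample from the question could be hand-translated to the JSON below. The encoding is a guess at an equivalent structure; SetA's two arguments are kept as one string to match the desired array. json_decode with true as its second argument returns nested associative arrays, so the lookups from the question work directly.

```php
<?php

// The question's data, hand-translated to JSON (an assumed equivalent format).
$json = <<<'JSON'
{
    "NAME": {
        "title": "A_STRING",
        "settings": {
            "SetA": "15, 15",
            "SetB": "test"
        },
        "desc": {
            "Desc": "A_STRING",
            "Cond": "A_STRING"
        }
    }
}
JSON;

// true => decode JSON objects as associative arrays instead of stdClass.
$arr = json_decode($json, true);
if ($arr === null) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

echo $arr['NAME']['settings']['SetA'], "\n";
echo $arr['NAME']['desc']['Cond'], "\n";
```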

mateusza
A: 

If the files are this simple, rolling your own parser is probably a lot easier; you'd end up writing regexes for a lexer generator anyway. Here's a quick hack (in.txt should contain the input you provided above):

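<?php

// Read the sample input and dump the parsed tree.
$input_str = file_get_contents("in.txt");
print_r(parse_lualike($input_str));

function parse_lualike($str){
    // Drop line breaks (including Windows "\r") and semicolons.
    $str = preg_replace('/[\r\n;]/', '', $str);
    // Tokens: identifiers, "( ... )" groups (contents captured in
    // $matches[1]), and the braces "{" and "}".
    preg_match_all('/[a-zA-Z][a-zA-Z0-9_]*|[(]\s*([^)]*)\s*[)]|[{]|[}]/', $str, $matches);
    $tree = array();
    $stack = array();   // $stack[$pos] references the array being filled
    $pos = 0;
    $stack[$pos] = &$tree;
    $ident = null;      // most recently seen identifier
    foreach($matches[0] as $index => $token){
        if($token == '{'){
            // Open a nested array named after the last identifier.
            $node = &$stack[$pos];
            $node[$ident] = array();
            $pos++;
            $stack[$pos] = &$node[$ident];
        }elseif($token == '}'){
            // Close the current array and go back to its parent.
            unset($stack[$pos]);
            $pos--;
        }elseif($token[0] == '('){
            // Store the parenthesized value under the last identifier.
            $stack[$pos][$ident] = $matches[1][$index];
        }else{
            $ident = $token;
        }
    }
    return $tree;
}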

Quick explanation: the first preg_replace removes all newlines and semicolons, as they seem superfluous. The preg_match_all call then splits the input string into 'tokens': names, braces, and whatever is between parentheses. Add a print_r($matches); there to see what it produces.
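To see what the tokenizing step produces without creating in.txt, the sketch below runs the same preg_replace and preg_match_all calls on a made-up one-line snippet and prints each token next to its captured parenthesis contents.

```php
<?php

// Made-up snippet in the question's format.
$str = "settings { SetA( 15, 15 ); SetB( \"test\" ); }";
$str = preg_replace('/[\n]|[;]/', '', $str);
preg_match_all('/[a-zA-Z][a-zA-Z0-9_]*|[(]\s*([^)]*)\s*[)]|[{]|[}]/', $str, $matches);

foreach ($matches[0] as $i => $token) {
    // $matches[1][$i] holds the captured contents for "( ... )" tokens
    // and an empty string for all other tokens.
    printf("%-12s => '%s'\n", $token, $matches[1][$i]);
}
```

Note that the captured values keep a trailing space ("15, 15 "): the greedy [^)]* swallows the whitespace before the closing parenthesis, leaving nothing for the second \s*. That is where the trailing spaces in the print_r output at the end of this answer come from.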

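Then there's just a really hackish state machine (read: a foreach loop) that walks through the tokens and adds them to a tree. It keeps a stack of references so it can build nested arrays: a "{" opens a child array named after the last identifier, a "}" returns to the parent, and a parenthesized value is stored under the last identifier.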

Please note that this code isn't properly tested and will probably break when presented with "real life" input. For instance, a closing parenthesis inside a value will cause trouble. Also note that it doesn't remove quotes from strings. I'll leave all that to someone else...
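The hack leaves quote removal to the reader, and its values also keep trailing whitespace (visible in the output at the end of this answer). One way to handle both is a small post-processing pass over the finished tree. This is a sketch: clean_values is a hypothetical helper, and it assumes string values are wrapped in plain double quotes with no escaped quotes inside.

```php
<?php

// Hypothetical helper: trim whitespace and strip one pair of surrounding
// double quotes from every leaf value of the parsed tree.
function clean_values(array $tree): array {
    array_walk_recursive($tree, function (&$value) {
        $value = trim($value);
        if (strlen($value) >= 2 && $value[0] === '"' && substr($value, -1) === '"') {
            $value = substr($value, 1, -1);
        }
    });
    return $tree;
}

// A tree shaped like the parser's output, including its trailing spaces.
$tree = [
    'NAME' => [
        'title'    => 'A_STRING   ',
        'settings' => ['SetA' => '15, 15 ', 'SetB' => '"test" '],
    ],
];
print_r(clean_values($tree));
```

Applied to the result of parse_lualike(), this should give values like "15, 15" and test, as in the array the question asks for.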

But, as you requested, it's a start :)

Cheers!

PS. Here's the output of the code above, for convenience:

Array
(
    [NAME] => Array
        (
            [title] => A_STRING   
            [settings] => Array
                (
                    [SetA] => 15, 15 
                    [SetB] => "test" 
                )

            [desc] => Array
                (
                    [Desc] => A_STRING       
                    [Cond] => A_STRING       
                )

        )

)
0scar
Thank you! That really helps. Now I have a base I can code on :).