views:

68

answers:

1

Hi everybody ! I'm wondering about formal languages. I have a kind of parser : It reads à xml-like serialized tree structure and turn it into a multidimmensionnal array.

My point is on the similarities between the algorithm being used and the differents kinds of automatons ( state machines turing machines stack ... ).

So the question is : which is the automaton I implictly use here, and to which formal languages family does it fit ? And what's about recursion ?

What i mean by " automaton i use implicitly " is "which is the minimal automaton to do the same job".

Here is the complete source :

$words; // an array of XML tag '<tag>', '</tag>' and simple text content

$tree = array( 'type' => 'root', 'sub' => array() );

$pTree = array(&$tree);

$deep = 0;

foreach ( $words as $elem ) {

if ( preg_match($openTag, $elem) ) { // $elem is an open tag

    $pTree[$deep++]['sub'][] = array( // we add an element to the multidim array
        'type' => 'block',
        'content' => $elem,
        'sub' => array()
    );

    $size = sizeof($pTree[$deep - 1]['sub']);
    $pTree[$deep] = &$pTree[$deep - 1]['sub'][$size - 1]; // down one level in the tree

} elseif ( preg_match($closeTag, $elem) ) { // it is a close tag

    $deep--; // up in the tree 

} else { // simple element

    $pTree[$deep]['sub'][] = array(
        'type' => 'simple',
        'content' => $elem
    );

}

}

+2  A: 

Please take a look at your question again. You're referring to a $words variable, which is not in your example. Also, there is no code, without knowing what is being done it's hard to answer you.

Judging from the name of the variable $deep, it is probably not the state. The state in an automaton is an element of a set that is specific to the automaton; $deep looks like it could contain a depth, any positive integer. Again, hard to tell without the code.

Anyway, you are probably not "implicitly using" any automaton at all, if you didn't design your code as an implementation of one.

Your simple xml-like files could probably be recognized by a deterministic stack machine, or generated by a deterministic context-free grammar, making them Type-2 in the Chomsky hierarchy. Once again this is just a guess, "a xml-like serialized tree structure" is too vague for any kind of formalism.

In short, if you are looking to use any formal theory, do word your questions more formally.


Edit (after seeing the code):

You're building a tree. That's out of reach for an automaton (at least the “standard” ones). Finite automatons only work with an input and a state, stack machines add a stack to that, and Turing machines have a read-write tape they can move in both directions.

The “output” of an automaton is a simple “Yes” (accepted) or “No” (not accepted, or an infinite loop). (Turing machines can be defined to provide more output on their tape.) The best I can answer to “which is the minimal automaton to do the same job” is that your language can be accepted by a stack machine; but it would work very differently and not give you trees.

However, you might look into grammars – another formal language construct that introduces the concept of parse trees. What you are doing here is creating such a parse tree with a top-down parser.

Petr Viktorin
Thanks for your answer, i added the code, and more details.And your guess is right.
dader51
Edited. (not sure if you get any kind of notification from just that, hence this comment)
Petr Viktorin
I guess that if the language is accepted, the stackmachine would only return me true, and not a tree. So, I think that it is possible to implement the tree parser as two objet working together, a StackMachine, and a kind of Tree Builder. We define each transition of the stack machine as a method call on the tree builder to which we pass the stack state, and eventually the symbol being read, current state, etc.Does it make sens to you ? Anyway, I'll take a look to grammar, and i give much interest to your answer, thx.
dader51
I guess it would be possible, but I don't see the point.Theoretical constructs like the automatons are usually used to do mathematical proofs, while practical programming languages like PHP are used to solve practical tasks (such as deserialization). By mixing them, you'd get rid of both mathematical purity and a straightforward implementation.What would you want to do? Many things are possible, but not all are helpful.
Petr Viktorin