I'd like to be able to parse the following structure:

blah
{
    "string-1",
    "string-2",
    ...,
    "string-n"
}

I'm using flex to tokenize, and that's working perfectly. I'm using yacc (bison) for the parsing.

What's the recommended way to allow this structure? Right now, in my test.y file, I've got:

blah_command:
    BLAH OPEN_BRACE string_list CLOSE_BRACE
    {
        printf( "String list is %s\n", $3 );
    }
    ;

string_list: /* empty */
    |
    STRING
    {
        $$ = $1;   /* "return" here would exit yyparse() itself */
    }
    |
    STRING COMMA string_list
    {
        strcat($1, ",");
        strcat($1, $3);
    }
    ;

I suspect the strcat() is a really, really bad idea. I'm a real novice when it comes to lex/yacc (about three hours' experience), so a smack on the wrist and a pointer in the right direction would be great.

EDIT: The goal of this is to allow me to build a test harness for an external application. The lexing/parsing will be used to interpret a test script that the user provides. One command allows the user to send a message to the application; I then read the multi-line response and compare it with the variable-length list of strings the user has provided in the script. The fragment I've posted above is how I figured I'd let the user define the possible response.

For example:

blah
{
    "COMMAND EXECUTED CORRECTLY"
}

or

blah
{
    "QUERY COMPLETE IN .0034 SECONDS",
    "1 RECORD FOUND:",
    "FOO=12345",
    "--END OF LIST--"
}
A: 

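In your example, you are simply outputting the input, so concatenating with strcat is a reasonable approach, as long as the destination buffer is large enough to hold the result.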

Typically, what one tries to do is to build up an abstract syntax tree. In the case of an AST, you can either create a node structure, or map the contents into an array.

If you give more detail on the goal of your program, I can give you a more detailed answer.

brianegge
What I don't know is what $1 actually represents (i.e., memory-wise), so I have no idea what I'm strcat'ing to.
Andrew
In "STRING COMMA string_list", $1 refers to STRING. $2 refers to COMMA. $3 refers to string_list.
Fragsworth
Sure; I understand that. But when I strcat(), I'm modifying memory. Which memory, and where? Is this a buffer overflow in the making?
Andrew
A: 

If all you are doing is printing this, strcat() works fine to connect all the strings together, provided the buffer you concatenate into was allocated with enough room for the result.

Normally, however, your parser will be building an abstract syntax tree. So instead of outputting the string, you would have something like the following:

/* action for: string_list: STRING COMMA string_list */
Node* n = new_node(STRING_LIST_NODE); // STRING_LIST_NODE being an enum node type
n->value = $1;  // the STRING
n->next = $3;   // the rest of the list
$$ = n;
Fragsworth
This makes sense. I'll pursue this...
Andrew