Hi,

I'd like to parse simple C++ typedef declarations such as:

typedef Class NewNameForClass;
typedef Class::InsideTypedef NewNameForTypedef;
typedef TemplateClass<Arg1,Arg2> AliasForObject;

I have written the corresponding grammar, which I'd like to use for the parsing:

Name <- ('_'|letter)('_'|letter|digit)*
Type <- Name
Type <- Type::Name
Type <- Name Templates
Templates <- '<' Type (',' Type)* '>'
Instruction <- "typedef" Type Name ';'

Once this is parsed, all I want to do is generate XML with the same information (but laid out differently).

What is the most effective language for writing such a program? How can this be achieved?

EDIT: Here is what I have come up with using Boost Spirit (it's not perfect, but it's good enough for me, at least for now):

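   // Spirit Classic: '|' takes the first alternative that matches and '*' is greedy,
   // so the order of alternatives and any trailing separators both matter.
   rule<> sep_p = space_p;
   rule<> name_p =  (ch_p('_')|alpha_p) >> *(ch_p('_')|alpha_p|digit_p);
   // Possibly qualified name, followed by optional '*', "const" and '&'.
   rule<> type_p = name_p
           >> *(*sep_p >> str_p("::") >> *sep_p >> name_p)
           >> *(*sep_p >> ch_p('*') )
           >> !(*sep_p >> str_p("const"))
           >> !(*sep_p >> ch_p('&'));
   // Name<Type, Type, ...>; no trailing *sep_p here, because typedef_p
   // requires +sep_p right after the type.
   rule<> templated_type_p = name_p >> *sep_p
           >> ch_p('<') >> *sep_p
           >> (*sep_p>>type_p>>*sep_p)%ch_p(',')
           >> ch_p('>');

   // templated_type_p is tried first; type_p alone would stop at the name before '<'.
   rule<> typedef_p = *sep_p
                   >> str_p ("typedef")
                   >> +sep_p >> (templated_type_p|type_p)
                   >> +sep_p >> name_p
                   >> *sep_p >> ch_p(';')  >> *sep_p;
   rule<> typedef_list_p = *typedef_p;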
+1  A: 

Well, since you're apparently already working with/on C++, have you considered using Boost.Spirit? This allows you to hard-code the grammar inline in C++ as a domain-specific language and program against it in normal C++ code.

Konrad Rudolph
Yes, I have, but I feel it's a bit complicated (even though I am very familiar with the Boost libraries) and will take some time to master. I hope there are more user-friendly ways to do what I need...
Benoît
+1  A: 

I would alter the grammar slightly:

ShortName <- ('_'|letter)('_'|letter|digit)*
Name <- ShortName
Name <- Name::ShortName
Type <- Name
Type <- Name Templates
Templates <- '<' Type (',' Type)* '>'
Instruction <- "typedef" Type Name ';'
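As a rough illustration, the altered grammar above could be turned into a small hand-written recursive-descent parser. This is only a sketch using the standard library (C++17): the names `Typedef` and `TypedefParser` are made up for the example, and the left-recursive rule `Name <- Name::ShortName` is rewritten as the equivalent loop `ShortName ('::' ShortName)*`, since recursive descent cannot handle left recursion directly.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the aliased type and the new name, whitespace removed.
struct Typedef {
    std::string type;
    std::string alias;
};

class TypedefParser {
public:
    explicit TypedefParser(std::string_view text) : text_(text) {}

    // Instruction <- "typedef" Type Name ';'
    std::optional<Typedef> parse() {
        Typedef result;
        if (!keyword("typedef") || !type(result.type) ||
            !name(result.alias) || !symbol(";")) return std::nullopt;
        skipSpaces();
        if (pos_ != text_.size()) return std::nullopt;  // trailing garbage
        return result;
    }

private:
    static bool isNameChar(char c) {
        return c == '_' || std::isalnum(static_cast<unsigned char>(c));
    }

    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Matches a punctuation token, skipping leading whitespace.
    bool symbol(std::string_view s) {
        skipSpaces();
        if (text_.substr(pos_, s.size()) != s) return false;
        pos_ += s.size();
        return true;
    }

    // Matches a keyword that is not immediately followed by a name character.
    bool keyword(std::string_view k) {
        const std::size_t saved = pos_;
        if (!symbol(k)) return false;
        if (pos_ < text_.size() && isNameChar(text_[pos_])) {
            pos_ = saved;
            return false;
        }
        return true;
    }

    // ShortName <- ('_'|letter)('_'|letter|digit)*
    bool shortName(std::string& out) {
        skipSpaces();
        if (pos_ >= text_.size()) return false;
        const char c = text_[pos_];
        if (c != '_' && !std::isalpha(static_cast<unsigned char>(c))) return false;
        const std::size_t start = pos_;
        while (pos_ < text_.size() && isNameChar(text_[pos_])) ++pos_;
        out.append(text_.substr(start, pos_ - start));
        return true;
    }

    // Name <- ShortName ('::' ShortName)*
    bool name(std::string& out) {
        if (!shortName(out)) return false;
        while (symbol("::")) {
            out += "::";
            if (!shortName(out)) return false;
        }
        return true;
    }

    // Type <- Name Templates?   Templates <- '<' Type (',' Type)* '>'
    bool type(std::string& out) {
        if (!name(out)) return false;
        if (!symbol("<")) return true;  // no template argument list
        out += '<';
        if (!type(out)) return false;
        while (symbol(",")) {
            out += ',';
            if (!type(out)) return false;
        }
        if (!symbol(">")) return false;
        out += '>';
        return true;
    }

    std::string_view text_;
    std::size_t pos_ = 0;
};
```

Calling `TypedefParser("typedef Class::InsideTypedef NewNameForTypedef;").parse()` should yield a `Typedef` whose `type` is `Class::InsideTypedef`; input that does not match the grammar yields an empty optional. Pointers, references and multiple targets are deliberately not handled, just as in the grammar.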

Also, your grammar leaves out the following cases:

  1. Multiple typedef targets
  2. Pointer targets
  3. Function pointers (this is by far the most difficult)

Parsing a grammar (I love the irony) is a fairly straightforward operation. If you wanted to actually use the grammar in a functional way, I would say the best bet is a lex/yacc combination.

But from your question it appears that you want to spit it out to another format. There really isn't a language designed for this, so I would say use whatever language you're most comfortable with.
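If that language happens to be C++, the output side could look roughly like the sketch below. The XML layout (`<typedef name="..."><type>...</type></typedef>`) and the function names `escapeXml` and `typedefToXml` are assumptions made for illustration, since the question does not say what the XML should look like. The one point that is not optional is escaping: template types contain `<` and `>`, which cannot appear literally in XML text.

```cpp
#include <sstream>
#include <string>

// Escapes the characters that are special in XML text and attribute values.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}

// Hypothetical layout: <typedef name="Alias"><type>Original</type></typedef>
std::string typedefToXml(const std::string& type, const std::string& alias) {
    std::ostringstream xml;
    xml << "<typedef name=\"" << escapeXml(alias) << "\">"
        << "<type>" << escapeXml(type) << "</type>"
        << "</typedef>";
    return xml.str();
}
```

For example, `typedefToXml("TemplateClass<Arg1,Arg2>", "AliasForObject")` produces a `<type>` element containing `TemplateClass&lt;Arg1,Arg2&gt;`.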

Edit

The OP asked about multiple typedef targets. It's perfectly legal for a typedef declaration to have more than one target. For example:

typedef _SomeStruct SomeStruct, *PSomeStruct;

This creates 2 typedef names.

  1. SomeStruct which is equivalent to "struct _SomeStruct"
  2. PSomeStruct which is equivalent to "struct _SomeStruct*"
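The two equivalences above can be checked at compile time. The following is a minimal sketch: the body of `_SomeStruct` is invented for the example, and the name is kept only to mirror the declaration above (identifiers starting with an underscore followed by a capital letter are reserved in C++, so real code should pick a different name).

```cpp
#include <type_traits>

// Example struct; its single member is made up for this illustration.
struct _SomeStruct { int value; };

// One declaration, two declarators: a plain alias and a pointer alias.
typedef struct _SomeStruct SomeStruct, *PSomeStruct;

static_assert(std::is_same<SomeStruct, _SomeStruct>::value,
              "SomeStruct names the struct itself");
static_assert(std::is_same<PSomeStruct, _SomeStruct*>::value,
              "PSomeStruct names a pointer to the struct");
```

Note that the `*` belongs to the second declarator only, which is why a parser has to treat each target separately.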
JaredPar
Thanks, you're absolutely right, I had not taken namespaces into account. I'll also have to update the grammar to add references and pointers. I do not care about function pointers, so that should be a relief! :-) I don't know what multiple typedef targets are.
Benoît
I added an example for multiple typedef targets.
JaredPar
Thanks for the example. I can live without them as well :-)
Benoît