How can I easily parse a C .h file for comments and entity names using Python?

The extracted content is then supposed to be written into a Word file; that part is already developed.

Source comments follow simple tag-style rules. The tags make it easy to tell one entity's comment from another's, and from non-documenting comments. A comment may span multiple lines. Each comment must sit directly above the entity definition:

//ENUM My comment bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
//     could be multi-line. Bla bla bla bla bla bla bla bla bla.
enum my_enum
{
    //EITEM My enum item 1.
    //      Just could be multi-line too.
    MY_ENUM_ITEM_1,

    //EITEM My enum item 2
    MY_ENUM_ITEM_2,
};

//STRUCT My struct
struct my_struct {

    //MEMBER struct member 1
    int m_1_;
};

//FUNC my function 1 description.
//     Could be multi-line also.
//INPUT  arg1 - first argument
//RETURN pointer to an allocated my_struct instance.
my_struct* func_1(int arg1);

The parsing should produce a tree of code entities and their comments.

How can this be done quickly and without using third-party libraries?

+1  A: 

Perhaps the shlex module would do?

If not, there are some more powerful alternatives: http://wiki.python.org/moin/LanguageParsing
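
To give a rough idea of what shlex offers here, the following is a minimal sketch, not a tested approach for real headers. `c_tokens` is a hypothetical helper name; the two attribute tweaks are assumptions about what suits C text. Note that shlex treats newlines as plain whitespace by default, so the line structure that the tagged comments rely on is lost and would have to be recovered some other way.

```python
import shlex

def c_tokens(source):
    """Split C source text into coarse tokens with the standard shlex module."""
    lexer = shlex.shlex(source)
    # '#' is shlex's default comment character; in C it starts a directive.
    lexer.commenters = ''
    # Only treat double quotes as string delimiters, so a stray apostrophe
    # in a comment does not open an unterminated string.
    lexer.quotes = '"'
    return list(lexer)

# '//' comes back as two separate '/' tokens, followed by the tag and words.
print(c_tokens("//EITEM My enum item 2\nMY_ENUM_ITEM_2,"))
```

Identifiers and punctuation come out as separate tokens, which may be enough for picking out entity names, but reassembling multi-line comments from that stream takes extra work.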

vartec
+4  A: 

This has already been done. Several times over.

Here is a parser for the C language written in Python. Start with this.

http://wiki.python.org/moin/SeeGramWrap

Other parsers.

http://wiki.python.org/moin/LanguageParsing

http://nedbatchelder.com/text/python-parsers.html

You could probably download any ANSI C Yacc grammar and rework it into PLY format without too much trouble and use that as a jumping-off point.

S.Lott
Another one: http://code.google.com/p/pycparser/. I don't think “quickly and without third-party libraries” is achievable: C has a complex grammar which is not amenable to parsing through naive methods like string matching and regex.
bobince
I don't understand the "without using third-party libraries" consideration. It's all open source; what 3rd-party issues, concerns or problems are they concerned about?
S.Lott
A: 

Here's a quick and dirty solution. It won't handle comment markers inside string literals, but since this is just for header files that shouldn't be an issue.

import sys

# Scanner states: plain code, inside a // comment, inside a /* */ comment.
S_CODE, S_INLINE, S_MULTLINE = range(3)

f = open(sys.argv[1])
state = S_CODE
comments = ''
# Yield the file one character at a time; '' marks end of file.
i = iter(lambda: f.read(1), '')
for c in i:
    if state == S_CODE:
        if c == '/':
            # Look at the next character to see which kind of comment starts.
            c = next(i, '')
            if c == '*':
                state = S_MULTLINE
            elif c == '/':
                state = S_INLINE
    elif state == S_INLINE:
        comments += c
        if c == '\n':
            state = S_CODE  # a newline ends a // comment
    elif state == S_MULTLINE:
        if c == '*':
            c = next(i, '')
            if c == '/':
                comments += '\n'
                state = S_CODE
            else:
                comments += '*%s' % c
        else:
            comments += c
f.close()
print(comments)
eduffy
Just curious: did you write this just now?
Vulcan Eager
yeah .. had some time to kill. Just took 5 minutes.
eduffy
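
The scanner in the last answer collects comment text but does not pair it with the entity that follows, which is what the question asks for. One possible way to do that with only the standard library is sketched below. It assumes the tag names shown in the question's example are the complete set, that continuation lines start with `//` followed by whitespace, and that the first non-blank code line after a comment block is the entity. `collect_docs`, `TAG_RE` and the result layout are hypothetical names chosen for illustration.

```python
import re

# Tag names taken from the question's example; the full set is an assumption.
TAG_RE = re.compile(r'//(ENUM|EITEM|STRUCT|MEMBER|FUNC|INPUT|RETURN)\b\s*(.*)')

def collect_docs(lines):
    """Return a list of (tags, entity_line) pairs.

    tags is a list of [tag, text] entries in source order; entity_line is
    the first non-blank code line after the tagged comment block.
    """
    docs, tags = [], []
    for line in lines:
        stripped = line.strip()
        m = TAG_RE.match(stripped)
        if m:
            # A new tagged comment starts.
            tags.append([m.group(1), m.group(2).strip()])
        elif stripped.startswith('//'):
            # '//' plus whitespace continues the previous tag; other
            # comments are treated as non-documenting and skipped.
            if tags and stripped[2:3].isspace():
                tags[-1][1] += ' ' + stripped[2:].strip()
        elif stripped and tags:
            # First code line after the comment block: the documented entity.
            docs.append((tags, stripped))
            tags = []
    return docs

sample = """\
//FUNC my function 1 description.
//     Could be multi-line also.
//INPUT  arg1 - first argument
my_struct* func_1(int arg1);
"""
for tags, entity in collect_docs(sample.splitlines()):
    print(entity, tags)
```

This produces a flat list rather than a tree. Nesting enum items or struct members under their parent would additionally need some tracking of `{` and `}`, and extracting the bare entity name from the code line is left out here.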