I'm trying to learn about parsers for Python, C and C++ source code (on my own, not for a school project). Here is a summary of what I want to do: 1) read .c/.cpp/.py source files in Python; 2) get a list of all the functions in the source files, along with the span of each definition in terms of line numbers.

To illustrate my question, consider the following code in a file "helloWorld.cpp" (which would be read from Python):

//start 
#include <iostream>
#include <string>
using namespace std; 

int main(int argc, char** argv)
{
  string str = "Hello World";
  cout << str << endl; 

  return 0;
}
//end 

What I want to get is something along these lines: list of functions: int main(int argc, char** argv), start: line 7, end: line 12

Any ideas on how to achieve this (some code examples would be greatly appreciated)?

A: 

Pygments might be a good place to start. It is a generic code highlighter written in Python that supports all the languages you are trying to parse, and many more. You can find it here: http://dev.pocoo.org/projects/pygments/wiki

data
+1  A: 

If you're really interested in learning about parsing C, you might want to look into pycparser. It's built on PLY, so you can probably apply what you learn from it to parsing lots of other things.

Parsing C++, though, is way more complicated than parsing C or Python, so you may want to explore Python and C before you start digging into C++.

Hank Gay
A: 

It is possible to implement Python bindings to Clang or, alternatively, you could just parse and analyse the XML AST dumps from Clang with Python.

SK-logic
A: 

For C and especially C++: if you have a real-world project, I would recommend staying as close to a canonical parser implementation as possible. C++ parsing is not for the faint-hearted (and is usually not done right, even by commercial compilers). I have used GCC-XML in the past for just this reason. It uses gcc to parse the code and then translates gcc's internal representation into a referential XML representation of the code that is a little easier to grok. It may not teach you about parsing, but it will give you some insight into the language grammar in a familiar XML data model.

For Python code you can make use of the parser and/or ast modules. I have never used them myself, however.

Jeremy Brown
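
As a rough sketch of the ast suggestion above (for the .py files only, not C or C++), something like the following could list each function with its start and end line. The helper name list_functions and the sample source are made up for illustration, and the end_lineno attribute assumes Python 3.8 or later.

```python
import ast


def list_functions(source):
    """Return (name, start_line, end_line) for every function in `source`."""
    tree = ast.parse(source)
    spans = []
    # ast.walk visits every node, so nested functions and methods are included
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno is the line of the "def" keyword;
            # end_lineno (Python 3.8+) is the last line of the body
            spans.append((node.name, node.lineno, node.end_lineno))
    # Sort by start line so the result follows the order in the file
    return sorted(spans, key=lambda span: span[1])


# Hypothetical sample input; in practice this would come from open(path).read()
sample = '''\
def greet(name):
    message = "Hello " + name
    return message


class Greeter:
    def run(self):
        print(greet("World"))
'''

for name, start, end in list_functions(sample):
    print(f"{name}: start line {start}, end line {end}")
```

For the sample above this reports greet spanning lines 1 to 3 and run spanning lines 7 to 8. It only gives the function name rather than the full signature, and it says nothing about C or C++ sources, for which one of the real parsers mentioned in the other answers is still needed.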