For Python, the situation is trivial: there is a Python parser in the standard library, as well as a higher-level module, ast, for manipulating abstract syntax trees (ASTs).
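To show how little work this takes, here is a minimal sketch (assuming Python 3.9 or later) that uses ast.parse and ast.iter_child_nodes to list every function and method defined in a piece of source code. The helper name extract_function_names is hypothetical and exists only for this example.

```python
import ast


def extract_function_names(source: str) -> list[str]:
    """Return dotted names of all functions and methods defined in source.

    Hypothetical helper for illustration; nested definitions are prefixed
    with the names of their enclosing classes and functions.
    """
    tree = ast.parse(source)
    names: list[str] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                names.append(prefix + child.name)
                # functions can contain nested functions and classes
                visit(child, prefix + child.name + ".")
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")
            else:
                # keep descending through statements, expressions, etc.
                visit(child, prefix)

    visit(tree, "")
    return names


if __name__ == "__main__":
    sample = '''
class Greeter:
    def greet(self, name):
        return "Hello, " + name

    async def fetch(self):
        pass

def main():
    Greeter().greet("world")
'''
    print(extract_function_names(sample))
```

Because the standard library does the actual parsing, the only code you write is the tree walk.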
Also, Python has a fairly simple grammar (at least if you use the trick of keeping an indentation stack in your lexer and injecting fake BEGIN and END tokens into your token stream, so that your parser can treat Python as a simple keyword-delimited, Algol-like language). As a result, it is often used as an example grammar for parser generators, which means that you can find literally dozens of Python parsers for pretty much every parser generator, programming language and platform out there. (For example, there is even a Haskell module implementing a Python lexer and parser.)
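The indentation-stack trick can be sketched as follows. This toy lexer is hypothetical and greatly simplified: it turns each non-blank line into a single LINE token and ignores tabs, comments, bracket continuation and backslash continuation, all of which a real Python lexer has to handle. (Python's own tokenize module calls the injected tokens INDENT and DEDENT.)

```python
from typing import Iterator


def indent_tokens(source: str) -> Iterator[str]:
    """Yield a simplified token stream with fake BEGIN/END tokens.

    Toy sketch: each non-blank line becomes one "LINE:..." token; tabs,
    comments and line continuations are not handled.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines never open or close a block
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # deeper indentation opens exactly one new block
            stack.append(width)
            yield "BEGIN"
        while width < stack[-1]:
            # shallower indentation may close several blocks at once
            stack.pop()
            yield "END"
        if width != stack[-1]:
            # dedent to a column that matches no enclosing block
            raise IndentationError(f"inconsistent dedent: {raw!r}")
        yield "LINE:" + raw.strip()
    # close every block still open at end of input
    while len(stack) > 1:
        stack.pop()
        yield "END"


if __name__ == "__main__":
    src = "def f():\n    if x:\n        y()\n    z()\nw()\n"
    print(list(indent_tokens(src)))
```

With BEGIN and END in the stream, a block is simply "BEGIN statements END", so the parser itself no longer needs to know anything about whitespace.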
For Ruby, there are quite a number of parsers available.
Ruby is incredibly hard to parse, so if you need full fidelity, you pretty much have to use the original YACC grammar file from the YARV Ruby implementation (parse.y in the top-level source directory). JRuby's parser is derived from that file, and it is the only one of the implementation parsers that has been explicitly designed to be used by other clients as well, not just by the interpreter itself. (For example, the Eclipse RDT plugin, the Eclipse DLTK/Ruby plugin, the NetBeans Ruby plugin and the jEdit Ruby syntax highlighting all use JRuby's parser.) To facilitate that, JRuby's parser has actually been repackaged as a separate project.
Of course, there are YACC clones for pretty much every language on the planet. However, be aware that YARV does not use a lex-generated scanner. It uses a hand-written scanner in C, and the YACC grammar also contains quite a lot of semantic actions in C. Those parts will have to be re-implemented (as they were in JRuby).
The XRuby compiler is the only full Ruby implementation that does not use YARV's parse.y; instead, it uses an ANTLRv3 grammar and an ANTLRv3 tree grammar that were developed from scratch. ANTLR can generate parsers for a whole bunch of languages, including, for example, Java and C#. Its Ruby backend, however, is in dire need of some work.
RedParse is a Ruby parser written in Ruby, which claims to be able to parse all Ruby syntax correctly. It is used, for example, in the YARD Ruby documentation tool to, among other things, extract method names.
ruby_parser is another Ruby parser in Ruby. It is generated from parse.y via the racc parser generator, which is part of Ruby's standard library.
YARV actually contains a parser library called ripper, which allows you to parse Ruby code. Unfortunately, it is completely undocumented, so you basically have to figure it out by reading blog posts. And of course, because it is undocumented, almost nobody has figured it out and written a blog post about it yet, either.
However, for your purposes, you don't actually need a full-blown Ruby parser. You only need enough to extract method names and some other stuff.
RDoc, the Ruby documentation generator, contains a Ruby parser which can parse just enough Ruby to, well, extract method names and some other stuff.
Cardinal is a Ruby implementation for the Parrot Virtual Machine. It does not yet run all of Ruby, but its parser should be powerful enough to support all you need. (The parser is written in the Parrot Grammar Engine, so you will obviously have to run it on Parrot, for example by writing your reporting tool in Perl 6.)
tinyrb is another Ruby implementation that does not run full Ruby but contains a better-written parser than YARV's. In this case, the parser uses Ian Piumarta's leg Parsing Expression Grammar (PEG) parser generator.