I have written an epytext to reST markup converter, and now I want to convert all the docstrings in my entire library from epytext to reST format.

Is there a smart way to read all the docstrings in a module and write back the replacements?

P.S. The ast module, perhaps?

A: 

It's probably most straightforward to just do it the old-fashioned way. Here's some initial code to get you going. It could probably be prettier, but it should give the basic idea:

def is_docstr_bound(line):
    return "'''" in line or '"""' in line

docstr_found = False
docstr = []
# XXX: output using the same name to some other folder
with open('input.py') as f, open('output.py', 'w') as output:
    for line in f:
        if docstr_found:
            if is_docstr_bound(line):
                # XXX: do conversion now
                # ...

                # and write to output
                output.write(''.join(docstr))
                output.write(line)

                docstr = []
                docstr_found = False
            else:
                docstr.append(line)
        else:
            if is_docstr_bound(line):
                # an odd number of triple quotes means the string is still
                # open; a one-line docstring opens and closes on this line
                quotes = line.count('"""') + line.count("'''")
                docstr_found = quotes % 2 == 1

            output.write(line)

To make it truly functional you need to hook it up with a file finder and output the files to some other directory. Check out the os.path module for reference.
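
As a rough sketch of that file-finder step, something like the following could walk a source tree and mirror it into another directory. The names iter_python_files, convert_tree and convert_source are made up for illustration; convert_source stands for whatever function turns one file's source text into the converted text.

```python
import os


def iter_python_files(src_root, dst_root):
    """Yield (input_path, output_path) pairs for every .py file under src_root."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Mirror the directory layout of src_root under dst_root.
        rel = os.path.relpath(dirpath, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel))
        for name in sorted(filenames):
            if name.endswith('.py'):
                yield os.path.join(dirpath, name), os.path.join(out_dir, name)


def convert_tree(src_root, dst_root, convert_source):
    """Run convert_source (hypothetical: str -> str) on each file's text."""
    for in_path, out_path in iter_python_files(src_root, dst_root):
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        with open(in_path) as f:
            source = f.read()
        with open(out_path, 'w') as f:
            f.write(convert_source(source))
```

Writing to a separate tree keeps the originals untouched, so the result can be diffed against them before anything is overwritten.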

I know the docstring bound check is potentially really weak. It's probably a good idea to beef it up a bit (strip line and check if it begins or ends with a docstring bound).
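
A slightly stricter version of the check might look like this. It is still only a heuristic (for example, it does not recognise an opening r""" prefix), and is_one_line_docstr is an extra helper I made up to spot docstrings that open and close on the same line.

```python
DOCSTR_QUOTES = ('"""', "'''")


def is_docstr_bound(line):
    """Return True if the stripped line starts or ends with a triple quote."""
    stripped = line.strip()
    return stripped.startswith(DOCSTR_QUOTES) or stripped.endswith(DOCSTR_QUOTES)


def is_one_line_docstr(line):
    """Return True if the stripped line both opens and closes a triple-quoted string."""
    stripped = line.strip()
    return any(
        len(stripped) >= 6 and stripped.startswith(q) and stripped.endswith(q)
        for q in DOCSTR_QUOTES
    )
```

Unlike the plain "in" test, this no longer fires on a triple-quoted string buried in the middle of an expression.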

Hopefully that gives some idea how to possibly proceed. Perhaps there's a more elegant way to handle the problem. :)

bebraw
Walking through my directory structure and opening/reading/writing files is trivial. My question is: Is there a smart way to read all the docstrings in a module and write back the replacements? This cannot be done naively with mechanisms like regular expressions (like re.finditer('\"\"\"(.*)\"\"\"', source)), because I don't want to mess up the rest of the code.
tomaz
I found a similar question that you might find interesting. See http://stackoverflow.com/questions/768634/python-parse-a-py-file-read-the-ast-modify-it-then-write-back-the-modified .
bebraw
Docstrings are not required to be triple-quoted strings, and not every triple-quoted string is a docstring, so this only works for a subset of Python docstrings.
jcdyer
A: 

I wonder about a combination of introspection and source processing. Here's some untested pseudocode:

import foo  # where foo is your module

with open('foo.py', 'r') as f:
    src = f.read()  # one string, so str.replace works below

for name in dir(foo):  # probably better ways to do this...
    pything = getattr(foo, name)
    docstring = getattr(pything, '__doc__', None)
    if not docstring or docstring not in src:
        # no docstring here, or it comes from another module
        continue

    # modify the docstring
    new_docstring = my_format_changer(docstring)

    # now replace it in the source
    src = src.replace(docstring, new_docstring)

# When done, write it out
with open('new_foo.py', 'w') as fout:
    fout.write(src)

Clearly you'd have to put some cleverness in the code that traverses the module looking for objects that have docstrings so it would recurse, but this gives you the general idea.
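
One way that traversal could be sketched without importing the module is with the ast module, which records where each docstring literal starts and ends. This is an untested-in-anger sketch, not a finished tool: it assumes Python 3.8+ (for end_lineno and end_col_offset), the names find_docstring_nodes, to_literal and rewrite_docstrings are mine, and convert stands for your epytext-to-reST function. It rewrites every docstring as a plain """...""" literal, so the original quote style and any r prefix are not preserved, and convert receives the raw docstring text including its indentation.

```python
import ast


def find_docstring_nodes(tree):
    """Yield the string node of each module, class and function docstring."""
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, owners) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                yield first.value


def to_literal(text):
    """Render text as a triple-double-quoted literal that evaluates back to it."""
    trailing_quote = text.endswith('"')
    if trailing_quote:
        text = text[:-1]
    body = text.replace('\\', '\\\\').replace('"""', '""\\"')
    if trailing_quote:
        body += '\\"'  # keep a final quote from merging with the closing quotes
    return '"""' + body + '"""'


def rewrite_docstrings(source, convert):
    """Return source with each docstring replaced by convert(docstring)."""
    tree = ast.parse(source)
    # ast column offsets count UTF-8 bytes, so slice encoded lines.
    lines = [line.encode('utf-8') for line in source.split('\n')]
    # Work bottom-up so earlier line numbers stay valid after each edit.
    nodes = sorted(find_docstring_nodes(tree),
                   key=lambda n: (n.lineno, n.col_offset), reverse=True)
    for node in nodes:
        start, end = node.lineno - 1, node.end_lineno - 1
        prefix = lines[start][:node.col_offset]
        suffix = lines[end][node.end_col_offset:]
        literal = to_literal(convert(node.value)).encode('utf-8')
        lines[start:end + 1] = [prefix + literal + suffix]
    return b'\n'.join(lines).decode('utf-8')
```

Because only the first statement of a module, class or function body is considered, other triple-quoted strings are left alone, and docstrings written with ordinary quotes are still found.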

Vicki Laidler
A: 

It might be overkill for this simple usage, but I'd look into using the machinery of 2to3 to do the editing. You just need to write a custom fixer. It's not well documented, but Developer's Guide to Python 3.0: Python 2.6 and Migrating From 2 to 3: More about 2to3 and Implement Custom Fixers gives enough detail to get started...

Epydoc seems to contain a to_rst() method which might help you actually translate the docstrings. Don't know if it's any good...

Beni Cherniavsky-Paskin