tags:

views:

188

answers:

3

I'm trying to insert some import lines into a python source file, but i would ideally like to place them right after the initial docstring. Let's say I load the file into the lines variable like this:

lines = open('filename.py').readlines()

How to find the line number, where the docstring ends?

+1  A: 

If you're using the standard docstring format, you can do something like this:

count = 0
for line in lines:
    if line.startswith ('"""'):
        count += 1
        if count < 3:
            # Before or during end of the docstring
            continue
    # Line is after docstring

Might need some adaptation for files with no docstrings, but if your files are formatted consistently it should be easy enough.

John Millikin
Don't forget that a docstring is not just a triple-quoted string literal. A docstring is *any* string literal that is the first expression in a module, class or function. It can use """, ''', ", ', or even r""", r''', r', r", u""", u''', u", u', ur""", ur''', ur" or ur' for the opening quotes.
Thomas Wouters
That's what I mean by "standard docstring format" -- PEP 8 says to use triple-string literals.
John Millikin
There are some tricky variants even with "standard" docstrings. See PEP257's commends on unicode and raw string docstrings. There's also the common case of the one-liner, or where the """ isn't at the start of a line. Also consider docstrings with embedded """ strings using escaping (\""")
Brian
A: 

Actually I was looking for the index, but that helped me to find the correct solution. Thanks

Bartosz Radaczyński
+4  A: 

Rather than using a regex, or relying on specific formatting you could use python's tokenize module.

import tokenize
f=open(filename)
insert_index = None
for tok, text, (srow, scol), (erow,ecol), l in tokenize.generate_tokens(f.readline):
    if tok == tokenize.COMMENT:
        continue
    elif tok == tokenize.STRING:
        insert_index = erow, ecol
        break
    else:
        break # No docstring found

This way you can even handle pathological cases like:

# Comment
# """Not the real docstring"""
' this is the module\'s \
docstring, containing:\
""" and having code on the same line following it:'; this_is_code=42

excactly as python would handle them.

Brian