ansaurus

Question

Answer 1

+4 A:

re.findall('(?:\n[\t ]*)\"{3}(.*?)\"{3}', s, re.M | re.S)

This captures only text within triple quotes that start at the beginning of a line, optionally preceded by spaces or tabs, as Python docstrings should be.
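A minimal sketch of how this pattern behaves, for readers who want to try it. The helper name `find_docstrings` and the sample strings are assumptions made for illustration, not part of the original answer. The pattern is written as a raw string but is otherwise the same.

```python
import re

# Same pattern as above: a newline, optional tabs/spaces, then """...""".
DOCSTRING_RE = re.compile(r'(?:\n[\t ]*)"{3}(.*?)"{3}', re.M | re.S)


def find_docstrings(s):
    """Return the text captured between line-leading triple double quotes."""
    return DOCSTRING_RE.findall(s)


# Hypothetical sample source, used only for illustration.
sample = (
    "def f():\n"
    '    """Docstring of f."""\n'
    "    return 1\n"
    "\n"
    "a = '\"\"\" not a real triple quote \"\"\"'\n"
)

print(find_docstrings(sample))

# A docstring on the very first line has no preceding newline, so it is missed.
print(find_docstrings('"""Module doc."""\n'))

# An escaped quote inside the string ends the match too early.
print(find_docstrings('\n"""foo\\"""bar"""\n'))
```

The last two calls show the kind of edge cases raised in the comments below: the pattern is a heuristic, not a parser.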

SilentGhost 2009-09-24 14:29:57

what about single quotes?

Triptych 2009-09-24 14:42:13

and what about this: `a = '""" not a real triple quote """'`

Triptych 2009-09-24 14:45:04

why is it not a real triple quote? is there something lost in formatting?

SilentGhost 2009-09-24 14:46:14

I suppose a quite similar regex could be used to match single quotes as well (the given example is easy to extend); I just see no point in stuffing a single regex to the point of unintelligibility.

SilentGhost 2009-09-24 14:48:49

because it's inside single quotes... so it's part of a string literal.

fortran 2009-09-24 14:53:09

Also: `"""foo\"""bar"""`.

bobince 2009-09-24 15:19:29

is that a raw string, bobince?

SilentGhost 2009-09-24 15:21:55

of course it is. I just typed it into the Python prompt

nosklo 2009-09-24 16:29:27

He didn't say docstrings, either.

Glenn Maynard 2009-09-24 20:25:13

@Glenn: he didn't. did you downvote bobince's answer too?

SilentGhost 2009-09-24 20:34:07

Answer 2

+9 A:

Python is not a regular language and cannot reliably be parsed using regex.

If you want a proper Python parser, look at the ast module. You may be looking for get_docstring.
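As a rough sketch of the `ast` approach: the helper name `collect_docstrings` and the sample source are assumptions for illustration only. `ast.parse`, `ast.walk` and `ast.get_docstring` are the standard-library calls.

```python
import ast

# Hypothetical sample source, used only for illustration.
source = '''"""Module docstring."""

def f():
    """Docstring of f."""
    a = '""" not a real triple quote """'
    return a
'''


def collect_docstrings(code):
    """Map each module/class/function name in `code` to its docstring."""
    tree = ast.parse(code)
    found = {}
    doc_owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, doc_owners):
            doc = ast.get_docstring(node)
            if doc is not None:
                # Module nodes have no name attribute, so use a placeholder.
                found[getattr(node, "name", "<module>")] = doc
    return found


print(collect_docstrings(source))
```

Because the parser understands string literals, the quoted text assigned to `a` is not mistaken for a docstring.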

bobince 2009-09-24 15:20:41

+1: Question has no valid solution using regexes, only half-working hacks.

nosklo 2009-09-24 16:30:52

I believe regular expressions are powerful enough to do this right. But constructing a proper regexp for such a task is hard, so using the built-in Python parser is a more reliable solution.

Denis Otkidach 2009-09-25 08:16:06

Do you have a link for that? 'Cannot be reliably parsed using regex'. Which languages can?

kaizer.se 2009-09-25 09:15:37

Barely-readable summary of theory: http://en.wikipedia.org/wiki/Regular_language. Most programming languages aren't, but then modern regex has extensions that take it well beyond traditional regular language matching. Python's syntax, however, is still too complex to be amenable to regex.

bobince 2009-09-25 11:15:44

Also see http://stackoverflow.com/questions/612654/is-regex-in-modern-programming-languages-really-context-sensitive-grammar

bobince 2009-09-25 11:18:09

Answer 3

A:

I've found this one from Tim Peters (I think):

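# Raw string, so each backslash below reaches the regex engine unchanged.
pat = r"""
    qqq
    [^\\q]*
    (
        (   \\[\000-\377]
        |   q
            (   \\[\000-\377]
            |   [^\\q]
            |   q
                (   \\[\000-\377]
                |   [^\\q]
                )
            )
        )
        [^\\q]*
    )*
    qqq
"""
pat = ''.join(pat.split())  # str.join takes one argument; this strips the layout whitespace
tripleQuotePat = pat.replace("q", "'") + "|" + pat.replace('q', '"')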

But, as stated by bobince, regex alone doesn't seem to be the right tool for parsing Python code.
So I went with tokenize from the standard library.
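A minimal sketch of what the `tokenize` route might look like. The helper name `triple_quoted_strings` and the sample input are assumptions, not the code actually used here.

```python
import io
import tokenize


def triple_quoted_strings(source):
    """Return the STRING tokens in `source` that are written with triple quotes."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Drop any prefix letters (r, b, u, f) before looking at the quotes.
        # Note: on Python 3.12+ f-strings are not emitted as STRING tokens,
        # so this sketch does not see them there.
        body = tok.string.lstrip("rRbBuUfF")
        if body.startswith(('"""', "'''")):
            result.append(tok.string)
    return result


# Hypothetical sample source, used only for illustration.
sample = (
    'x = """one"""\n'
    "y = '''two'''\n"
    "z = '\"\"\" not a real triple quote \"\"\"'\n"
)

print(triple_quoted_strings(sample))
```

The tokenizer already knows where each string literal starts and ends, so triple quotes nested inside another string are left alone.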

dugres 2009-09-29 09:36:13

And finally, I use the lexer from **pygments** ( http://pygments.org/ )

dugres 2010-01-19 17:33:07

regex for triple quote
