tags:

views:

235

answers:

7

Context
I'm parsing some code and want to match the doxygen comments before a function. However, because I want to match for a specific function name, getting only the immediately previous comment is giving me problems.

Current Approach

import re  
function_re = re.compile(
    r"\/\*\*(.+)\*\/\s*void\s+(\w+)\s*::\s*function_name\s*\(\s*\)\s*")  
function_match = function_re.search(file_string)
if function_match:  
    function_doc_str = update_match.group(2)

Problem with Current Approach
The current approach matches doxygen from earlier functions, giving me a result that is the wrong doxygen comment.

Question
Is there a way to search backward through a string using the Python Regex library?
It seems like my problem is that the more restrictive (less frequently occurring part) is the function signature, "void function()"

Possible better question
Is there a better (easier) approach that I'm missing?

A: 

You can do look-behind assertions with (?<=...) or (?<!...), but in general you can only match forwards.

Ignacio Vazquez-Abrams
In .NET you could do a lookahead for the function followed by a lookbehind for the comment. Unfortunately, in Python lookbehinds can only match fixed-length strings.
Alan Moore
A: 

The question is why are these comments not inside the function, so you can use doc.

But there is no easy way with regex.

evilpie
he might be creating a python app to read doxygen comments in C or something
Carson Myers
+2  A: 

simplest way is to just use a group, you don't need to go backwards...

 (commentRegex)functionRegex

Then just extract group 1. You will need to run in multi-line mode to get it working, i don't know python so i can't be more helpful.

It's also possible with lookahead assertions, but this way is simpler.

Paul Creasey
+2  A: 

I think you should use a regex that only matches doxymentation that's immediately before the function. Maybe something like this (simplified example):

import re

test = """

/**
    @doxygen comment
*/
void function()
{
}

"""

doxygenRegex = r"(?P<comment>/\*\*(?:[^/]|/(?!\*\*))*\*/)"
functionRegex = r"(?P<function>\s\w+\s+(?P<functionName>\w+)\s*\()"

match = re.search(doxygenRegex + functionRegex, test)
print match.groupdict()

As long as this matches something, you can loop the regex matching - but starting the search at test[match.end():] next time. Hope that makes sense to you...

BTW if you only want to extract the comment and nothing about the function, you can use lookahead - just replace functionRegex with r"(?=\s\w+\s+\w+\s*\()".

AndiDog
...the trick being to make sure the "comment" regex can't match more than one comment at a time. (You forgot to mention that, 'Dog.) BTW, shouldn't the "function" regex start with `\s+` or `\s*`?
Alan Moore
Yes, it will only match the very last comment before a function. And it could be `\s+`, right. As said, it's a simplified example.
AndiDog
+1  A: 

Note that C isn't a regular language, so it cannot be parsed by regular expressions. Have you considered leveraging doxygen itself to parse this file?

Mike Graham
A: 

here's a non regex approach, split on */ and find if the function you are looking for is at the next item. eg

test = """

/**
    @doxygen comment
*/
void function()
{
}

"""

t=test.split("*/")
for n,comm in enumerate(t):
    try:
        if "void" in t[n+1]:
             print t[n]
    except IndexError: pass
ghostdog74
+1  A: 

This can be achived using a single reg-ex.

The key is to capture the comment just before the desired function. The easy way to do this is to use non-greedy qualifier. For example: /\*\*(.*?)\*/ with MULTILINE flag; however, in Python, non-greedy and MULTILINE do not work together (at least on my environment). So, you need a little trick like this:

/\*\*((?:[^\*]|\*(?!/))*)\*/.

This is to match:

1: the comment begin /**.

2: everything that is not * OR * that does not follows by /

3: the comment end */.

From this idea the code you want is:

function_name  = "function2"
regex_comment  = "/\*\*((?:[^\*]|\*(?!/))*)\*/"
regex_static   = "(?:(\w+)\s*::\s*)?"
regex_function = "(\w+)\s+"+regex_static+"(?:"+function_name+")\s*\([^\)]*\)"
regex = re.compile(regex_comment+"\s*"+regex_function, re.MULTILINE)
text  = """
/**
    @doxygen comment1
*/
void test::function1()
{
}

/**
    @doxygen comment2
*/
void test::function2()
{
}
"""
match = regex.search(text)
if (match == None): print "None"
else:               print match.group(1)

When run, you got:


    @doxygen comment2

Variation: If you want to capture /** and */ too, use regex_comment = "(/\*\*(?:[^\*]|\*(?!/))*\*/)".

Hope this helps.

NawaMan