I'm trying to use a regular expression to extract the comments at the head of a file.

For example, the source code may look like:

//This is an example file.
//Please help me.

#include "test.h"
int main() //main function
{
  ...
}

What I want to extract from the code are the first two lines, i.e.

//This is an example file.
//Please help me.

Any idea?

+2  A: 
>>> code="""//This is an example file.
... //Please help me.
...
... #include "test.h"
... int main() //main function
... {
...   ...
... }
... """
>>>
>>> import re
>>> re.findall(r"^\s*//.*", code, re.MULTILINE)
['//This is an example file.', '//Please help me.']
>>>

If you only need the contiguous comment lines at the top, you could use the following.

>>> re.search(r"^((?:\s*//.*\n)+)", code).group().strip().split("\n")
['//This is an example file.', '//Please help me.']
>>>
S.Mark
This will give all comment lines in the file. It won't extract just the heading.
Stephen
@Stephen, I've added another regex for that.
S.Mark
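The heading-only regex in the answer above calls `.group()` directly on the result of `re.search`, which raises `AttributeError` when the file does not start with a comment. Below is a minimal sketch of a more defensive variant. The names `HEADER_RE` and `header_comments` are hypothetical, and it assumes that a blank line ends the heading (it uses `[ \t]*` rather than `\s*`, so the match cannot run across empty lines).

```python
import re

# One or more consecutive lines at the very start of the text that begin,
# after optional spaces or tabs, with "//". \A anchors at the start of the
# string, so comments further down the file are never considered.
HEADER_RE = re.compile(r"\A(?:[ \t]*//.*\n?)+")


def header_comments(code):
    """Return the leading // comment lines of `code`, or [] if there are none."""
    match = HEADER_RE.match(code)
    if match is None:
        return []
    return [line.strip() for line in match.group().splitlines()]


sample = '''//This is an example file.
//Please help me.

#include "test.h"
int main() //main function
{
}
'''

print(header_comments(sample))
# ['//This is an example file.', '//Please help me.']
```

If blank lines inside the heading should be tolerated instead, swapping `[ \t]*` back to `\s*` would be one way to do it.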
+5  A: 

Why use regex?

>>> with open('/tmp/source') as f:
...     for line in f:
...         if not line.startswith('//'):
...             break
...         print(line, end='')
...
Stephen
Regexp should be a *last* resort. In my experience, 95% of the regexp use I've seen can be simplified in a way similar to the one Stephen presents here.
Arrieta
The code needs to be modified slightly so that it doesn't exit if the first line is not a comment, or if the comment lines have one or more lines in between them.
Arrieta
Line starts with " //" ... egad.
pst
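The comments above point out two gaps in the loop: comment lines that are indented, and lines sitting between the comments. Here is a minimal sketch of one way to handle both without a regex. It assumes that blank lines should be skipped and that the heading ends at the first line of actual code. The function name `leading_comments` is hypothetical.

```python
def leading_comments(lines):
    """Collect the // comment lines at the top of a source file.

    Leading whitespace is ignored and blank lines are skipped; the scan
    stops at the first line that is neither blank nor a // comment.
    """
    header = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank line: keep scanning
        if not stripped.startswith("//"):
            break  # first line of real code ends the heading
        header.append(stripped)
    return header


source = '''//This is an example file.

   //Please help me.

#include "test.h"
int main() //main function
{
}
'''

print(leading_comments(source.splitlines()))
# ['//This is an example file.', '//Please help me.']
```

Because the function accepts any iterable of lines, an open file object could be passed in directly instead of `source.splitlines()`.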
+1  A: 

This doesn't get just the first two comment lines; it also picks up multiline comments and trailing // comments. It's not what you asked for, though.

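data = open("file").read()
for c in data.split("*/"):
    # multiline comment: keep everything after the opening /*
    if "/*" in c:
        print(''.join(c.split("/*")[1:]))
    if "//" in c:
        for item in c.split("\n"):
            # only lines that actually contain a // comment
            if "//" in item:
                print(''.join(item.split("//")[1:]))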
ghostdog74
It doesn't make sense to me why only the first 2 lines are wanted; anyway, +1.
S.Mark
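For comparison, the same idea of collecting every comment (both `/* ... */` and `//`) can be sketched with a single regular expression. This is an illustration only, not a real C parser: it assumes the source contains no comment markers inside string literals (a URL such as "http://..." in a string would be misread as a comment). The names `COMMENT_RE` and `all_comments` are hypothetical.

```python
import re

# Either "//" up to the end of the line, or "/*" up to the nearest "*/".
# re.DOTALL lets ".*?" cross line breaks inside a block comment; the //
# branch uses [^\n]* so that it still stops at the end of its line.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def all_comments(code):
    """Return every // and /* */ comment in `code`, in order of appearance."""
    return COMMENT_RE.findall(code)


block_sample = '''/* File header
   spanning two lines */
#include "test.h"
int main() //main function
{
}
'''

for comment in all_comments(block_sample):
    print(repr(comment))
```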
A: 

To extend this to cover the following cases:

  1. spaces in front of //...
  2. empty lines between the //... lines

import re

code = """//This is an example file.    
 a
   //  Please help me.

//  ha

#include "test.h"
int main() //main function
{
  ...
}"""

for s in re.finditer(r"^(\s*)(//.*)",code,re.MULTILINE):
    print(s.group(2))

>>>
//This is an example file.    
//  Please help me.
//  ha
Jim Horng