Hey,

I am trying to parse a list of data out of a file using Python, but I don't want to extract any data that is commented out. Here is an example of how the data is structured:

#commented out block
uncommented block
#   commented block

I am trying to retrieve only the middle item, so I want to exclude the items with a hash at the start. The issue is that some hashes are directly next to the commented items and some aren't, and the expression I currently have only works when an item is commented out as in the first line of the example above:

(?<!#)(commented)

I tried adding \s+ to the negative lookbehind, but then I get a complaint that the expression does not have an obvious maximum length. Is there any way to do what I'm attempting to do?

Thanks in advance,

Dan

+5  A: 

Why use a regex? String methods would do just fine:

>>> s = """#commented out block
uncommented block
#   commented block
""".splitlines()
>>> for line in s:
    not line.lstrip().startswith('#')


False
True
False
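If it helps, the same check can be wrapped in a small generator that works on any iterable of lines. This is only a sketch: the name `uncommented_lines` is made up for illustration, and skipping blank lines is my assumption rather than something the question asks for.

```python
def uncommented_lines(lines):
    """Yield each line whose first non-whitespace character is not '#'."""
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines are not wanted either; drop the first
        # condition if you need to keep them.
        if stripped and not stripped.startswith("#"):
            yield stripped


sample = """#commented out block
uncommented block
#   commented block
""".splitlines()

result = list(uncommented_lines(sample))
print(result)
```

Because an open file object iterates over its lines, passing the file itself in place of `sample` should work the same way.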
SilentGhost
+1 Regexes are great... for certain problems. For others, there are much better (and less cryptic) solutions ;)
delnan
+1: use the right tool for the job. It's not always necessary to bring out the sledgehammer.
JoshD
A: 

As SilentGhost indicated, a regular expression isn't the best solution to this problem, but I thought I'd address the negative lookbehind.

You thought of doing this:

(?<!#\s+)(commented)

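This doesn't work, because Python's re module requires a lookbehind to have a fixed width, and \s+ can match any number of characters. You could move the whitespace out of the lookbehind:

(?<!#)(\s+commented)

Be aware that this only rejects a hash followed by exactly one whitespace character. With two or more, the engine can start the match at a later whitespace character that is not directly preceded by #, so a line like "#   commented block" still matches. It also requires at least one whitespace character before the item, and you'd have to strip that whitespace off the captured group. Again, string manipulation is better for what you're doing, but I wanted to show how a negative lookbehind behaves since you were asking.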
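To make the lookbehind restriction concrete, here is a minimal sketch using Python's re module. The second pattern is my own assumption rather than anything from the question: it anchors at the start of each line and uses a negative lookahead, which is allowed to be of variable width.

```python
import re

# Sample data from the question (the third line has spaces after the hash).
text = "#commented out block\nuncommented block\n#   commented block\n"

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!#\s+)(commented)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Alternative (an assumption, not from the question): anchor at the start of
# each line and reject it if optional whitespace is followed by a hash.
# [^\S\n]* means "whitespace other than a newline", so the check never
# spills over onto the next line.
uncommented = re.compile(r"^(?![^\S\n]*#)(.+)$", re.MULTILINE)

matches = uncommented.findall(text)
print(lookbehind_ok)
print(matches)
```

With the sample data this should print False and then ['uncommented block']. It is still more cryptic than the lstrip/startswith version, so treat it as an illustration rather than a recommendation.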

JoshD
A: 
>>> s = """#commented out block
... uncommented block
...    #   commented block
... """
>>> for i in s.splitlines():
...     # lstrip() drops leading whitespace, so indented comments are skipped too
...     if not i.lstrip().startswith("#"):
...         print(i)
...
uncommented block
ghostdog74