What is the most pythonic way to read in a named file, strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines? Assume it all fits easily in memory.

Note: it's not tough to do this -- what I'm asking is for the most pythonic way. I've been writing a lot of Ruby and Java and have lost my feel.

Here's a strawman:

file_lines = [line.strip() for line in open(config_file, 'r').readlines() if len(line.strip()) > 0]
for line in file_lines:
  if line[0] == '#':
    continue
  # Do whatever with line here.

I'm interested in concision, but not at the cost of becoming hard to read.

+1  A: 

I would use this:

processed = [process(line.strip())
             for line in open(config_file, 'r')
             if line.strip() and not line.strip().startswith('#')]

The only ugliness I see here is all the repeated stripping. Getting rid of it complicates the function a bit:

processed = [process(line)
             for line in (line.strip() for line in open(config_file, 'r'))
             if line and not line.startswith('#')]
Max Shawabkeh
+2  A: 
for line in open("file"):
    sline=line.strip()
    if sline and not sline[0]=="#" :
        print sline

output

$ cat file
one
#
  #

two

three
$ ./python.py
one
two
three
ghostdog74
Any reason you use `not foo == bar` instead of `foo != bar`?
Peter Hansen
A: 

The file is small, so performance is not really an issue. I would favor clarity over conciseness:

fp = open('file.txt')
for line in fp:
    line = line.strip()
    if line and not line.startswith('#'):
        pass  # process the line here
fp.close()

If you want, you can wrap this in a function.

Alok
+1  A: 

This matches the description, i.e.

strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines

So lines that start or end with spaces are passed through unchanged.

with open("config_file","r") as fp:
    data = (line for line in fp if line.strip() and not line.startswith("#"))
    for item in data:
        print repr(item)
gnibbler
You need to strip `line` twice in the generator - the return and the `startswith` check. Props for using `with` though.
Max Shawabkeh
@Max, that's not actually the case, if you read the requirements strictly. He said "or have # as first character" (i.e. not first non-blank). He might be happier with the latter interpretation, but gnibbler's answer is correct.
Peter Hansen
Right. I went off the code in the question, which does strip in both cases.
Max Shawabkeh
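The two readings discussed in these comments differ only in where the `#` check is applied. A minimal sketch of both, runnable on Python 3 (the function names and sample lines are made up for illustration):

```python
def keep_strict(line):
    """Strict reading: drop blank lines and lines whose very first character is '#'."""
    return bool(line.strip()) and not line.startswith('#')


def keep_lenient(line):
    """Lenient reading: drop blank lines and lines whose first non-blank character is '#'."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith('#')


sample = ["one\n", "# comment\n", "  # indented comment\n", "   \n", "two\n"]

strict = [line for line in sample if keep_strict(line)]
lenient = [line for line in sample if keep_lenient(line)]

print(strict)   # the indented comment survives under the strict reading
print(lenient)  # the indented comment is dropped under the lenient reading
```

Which one is right depends on whether indented `#` lines should count as comments in the file being read.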
+5  A: 

Generators are perfect for tasks like this. They are readable, maintain a clean separation of concerns, and are efficient in both memory use and time.

def RemoveComments(lines):
    for line in lines:
        if not line.strip().startswith('#'):
            yield line

def RemoveBlankLines(lines):
    for line in lines:
        if line.strip():
            yield line

Now applying these to your file:

filehandle = open('myfile', 'r')
for line in RemoveComments(RemoveBlankLines(filehandle)):
    Process(line)

In this case, it's pretty clear that the two generators can be merged into a single one, but I left them separate to demonstrate their composability.
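
As a rough sketch of the merged version mentioned above (the name `content_lines` is hypothetical; it is meant to behave like chaining the two generators, dropping blank lines and lines whose first non-blank character is `#` and yielding everything else unchanged):

```python
def content_lines(lines):
    """Yield lines that are neither blank nor comments, leaving them unmodified."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            yield line


# Works on any iterable of strings, so it can be tried without a file.
sample = ["one\n", "#\n", "  #\n", "\n", "two\n", "\n", "three\n"]
result = list(content_lines(sample))
print(result)
```

The merged form strips each line once instead of twice, at the cost of no longer being able to reuse the two filters independently.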

Paul Hankin
Generator expressions would be much more Pythonic, especially since the question specifically asks for concision.
Max Shawabkeh
It's clear, simple, easy to tell what it does and easy to test.
James Brooks
+3  A: 
lines = [r for r in open(thefile) if not r.isspace() and r[0] != '#']

The .isspace() method of strings is by far the best way to test if a string is entirely whitespace -- no need for contortions such as len(r.strip()) == 0 (ech;-).

Alex Martelli
Nice, I didn't know about isspace.
gnibbler
`"".isspace()` returns False, but `r` will always have at least a newline in it
gnibbler
@gnibbler, be careful with it though... works fine here because the lines are all newline-terminated (or non-empty, which the last line will be even if it's not properly terminated). If they're not, this won't work as `isspace()` returns `False` for empty strings.
Peter Hansen
Yep, in a different situation (where empty strings might be a possibility -- not applicable to the actual problem) you might have to use `r and not r.isspace()` to exclude both empty and all-whitespace strings. Not applicable to _this_ question though;-).
Alex Martelli
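The edge case discussed in these comments is easy to check interactively. A small sketch (the helper name `is_blank` is made up for illustration):

```python
def is_blank(s):
    """True for the empty string and for strings made up only of whitespace."""
    return not s or s.isspace()


# str.isspace() is False for the empty string, because it needs
# at least one character to be True.
print("".isspace())
print("\n".isspace())
print(" x \n".isspace())

# The combined test treats the empty string as blank too.
print(is_blank(""), is_blank("   "), is_blank("x"))
```

When iterating over a file, every line has at least one character, so the plain `isspace()` test is enough there.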
A: 

Using slightly newer idioms (or, on Python 2.5, after from __future__ import with_statement) you could do this, which has the advantage of cleaning up safely yet is quite concise.

with open('file.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line or line[0] == '#':
            continue

        # rest of processing here

Note that stripping the line first means the check for "#" will actually reject lines with that as the first non-blank character, not merely "as first character". Easy enough to modify if you're strict about that.

Peter Hansen
+1  A: 

I like Paul Hankin's thinking, but I'd do it differently:

from itertools import ifilter, ifilterfalse, imap

with open(r'c:\temp\testfile.txt', 'rb') as f:
    s1 = ifilterfalse(str.isspace, f)
    s2 = ifilter(lambda x: not x.startswith('#'), s1)
    s3 = imap(str.rstrip, s2)
    print "\n".join(s3)

I'd probably only do it this way instead of using some of the more obvious approaches suggested here if I were concerned about memory usage. And I might define an iscomment function to eliminate the lambda.
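
A sketch of the same pipeline with a named `is_comment` predicate in place of the lambda, assuming Python 3, where the built-in `map` is already lazy and `itertools.filterfalse` takes the place of `ifilterfalse`. The in-memory sample stands in for the real file and is only illustrative:

```python
import io
from itertools import filterfalse


def is_comment(line):
    """True if the line's first character is '#'."""
    return line.startswith('#')


# io.StringIO stands in for an open text file; iterating it yields lines.
f = io.StringIO("one\n#\n  \n\ntwo  \n#three\nfour\n")

non_blank = filterfalse(str.isspace, f)           # drop whitespace-only lines
non_comment = filterfalse(is_comment, non_blank)  # drop lines starting with '#'
cleaned = map(str.rstrip, non_comment)            # trim trailing whitespace

result = list(cleaned)
print("\n".join(result))
```

Each stage is lazy, so nothing is read from the file until the final `list` call consumes the pipeline.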

Robert Rossney