ansaurus

Question

Python style question around reading small files

Answer 1

+1 A:

I would use this:

processed = [process(line.strip())
             for line in open(config_file, 'r')
             if line.strip() and not line.strip().startswith('#')]

The only ugliness I see here is all the repeated stripping. Getting rid of it complicates the expression a bit:

processed = [process(line)
             for line in (line.strip() for line in open(config_file, 'r'))
             if line and not line.startswith('#')]

Max Shawabkeh 2010-02-02 05:50:09

Answer 2

+2 A:

for line in open("file"):
    sline = line.strip()
    if sline and not sline[0] == "#":
        print sline

output

$ cat file
one
#
  #

two

three
$ ./python.py
one
two
three

ghostdog74 2010-02-02 05:50:55

Any reason you use `not foo == bar` instead of `foo != bar` ?

Peter Hansen 2010-02-02 05:59:02

Answer 3

A:

The file is small, so performance is not really an issue. I would go for clarity over conciseness:

fp = open('file.txt')
for line in fp:
    line = line.strip()
    if line and not line.startswith('#'):
        pass  # process the line here
fp.close()

If you want, you can wrap this in a function.

Alok 2010-02-02 05:54:15
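One way the loop above might be wrapped in a function, as a minimal sketch rather than the answerer's own code. The name `iter_config_lines` and the sample file contents are hypothetical, and `with` is swapped in for the explicit `close()` so the file is closed even if processing raises.

```python
import os
import tempfile


def iter_config_lines(path):
    """Yield stripped lines from path, skipping blank lines and '#' comments."""
    with open(path) as fp:  # closed automatically, even on error
        for line in fp:
            line = line.strip()
            if line and not line.startswith('#'):
                yield line


# Hypothetical sample data, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("one\n#\n  #\n\ntwo\n\nthree\n")

result = list(iter_config_lines(tmp.name))
os.remove(tmp.name)
```

Because the function is a generator, the caller decides what "process" means and can loop over the lines or collect them into a list.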

Answer 4

+1 A:

This matches the description, i.e.:

strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines

So lines that start or end with spaces are passed through unchanged, including indented lines whose first non-blank character is #.

with open("config_file","r") as fp:
    data = (line for line in fp if line.strip() and not line.startswith("#"))
    for item in data:
        print repr(item)

gnibbler 2010-02-02 06:02:20

You need to strip `line` twice in the generator - the return and the `startswith` check. Props for using `with` though.

Max Shawabkeh 2010-02-02 06:04:39

@Max, that's not actually the case, if you read the requirements strictly. He said "or have # as first character" (i.e. not first non-blank). He might be happier with the latter interpretation, but gnibbler's answer is correct.

Peter Hansen 2010-02-02 06:07:27

Right. I went off the code in the question, which does strip in both cases.

Max Shawabkeh 2010-02-02 06:11:12
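The two readings discussed in these comments can be compared side by side. This is only an illustrative sketch: `keep_strict`, `keep_lenient`, and the sample lines are hypothetical names and data, not part of the question.

```python
def keep_strict(line):
    """Drop blank lines and lines whose very first character is '#'."""
    return bool(line.strip()) and not line.startswith('#')


def keep_lenient(line):
    """Drop blank lines and lines whose first non-blank character is '#'."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith('#')


# Hypothetical sample lines; the third is an indented comment.
lines = ["one\n", "# comment\n", "  # indented\n", "\n", "  two  \n"]

strict = [line for line in lines if keep_strict(line)]
lenient = [line for line in lines if keep_lenient(line)]
```

The only difference is the indented comment line, which survives the strict filter and is dropped by the lenient one.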

Answer 5

+5 A:

Generators are perfect for tasks like this. They are readable, maintain a clean separation of concerns, and are efficient in both memory use and time.

def RemoveComments(lines):
    for line in lines:
        if not line.strip().startswith('#'):
            yield line

def RemoveBlankLines(lines):
    for line in lines:
        if line.strip():
            yield line

Now applying these to your file:

filehandle = open('myfile', 'r')
for line in RemoveComments(RemoveBlankLines(filehandle)):
    Process(line)

In this case, it's pretty clear that the two generators can be merged into a single one, but I left them separate to demonstrate their composability.

Paul Hankin 2010-02-02 06:03:18

Generator expressions would be much more Pythonic, especially since the question specifically asks for concision.

Max Shawabkeh 2010-02-02 06:52:46

It's clear, simple, easy to tell what it does and easy to test.

James Brooks 2010-02-02 10:47:33
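For comparison, here is a sketch of what merging the two generators from this answer might look like, next to the generator-expression form suggested in the comments. The name `remove_comments_and_blanks` and the sample data are hypothetical; neither version is the answerer's code.

```python
def remove_comments_and_blanks(lines):
    """Yield lines that are neither blank nor comments (after stripping)."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            yield line  # the original, unstripped line is passed on


# Hypothetical sample lines standing in for an open file.
sample = ["one\n", "#\n", "  #\n", "\n", "two\n"]

merged = list(remove_comments_and_blanks(sample))

# The same filter written as a generator expression.
as_genexp = list(
    line for line in sample
    if line.strip() and not line.strip().startswith('#')
)
```

Both forms are lazy and yield the same lines; the named function is easier to test in isolation, while the expression is shorter.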

Answer 6

+3 A:

lines = [r for r in open(thefile) if not r.isspace() and r[0] != '#']

The .isspace() method of strings is by far the best way to test if a string is entirely whitespace -- no need for contortions such as len(r.strip()) == 0 (ech;-).

Alex Martelli 2010-02-02 06:03:32

Nice, I didn't know about isspace.

gnibbler 2010-02-02 06:08:07

`"".isspace()` returns False, but `r` will always have at least a newline in it

gnibbler 2010-02-02 06:12:45

@gnibbler, be careful with it though... works fine here because the lines are all newline-terminated (or non-empty, which the last line will be even if it's not properly terminated). If they're not, this won't work as `isspace()` returns `False` for empty strings.

Peter Hansen 2010-02-02 06:14:59

Yep, in a different situation (where empty strings might be a possibility -- not applicable to the actual problem) you might have to use `r and not r.isspace()` to exclude both empty and all-whitespace strings. Not applicable to _this_ question though;-).

Alex Martelli 2010-02-02 06:18:54
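The empty-string caveat from these comments can be checked directly. This is a small sketch; `is_blank` and the sample data are hypothetical and only illustrate the `r and not r.isspace()` idea for input where empty strings can occur.

```python
def is_blank(s):
    """True for empty strings and strings made up only of whitespace."""
    return not s or s.isspace()


# What str.isspace() itself reports for a few inputs.
checks = {
    "empty": "".isspace(),
    "newline": "\n".isspace(),
    "spaces": "   \n".isspace(),
    "text": " x \n".isspace(),
}

# Lines that have lost their newline, so empty strings are possible.
lines = "one\n\n   \ntwo".split("\n")
kept = [r for r in lines if not is_blank(r)]
```

When iterating over a file object directly, every line still carries its newline, so the plain `isspace()` test in the answer is enough.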

Answer 7

A:

Using slightly newer idioms (or, on Python 2.5, with from __future__ import with_statement) you could do this, which has the advantage of cleaning up safely yet is quite concise.

with open('file.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line or line[0] == '#':
            continue

        # rest of processing here

Note that stripping the line first means the check for "#" will actually reject lines with that as the first non-blank character, not merely "as first character". Easy enough to modify if you're strict about that.

Peter Hansen 2010-02-02 06:05:25

Answer 8

+1 A:

I like Paul Hankin's thinking, but I'd do it differently:

from itertools import ifilter, ifilterfalse, imap

with open(r'c:\temp\testfile.txt', 'rb') as f:
    s1 = ifilterfalse(str.isspace, f)
    s2 = ifilter(lambda x: not x.startswith('#'), s1)
    s3 = imap(str.rstrip, s2)
    print "\n".join(s3)

I'd probably only do it this way instead of using some of the more obvious approaches suggested here if I were concerned about memory usage. And I might define an iscomment function to eliminate the lambda.

Robert Rossney 2010-02-02 08:19:50
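A sketch of the same pipeline with a named `is_comment` function in place of the lambda, written for Python 3, where `ifilter`, `ifilterfalse`, and `imap` no longer exist and the built-in `filter` and `map` are already lazy. The name `is_comment` and the in-memory sample lines are assumptions, and the sketch assumes text lines (`str`) rather than a file opened in binary mode.

```python
from itertools import filterfalse


def is_comment(line):
    """True if the line's first character is '#', as in the lambda above."""
    return line.startswith('#')


# Hypothetical in-memory lines standing in for the open file.
lines = ["one  \n", "\n", "# note\n", "two\n"]

s1 = filterfalse(str.isspace, lines)   # drop whitespace-only lines
s2 = filterfalse(is_comment, s1)       # drop comment lines
s3 = map(str.rstrip, s2)               # strip trailing whitespace lazily
output = "\n".join(s3)
```

Nothing is read or filtered until `join` consumes `s3`, so only one line needs to be held in memory at a time apart from the joined result.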
