I'm trying to split a file with a list comprehension using code similar to:

lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]

However, the lines list always has an empty string as its last element. Does anyone know a way to avoid this (other than the kludge of calling pop() afterwards)?

+2  A: 

lines = file.readlines()

edit: or, if you don't want blank lines in there, you can do

lines = filter(lambda a:(a!='\n'), file.readlines())

edit^2: to remove trailing newlines, you can do

lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]
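For comparison, here is a minimal sketch of the same idea without re.sub or filter. It assumes the goal is only to drop blank lines and strip the trailing newline; io.StringIO stands in for the open file and the sample text is made up.

```python
import io

# io.StringIO is a stand-in for an open file (an assumption for this demo).
f = io.StringIO("one\n\ntwo\n")

# Skip lines that are only a newline, and strip the newline from the rest.
lines = [line.rstrip("\n") for line in f if line != "\n"]
print(lines)
```

Unlike filter() on Python 3, the comprehension gives back a real list rather than a lazy iterator.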

Alex
Leaves the trailing newlines on. I'm not sure if that's an issue for OP.
Dave
file.readlines() is not quite the same: it includes the newline at the end of each line, and it includes empty lines.
Ashton
That is enough code to work with. Thanks a lot :D
Ashton
+5  A: 

Put the regular expression hammer away :-)

  1. You can iterate over a file directly; readlines() is almost obsolete these days.
  2. Read about str.strip() (and its friends, lstrip() and rstrip()).
  3. Don't use file as a variable name. It's bad form, because it shadows the built-in name file (the file type in Python 2).
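As a quick illustration of point 2, here is a small sketch of what strip() and its friends do. The sample string is made up for the demo.

```python
sample = "  some text \n"

both = sample.strip()              # whitespace removed from both ends
left = sample.lstrip()             # leading whitespace only
right = sample.rstrip()            # trailing whitespace only, newline included
newline_only = sample.rstrip("\n") # only the trailing newline

print(repr(both))
print(repr(left))
print(repr(right))
print(repr(newline_only))
```

With no argument, all three treat the newline as whitespace, so strip() alone already takes care of the line ending.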

You can write your code as:

lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())

If you are still getting blank lines in there, you can add in a test:

lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())

If you really want it in one line:

lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]

Finally, if you're on Python 2.6, look at the with statement to improve things a little more.
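A minimal sketch of the with version, assuming the same filtering rules as above. The helper name read_lines and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def read_lines(filename, prefix="com"):
    """Return stripped, non-blank lines that do not start with prefix."""
    # The with block closes the file even if an exception is raised.
    with open(filename) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith(prefix)]


# Write a small throwaway file so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\ncomment line\n\nsecond\n")
    demo_path = tmp.name

demo_lines = read_lines(demo_path)
os.remove(demo_path)
print(demo_lines)
```

The main gain over a bare open() is that the file handle is closed deterministically instead of whenever the garbage collector gets to it.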

John Fouhy
I haven't written any Python since last year, and I'm recovering from a short but nasty bout of Perl. Thanks to your answer I'm getting back in the mindset :)
Ashton
A: 

This should work, and eliminate the regular expressions as well:

# re.match in the question anchors at the start of the line, so test the prefix
all_lines = (line.rstrip()
             for line in open(filename)
             if not line.startswith("com"))
# filter out the empty lines
lines = filter(lambda x : x, all_lines)

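Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut that avoids the trailing empty string, because splitlines() does not produce one. Note that blank lines inside the file still come through as empty strings: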

lines = [line
     for line in open(filename).read().splitlines()
     if not line.startswith("com")]
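To see where the trailing empty string in the question comes from, here is a small sketch comparing the three ways of splitting. The sample text is made up, and it ends with a newline as most text files do.

```python
import re

text = "alpha\ncommand\n\nbeta\n"

# The final newline is a separator with nothing after it,
# so re.split reports an empty string at the end.
via_regex = re.split(r"\n+", text)

# str.split has the same trailing empty string, and keeps the blank line too.
via_split = text.split("\n")

# splitlines treats the final newline as a line terminator, not a separator.
via_splitlines = text.splitlines()

print(via_regex)       # ['alpha', 'command', 'beta', '']
print(via_split)       # ['alpha', 'command', '', 'beta', '']
print(via_splitlines)  # ['alpha', 'command', '', 'beta']
```

So splitlines() fixes the trailing element, while an extra truthiness test is still needed if interior blank lines should go as well.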
Ryan Ginstrom
Instead of 'filter(lambda x: x, all_lines)', you can just write 'filter(None, all_lines)'. Although I've never been totally happy with that short-cut :-)
John Fouhy
+1  A: 
blackkettle