I'm reading a file in Python that isn't well formatted: values are separated by multiple spaces and some tabs, so the lists returned by split contain a lot of empty items. How do I remove or avoid those?

This is my current code:

import re

f = open('myfile.txt','r') 

for line in f.readlines(): 
    if re.search(r'\bDeposit', line):
        print line.split(' ')

f.close()

Thanks

A: 

Why not do line.strip() before handling it? Also, you could use re.split to use a regex like '\s+' as your delimiter.

profjim
for line.strip() in f.readlines()? It gives an error.
Nimbuz
`for line in f.readlines(): line.strip(); continue_processing`...SO comments aren't friendly to Python code.
profjim
this will only remove whitespace from the head/tail of the string
jcoon
yeah, I wasn't finished. But Max S is right.
profjim
If you do `line.split()` with no arguments, you get `strip` for free. `' a b c d '.split() == ['a', 'b', 'c', 'd']`
jcdyer
+11  A: 

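Don't explicitly specify ' ' as the delimiter. line.split() with no arguments splits on runs of any whitespace and ignores leading and trailing whitespace. For a non-blank line, it gives the same result as stripping the line and then using re.split: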

>>> line = '  a b   c \n\tg  '
>>> line.split()
['a', 'b', 'c', 'g']
>>> import re
>>> re.split(r'\s+', line)
['', 'a', 'b', 'c', 'g', '']
>>> re.split(r'\s+', line.strip())
['a', 'b', 'c', 'g']
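
To tie this back to the question, here is a small sketch comparing the three approaches on one line. The sample line is made up for illustration (it is not taken from the asker's file), and the variable names are hypothetical.

```python
import re

# Hypothetical sample line: fields separated by runs of spaces and tabs.
sample = "  Deposit   100\t\tUSD  \n"

# Splitting on a single space yields an empty string for every extra space
# and leaves the tabs and the newline inside the items.
by_space = sample.split(' ')

# No argument: split on runs of any whitespace, ignoring both ends.
by_whitespace = sample.split()

# Regex version: strip first, otherwise the ends produce empty strings.
by_regex = re.split(r'\s+', sample.strip())

print(by_space)
print(by_whitespace)
print(by_regex)
```

Only the first list contains empty items; the other two should come out identical, which is why plain split() is usually enough here.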
Max Shawabkeh
forgot that string.split() with no argument splits on _runs_ of whitespace. +1
profjim
+1 this is exactly what should be done
jcoon
Great, it did remove almost all whitespace from the lines, except one at the beginning and one at the end, strange.
Nimbuz
re.split('\s+', line.strip()) will fix that
jcoon
Thanks, that worked.
Nimbuz
@jcoon: or `line.strip().split()`
Javier
Nimbuz, `line.split()` will take care of stripping whitespace at the start/end.
Max Shawabkeh
+2  A: 
for line in open("file"):
    if " Deposit" in line:
        line = line.rstrip()
        print line.split()

Update:

for line in open("file"):
    if "Deposit" in line:
        line = line.rstrip()
        print line[line.index("Deposit"):].split()
ghostdog74
if " Deposit" in line << thanks for that! :)
Nimbuz
Note that `" Deposit" in line` is not equivalent to `re.search(r'\bDeposit', line)`. The latter will match `"this,Deposit"`, while the former won't.
Max Shawabkeh
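
A quick sketch of the difference described in the comment above. The sample lines are invented for illustration, and the names substring_hits and boundary_hits are hypothetical.

```python
import re

# Hypothetical lines, made up to show where the two tests disagree.
lines = [
    "10 Deposit 500",    # space before "Deposit"
    "this,Deposit 500",  # comma before "Deposit"
    "Deposit 500",       # "Deposit" at the very start of the line
    "NoDeposit 500",     # "Deposit" in the middle of a longer word
]

# Substring test: requires a literal space right before "Deposit".
substring_hits = [s for s in lines if " Deposit" in s]

# Regex test: \b matches at any word boundary, including the start of
# the line and after punctuation, but not inside a word.
boundary_hits = [s for s in lines if re.search(r'\bDeposit', s)]

print(substring_hits)
print(boundary_hits)
```

The substring test only catches the first line, while the word-boundary regex also catches the second and third. Neither matches "NoDeposit", since there is no space and no word boundary before "Deposit" there.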
+1  A: 
linesAsLists = [line.split() for line in open('myfile.txt', 'r') if 'Deposit' in line]
cjrh
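
The same one-liner can be wrapped so the file gets closed explicitly. This is only a sketch: the function name deposit_rows is hypothetical, and it keeps the plain substring test from the answer above rather than the asker's regex.

```python
def deposit_rows(path):
    """Return the whitespace-split fields of each line containing 'Deposit'."""
    # The with block closes the file even if reading raises an error.
    with open(path) as f:
        return [line.split() for line in f if 'Deposit' in line]
```

Calling deposit_rows('myfile.txt') should give a list of lists, one per matching line, with no empty items.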