



I have a large file of names and values, all on a single line and separated by spaces:

name1 name2 name3....

Following the long list of names is a list of values corresponding to the names. The values can be 0-4 or na. What I want to do is consolidate the data file by removing each name together with its value whenever that value is na.

For instance, suppose the line in this file looks like this:

namenexttolast nameonemore namethelast 0 na 2

I would like the following output:

namenexttolast namethelast 0 2

How would I do this using Python?

+5  A: 

Let's say you read the names into one list, then the values into another. Once you have the names and values lists, you can do something like:

result = [n for n, v in zip(names, values) if v != 'na']

result is now a list of all names whose value is not "na".
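To keep the values as well, as the question asks, here is a minimal Python 3 sketch along the same lines. It assumes `names` and `values` are already parallel lists of equal length; `drop_na` is a hypothetical helper name. It filters the pairs, then transposes the survivors back into a names tuple and a values tuple with `zip(*...)`:

```python
def drop_na(names, values):
    """Return 'n1 n2 ... v1 v2 ...' with every pair whose value is 'na' removed."""
    # Keep only the (name, value) pairs whose value is not "na".
    kept = [(n, v) for n, v in zip(names, values) if v != 'na']
    if not kept:
        # Nothing survived; unpacking zip(*[]) into two names would fail.
        return ''
    # zip(*kept) transposes [(n1, v1), (n2, v2)] into (n1, n2) and (v1, v2).
    kept_names, kept_values = zip(*kept)
    return ' '.join(kept_names + kept_values)

names = ['namenexttolast', 'nameonemore', 'namethelast']
values = ['0', 'na', '2']
print(drop_na(names, values))  # namenexttolast namethelast 0 2
```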

Justin Ardini
The OP asked for a string in the same format as the input, e.g. `n1 n2 n3 v1 v2 v3` where no `na` values occur. You only give the names of those users and are throwing away the values.
Jesse Dhillon
I believe this edit occurred after my answer. By now, other answers have covered how to do this without discarding the values.
Justin Ardini
+1  A: 

Or say you have a string that you have read from a file; let's call it "s":

words = filter(lambda x: x!="na", s.split())

should give you all the strings except "na".

Edit: the code above doesn't do what you want, because it removes the "na" values but keeps the names that go with them.

The version below should work, though:

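d = s.split()
keys = d[:len(d)/2]   # first half: the names
vals = d[len(d)/2:]   # second half: the values
# turn each kept pair into "name value" and each na pair into ""
w = " ".join(map(lambda (k,v): (k + " " + v) if v!="na" else "", zip(keys, vals)))
# w.split() drops the empty slots, leaving name value name value ...;
# [::2] picks out the names and [1::2] the values
print " ".join([" ".join(w.split()[::2]), " ".join(w.split()[1::2])])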
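The code above is Python 2 only: `lambda (k,v): ...` is a syntax error in Python 3 (tuple parameter unpacking was removed by PEP 3113), and `len(d)/2` returns a float there, which cannot be used as a slice index. A sketch of the same approach for Python 3, assuming the line splits evenly into names followed by values; `drop_na_pairs` is a hypothetical name:

```python
def drop_na_pairs(s):
    d = s.split()
    half = len(d) // 2          # integer division; "/" gives a float in Python 3
    keys, vals = d[:half], d[half:]
    # Emit "k v" for each kept pair and "" for each dropped one.
    w = " ".join((k + " " + v) if v != "na" else "" for k, v in zip(keys, vals))
    # Splitting discards the empty slots, leaving k1 v1 k2 v2 ...
    parts = w.split()
    # Even positions are names, odd positions are values.
    return " ".join(parts[::2] + parts[1::2])

print(drop_na_pairs("namenexttolast nameonemore namethelast 0 na 2"))
```

The intermediate string `w` is kept only to mirror the original; filtering the pairs directly would avoid the extra join and split.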
Although quite difficult to read, I like your list iteration semantics. +1
Jesse Dhillon
+4  A: 
s = "name1 name2 name3 v1 na v2"
s = s.split(' ')
names = s[:len(s)/2]
values = s[len(s)/2:]

names_and_values = zip(names, values)
names, values = [], []
[(names.append(n) or values.append(v)) for n, v in names_and_values if v != "na"]

print ' '.join(names + values)


Minor improvement after a suggestion from Paul. I admit the list comprehension is fairly unpythonic: it relies on list.append returning None, so `or` evaluates both append calls, and a list of None values is constructed and immediately thrown away.
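For comparison, a Python 3 sketch that does the same filtering with an ordinary loop, so the side effects are explicit and no throwaway list is built. It assumes the line splits evenly into names followed by values; `split_and_filter` is a hypothetical name:

```python
def split_and_filter(line):
    s = line.split(' ')
    half = len(s) // 2
    names, values = [], []
    # A plain for-loop states the intent directly instead of hiding
    # two append calls inside a list comprehension.
    for n, v in zip(s[:half], s[half:]):
        if v != 'na':
            names.append(n)
            values.append(v)
    return ' '.join(names + values)

print(split_and_filter("name1 name2 name3 v1 na v2"))  # name1 name3 v1 v2
```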

Jesse Dhillon
Storing names and values in a dict and then getting them back out using iteritems will not preserve order of the names. (Not clear whether the OP cares about order or not, though.) For that matter, `names_and_values` is already a list of name-value pairs, why create a dict just to get iteritems out of it? Just iterate over `names_and_values`.
Paul McGuire
@Paul In response to your feedback, I've made a couple changes that you may or may not appreciate.
Jesse Dhillon
Yeeps! Did I inspire that? Oh, please don't get in the habit of using list comps as for-loop-one-liners. Better to learn `zip(*seq_of_seqs)` to perform a transpose on a sequence of sequences. But yes, I must admit this was clever. Just DON'T EVER DO IT AGAIN! :)
Paul McGuire
Just to clarify: if I have `seq_of_seqs=[(n1,v1),(n2,v2),(n3,v3)]` then `zip(*seq_of_seqs)` will give `[(n1,n2,n3),(v1,v2,v3)]`. I use this in my own submitted spaghetti code answer (or maybe it is more like tortellini code).
Paul McGuire
Yes, it's as if you called `zip((n1,v1), (n2,v2), (n3,v3))` etc.
Jesse Dhillon
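A small Python 3 illustration of the `zip(*seq_of_seqs)` transpose described above. In Python 3 `zip` returns an iterator rather than a list, so here the result is unpacked directly into two names:

```python
pairs = [('n1', 'v1'), ('n2', 'v2'), ('n3', 'v3')]
# zip(*pairs) is the same call as zip(('n1', 'v1'), ('n2', 'v2'), ('n3', 'v3')),
# so it yields the first elements together, then the second elements together.
names, values = zip(*pairs)
print(names)   # ('n1', 'n2', 'n3')
print(values)  # ('v1', 'v2', 'v3')
```

One caveat: if `pairs` is empty (every value was na), `zip(*pairs)` yields nothing and the two-name unpacking raises `ValueError`, so that case needs a guard.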
+1  A: 

I agree with Justin that using zip is a good idea. The problem is how to put the data into two different lists. Here is a proposal that should work OK.

reader = open('input.txt')
writer = open('output.txt', 'w')
names, nums = [], []
row = reader.read().split()   # split() with no argument also drops the trailing newline
x = len(row)/2
for (a, b) in [(n, v) for n, v in zip(row[:x], row[x:]) if v!='na']:
    names.append(a)
    nums.append(b)
writer.write(' '.join(names))
writer.write(' ')
writer.write(' '.join(nums))
#writer.write(' '.join(names+nums)) is nicer, but it builds a concatenated copy of both lists
reader.close()
writer.close()
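A Python 3 sketch of the same file-based approach, using `with` so both files are closed even if an error occurs. The function name `filter_file` is hypothetical, and it assumes the whole input is one line of names followed by the same number of values:

```python
def filter_file(in_path, out_path):
    # "with" closes both files when the block exits, even on an exception.
    with open(in_path) as reader, open(out_path, 'w') as writer:
        # split() with no argument ignores the trailing newline, so a final
        # "na\n" is still recognised as "na".
        row = reader.read().split()
        x = len(row) // 2
        names, nums = [], []
        for a, b in zip(row[:x], row[x:]):
            if b != 'na':
                names.append(a)
                nums.append(b)
        writer.write(' '.join(names + nums))
```

Usage, with the file names from the answer: `filter_file('input.txt', 'output.txt')`.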
I believe you will have to write a space between writing out your two lists, or your last name and your first value will run together.
Jesse Dhillon
@Jesse: Correct, thanks. I saw your answer just before posting mine. It is very similar, but I decided to post it anyway; I didn't want to waste the few minutes I spent on it. :)
strlist = 'namenexttolast nameonemore namethelast 0 na 2'.split()
vals = ('0', '1', '2', '3', '4', 'na')
key_list = [s for s in strlist if s not in vals]
val_list = [s for s in strlist if s in vals]

#print [(key_list[i],v) for i, v in enumerate(val_list) if v != 'na']
filtered_keys = [key_list[i] for i, v in enumerate(val_list) if v != 'na']
filtered_vals = [v for v in val_list if v != 'na']

print ' '.join(filtered_keys + filtered_vals)

If you'd rather keep each name grouped with its value, you could build a list of tuples instead (see the commented-out line).
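A Python 3 sketch of this membership-based split that returns both the grouped tuples and the flat string. `VALUE_TOKENS` and `filter_by_membership` are hypothetical names. Like the answer, it assumes no name is literally one of `0`-`4` or `na`; a name like that would be misclassified as a value:

```python
VALUE_TOKENS = {'0', '1', '2', '3', '4', 'na'}

def filter_by_membership(line):
    tokens = line.split()
    # Anything that is not a value token is treated as a name.
    keys = [t for t in tokens if t not in VALUE_TOKENS]
    vals = [t for t in tokens if t in VALUE_TOKENS]
    # Grouped form: one (name, value) tuple per surviving pair.
    pairs = [(k, v) for k, v in zip(keys, vals) if v != 'na']
    # Flat form in the input's layout: all names, then all values.
    flat = ' '.join([k for k, _ in pairs] + [v for _, v in pairs])
    return pairs, flat

pairs, flat = filter_by_membership('namenexttolast nameonemore namethelast 0 na 2')
print(pairs)  # [('namenexttolast', '0'), ('namethelast', '2')]
print(flat)   # namenexttolast namethelast 0 2
```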


Here is a solution that uses just iterators plus a single buffer element, with no calls to len and no other intermediate lists created. (In Python 3, just use map and zip, no need to import imap and izip from itertools.)

from itertools import izip, imap, ifilter

def iterStartingAt(cond, seq):
    it1,it2 = iter(seq),iter(seq)
    while not cond(next(it1)):
        # advance it2 past each leading name, so it resumes at the first value
        next(it2)
    for item in it2:
        yield item

dataline = "namenexttolast nameonemore namethelast 0 na 2"
datalinelist = dataline.split()

valueset = set("0 1 2 3 4 na".split())

print " ".join(imap(" ".join, 
                    izip(*ifilter(lambda (n,v): v != 'na', 
                                  izip(datalinelist,
                                       iterStartingAt(lambda s: s in valueset, 
                                                      datalinelist))))))

namenexttolast namethelast 0 2
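`iterStartingAt(cond, seq)` appears to behave like the standard library's `itertools.dropwhile` with the condition negated: both skip leading items and yield everything from the first item that satisfies `cond`. A Python 3 sketch of the same pipeline built on `dropwhile`, with plain `zip` and a generator expression in place of `izip`, `imap` and `ifilter`; `filter_line` and `VALUE_SET` are hypothetical names:

```python
from itertools import dropwhile

VALUE_SET = set("0 1 2 3 4 na".split())

def filter_line(dataline):
    words = dataline.split()
    # Skip the leading names; what remains are the values.
    values = dropwhile(lambda s: s not in VALUE_SET, words)
    # zip stops at the shorter input, pairing each name with its value.
    pairs = ((n, v) for n, v in zip(words, values) if v != 'na')
    # zip(*pairs) transposes into (names...), (values...); join each group.
    return " ".join(" ".join(group) for group in zip(*pairs))

print(filter_line("namenexttolast nameonemore namethelast 0 na 2"))
```

If every value is na, `zip(*pairs)` is called with no arguments and yields nothing, so the result is simply an empty string.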
Paul McGuire