ansaurus

Question

csv file column reading and extracting using python

Answer 1

+3 A:

Edit: If the input file is a comma-separated values file, then to maintain the order of the keys, use reader.fieldnames instead of the keys in allrows[0].

So the solution would be:

keepcols = [c for c in reader.fieldnames if any(r[c] != '0' for r in allrows)]
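As a minimal sketch of how that line fits into a csv.DictReader workflow (Python 3; the sample data and column names are made up for illustration and stand in for the real input file):

```python
import csv
import io

# Hypothetical sample data standing in for the comma-separated input file.
sample = io.StringIO(
    "title,col1,col2,col3\n"
    "a,0,1,0\n"
    "b,0,0,0\n"
)

reader = csv.DictReader(sample)
allrows = list(reader)

# reader.fieldnames is a list, so the original column order is preserved.
keepcols = [c for c in reader.fieldnames if any(r[c] != '0' for r in allrows)]
print(keepcols)
```

Here only title and col2 contain a value other than '0', so those are the columns kept, in their original order.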

The input file posted above looks like it has space-separated columns. In this case, I don't think csv is the right tool for parsing it. Instead, you can use split:

import csv

with open("test1.csv", "r") as f:
    # The first line holds the column names; a list keeps their order.
    fields = next(f).split()
    allrows = []
    for line in f:
        # Pair each whitespace-separated value with its column name.
        row = dict(zip(fields, line.split()))
        allrows.append(row)

# Keep only the columns that contain at least one non-zero value.
keepcols = [c for c in fields if any(row[c] != '0' for row in allrows)]
print(keepcols)

with open("output1.csv", "w") as out:
    writer = csv.DictWriter(out, fieldnames=keepcols, extrasaction='ignore')
    writer.writerows(allrows)

Edit2: The column order was changing because for c in allrows[0] returns the keys of allrows[0] in an unspecified order: plain dicts did not preserve key order in Python versions before 3.7. The above code works around this by defining fields to be a list, not a dict.

Original answer: Change fieldnames='keepcols' to fieldnames=keepcols.

fieldnames needs to be a sequence of keys, such as ['fieldA','fieldB',...].

A potential pitfall to be aware of in Python is that strings are sequences. When you iterate over a string, you get the characters of the string. So when you say fieldnames='keepcols', you are setting fieldnames to be the sequence of characters ['k','e','e','p','c','o','l','s']. You don't get an error because this is a valid sequence of keys. But your list of dicts, allrows, doesn't happen to have these keys. Because extrasaction='ignore', writer.writerows silently drops the keys that are not in fieldnames, and each missing key is written as an empty string (the default restval).
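The pitfall can be reproduced with a short sketch (Python 3; the row data and the names fieldA and fieldB are hypothetical, and io.StringIO stands in for the output file):

```python
import csv
import io

# Iterating over a string yields its characters.
print(list('keepcols'))

rows = [{'fieldA': '1', 'fieldB': '0'}]  # hypothetical row data

# Bug reproduced on purpose: fieldnames is a string, not a list of keys.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames='keepcols', extrasaction='ignore')
writer.writerows(rows)
bad_output = out.getvalue()
print(repr(bad_output))  # eight empty fields, none of the real data

# Fixed: fieldnames is a real sequence of column names.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['fieldA', 'fieldB'], extrasaction='ignore')
writer.writerows(rows)
good_output = out.getvalue()
print(repr(good_output))
```

No exception is raised in the first case, which is why the mistake is easy to miss: the output simply contains empty fields.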

unutbu 2010-07-11 11:39:17

I tried changing it... I get column2 first and then column1 only as output... the remaining columns do not appear... but I need to extract a column even if it has a single 1... please help.

beginner 2010-07-11 11:48:11

So what should I do about it? I'm really lost... :(

beginner 2010-07-11 11:51:45

Change the `all` to `any`. By the way, it wasn't very clear from the original question that this is what you wanted.

Adam Bernier 2010-07-11 11:54:04
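To illustrate the difference between `all` and `any` here, a small sketch with made-up rows (the column names are hypothetical, and it assumes the filter is written as a list comprehension like the one in the answer):

```python
# Hypothetical rows: 'col1' is all zeros, 'col2' has a single '1'.
allrows = [
    {'col1': '0', 'col2': '0'},
    {'col1': '0', 'col2': '1'},
]
fields = ['col1', 'col2']

# all(): keep a column only if every value is non-zero.
with_all = [c for c in fields if all(r[c] != '0' for r in allrows)]

# any(): keep a column if at least one value is non-zero.
with_any = [c for c in fields if any(r[c] != '0' for r in allrows)]

print(with_all, with_any)
```

With `all`, col2 is dropped because one of its values is '0'; with `any`, the single '1' is enough to keep it.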

OK, I'm so sorry... I'll ask my questions more clearly hereafter... I changed 'all' to 'any' and it worked... but in my output the order of the CSV table seems to change... the title column is not in its place, it appears as the 4th column... please help...

beginner 2010-07-11 11:54:58

I want only those columns which have at least one non-zero element... and this is definitely a CSV file whose separator is a comma... but your code is also helpful for another problem that I currently have. Thank you so much :)

beginner 2010-07-11 12:03:41

Thanks ubuntu... you rock... it still worked... I just changed the split function to split(',')

beginner 2010-07-11 12:09:54

Great. Glad I could help.

unutbu 2010-07-11 12:13:39

Just a small hitch... what if I want to get the title row as well? That is, the column names in the header row?

beginner 2010-07-11 12:30:29

Use `writer.writeheader()` to write the header. See http://docs.python.org/library/csv.html#csv.DictWriter.writeheader

unutbu 2010-07-11 12:33:13

Um, what if I'm using Python 2.6? It seems this attribute does not exist in Python 2.6... should I upgrade right away, or is there a better way to do it?

beginner 2010-07-11 12:44:24

Well, I wouldn't upgrade just for this. This should work: `writer.writerow(dict(zip(keepcols,keepcols)))`

unutbu 2010-07-11 12:54:12
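A minimal sketch of that manual header row, for reference (written for Python 3 with io.StringIO standing in for the output file; the column names and row data are hypothetical):

```python
import csv
import io

keepcols = ['title', 'col2']  # hypothetical column names to keep
allrows = [{'title': 'a', 'col1': '0', 'col2': '1'}]  # hypothetical data

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=keepcols, extrasaction='ignore')

# Header row built by hand: each column name maps to itself,
# so writerow() emits the names in fieldnames order.
writer.writerow(dict(zip(keepcols, keepcols)))
writer.writerows(allrows)

result = out.getvalue()
print(result)
```

On versions that do provide it, `writer.writeheader()` can replace the manual `writerow` call.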
