I am trying to merge three fields in each line of a CSV file using Python. This would be simple, except some of the fields are surrounded by double quotes and include commas. Here is an example:

,,Joe,Smith,New Haven,CT,"Moved from Portland, CT",,goo,

Is there a simple algorithm that could merge fields 7-9 for each line in this format? Not all lines include commas in double quotes.

Thanks.

+9  A: 

Something like this?

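import csv
# Python 2: the csv module expects files opened in binary mode.
source = csv.reader(open("some file", "rb"))
dest = csv.writer(open("another file", "wb"))
for row in source:
    # Fields 7-9 are indexes 6-8; concatenate them into a single field.
    result = row[:6] + [row[6] + row[7] + row[8]] + row[9:]
    dest.writerow(result)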


Example

>>> data=''',,Joe,Smith,New Haven,CT,"Moved from Portland, CT",,goo,
... '''.splitlines()
>>> rdr= csv.reader( data )
>>> row= rdr.next()
>>> row
['', '', 'Joe', 'Smith', 'New Haven', 'CT', 'Moved from Portland, CT', '', 'goo', '']
>>> row[:6] + [ row[6]+row[7]+row[8] ] +  row[9:]
['', '', 'Joe', 'Smith', 'New Haven', 'CT', 'Moved from Portland, CTgoo', '']
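
For readers on Python 3, the same idea might look roughly like the sketch below. It is not part of the original answer: the helper name merge_three is hypothetical, next() replaces rdr.next(), and in-memory text streams stand in for the two files.

```python
import csv
import io

def merge_three(row):
    """Concatenate fields 7-9 (indexes 6-8) of one parsed row (hypothetical helper)."""
    return row[:6] + [row[6] + row[7] + row[8]] + row[9:]

# In-memory stand-ins for the input and output files.
src = io.StringIO(',,Joe,Smith,New Haven,CT,"Moved from Portland, CT",,goo,\n')
dst = io.StringIO()

# lineterminator="\n" keeps the output easy to compare; the default is "\r\n".
writer = csv.writer(dst, lineterminator="\n")
for row in csv.reader(src):
    writer.writerow(merge_three(row))

merged_line = dst.getvalue()
print(merged_line)
```

With real files on Python 3, open them in text mode with newline="" (for example open("some file", newline="")) rather than "rb"/"wb".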
S.Lott
@S.Lott: Sorry to upset the apple-cart, but actually reading the code reveals that result will be a tuple of 3 elements of which the first and third will be lists ... consequently the output will be a mishmash.
John Machin
@9upvoters: ????
John Machin
+1  A: 

There's a builtin module in Python for parsing CSV files:

http://docs.python.org/library/csv.html
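
As a small illustration of why the module helps here, the sketch below (Python 3, using the sample line from the question) compares a plain split on commas with csv.reader. The variable names are only for illustration.

```python
import csv

line = ',,Joe,Smith,New Haven,CT,"Moved from Portland, CT",,goo,'

# A plain split also cuts the quoted field at its embedded comma.
naive = line.split(",")

# csv.reader accepts any iterable of lines and honours the double quotes.
parsed = next(csv.reader([line]))

print(len(naive), len(parsed))
print(parsed[6])
```

The quoted field comes back as a single item with the quotes stripped, so the column indexes stay stable from line to line.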

sharjeel
+1  A: 

You have tagged this question as 'database'. In fact, it might be easier to load the two files into separate tables of a database (you can use SQLite or any Python SQL library, such as SQLAlchemy) and then join them.

That would give you some advantages afterwards: you could query the tables with SQL syntax, and you could store the data on disk instead of keeping it in memory. So think about it. :)
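
A minimal sketch of the database route, assuming Python 3 and the standard sqlite3 module. The table name people and the column names c0..c9 are hypothetical, and the merge is done with SQL string concatenation on one table rather than a join.

```python
import csv
import sqlite3

lines = [',,Joe,Smith,New Haven,CT,"Moved from Portland, CT",,goo,']

# ":memory:" keeps everything in RAM; pass a filename to store the data on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table with ten text columns, one per CSV field.
cols = ["c%d" % i for i in range(10)]
conn.execute("CREATE TABLE people (%s)" % ", ".join(cols))
conn.executemany(
    "INSERT INTO people VALUES (%s)" % ", ".join("?" * 10),
    csv.reader(lines),  # each parsed row supplies the ten parameters
)

# || is SQLite's string concatenation operator; c6..c8 are fields 7-9.
merged = conn.execute(
    "SELECT c0, c1, c2, c3, c4, c5, c6 || c7 || c8, c9 FROM people"
).fetchall()
conn.close()
print(merged)
```

This sketch assumes every line has exactly ten fields; rows of another length would make the INSERT fail.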

dalloliogm
+2  A: 

You can use the csv module to do the heavy lifting: http://docs.python.org/library/csv.html

You didn't say exactly how you wanted to merge the columns; presumably you don't want your merged field to be "Moved from Portland, CTgoo". The code below allows you to specify a separator string (maybe ", ") and handles empty/blank fields.

[transcript of session]
prompt>type merge.py
import csv

def merge_csv_cols(infile, outfile, startcol, numcols, sep=", "):
    reader = csv.reader(open(infile, "rb"))
    writer = csv.writer(open(outfile, "wb"))
    endcol = startcol + numcols
    for row in reader:
        merged = sep.join(x for x in row[startcol:endcol] if x.strip())
        row[startcol:endcol] = [merged]
        writer.writerow(row)

if __name__ == "__main__":
    import sys
    args = sys.argv[1:6]
    args[2:4] = map(int, args[2:4])
    merge_csv_cols(*args)

prompt>type input.csv
1,2,3,4,5,6,7,8,9,a,b,c
1,2,3,4,5,6,,,,a,b,c
1,2,3,4,5,6,7,8,,a,b,c
1,2,3,4,5,6,7,,9,a,b,c

prompt>\python26\python merge.py input.csv output.csv 6 3 ", "

prompt>type output.csv
1,2,3,4,5,6,"7, 8, 9",a,b,c
1,2,3,4,5,6,,a,b,c
1,2,3,4,5,6,"7, 8",a,b,c
1,2,3,4,5,6,"7, 9",a,b,c
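
The transcript above is from Python 2.6. A rough Python 3 adaptation is sketched below; as an assumption made for brevity, it takes already-open text streams instead of file names, and it fixes the line terminator to "\n".

```python
import csv
import io

def merge_csv_cols(infile, outfile, startcol, numcols, sep=", "):
    """Python 3 sketch: infile and outfile are open text streams, not paths."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    endcol = startcol + numcols
    for row in reader:
        # Join only the non-blank fields, then replace the slice with the result.
        merged = sep.join(x for x in row[startcol:endcol] if x.strip())
        row[startcol:endcol] = [merged]
        writer.writerow(row)

# The first two lines of input.csv from the transcript, held in memory.
src = io.StringIO("1,2,3,4,5,6,7,8,9,a,b,c\n1,2,3,4,5,6,,,,a,b,c\n")
dst = io.StringIO()
merge_csv_cols(src, dst, 6, 3)
output = dst.getvalue()
print(output)
```

To run it on real files, pass streams opened with newline="", for example open("input.csv", newline="") and open("output.csv", "w", newline="").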
John Machin