tags:

views:

702

answers:

4

I have 2 csv files. I need to replace a column in one file with a column from the other file but they have to stay sorted according to an ID column.

Here's an example:

file1:

ID, transect, 90mdist                                      
1, a, 10,                                                  
2, b, 20,                                                
3, c, 30,

file2:

ID, transect, 90mdist                                
1, a, 50                                                   
2, b, 70                                                     
3, c, 90

basically I created a new file with the correct 90mdist and I need to insert it into the old file but it has to line up with the same ID #.

It's my understanding that Python treats csv files as a string. so I can either use a dictionary or convert the data into a list and then change it? which way is best?

Any help would be greatly appreciated!!

+3  A: 

The CSV Module in the Python Library is what you need here.

It allows you to read and write CSV files, treating lines a tuples or lists of items.

Just read in the file with the corrected values, store the in a dictionary keyed with the line's ID.

Then read in the second file, replacing the relevant column with the data from the dict and write out to a third file.

Done.

Ber
Beat me to the punch.
Eugene M
+1: Write to a third file. Don't try to update a file in place.
S.Lott
A: 

Once you have your csv lists, one easy way to replace a column in one matrix with another would be to transpose the matrices, replace the row, and then transpose back your edited matrix. Here is an example with your data:

csv1 = [['1', 'a', '10'], ['2', 'b', '20'], ['3', 'c', '30']]

def transpose(matrix):
    return [[matrix[x][y] for x in range(len(matrix))] for y in range(len(matrix[0]))]

transposedCSV1, transposedCSV2 = transpose(csv1), transpose(csv2)
print transposedCSV1
>>> [['1', '2', '3'], ['a', 'b', 'c'], ['10', '20', '30']]

csv1 = transposedCSV1[:2] + [transposedCSV2[2]]
print csv1
>>> [['1', '2', '3'], ['a', 'b', 'c'], ['50', '70', '90']]

csv1 = transpose(csv1)
print csv1
>>> [['1', 'a', '50'], ['2', 'b', '70'], ['3', 'c', '90']]
Greg
A: 

If you're only doing this as a one-off, why bother with Python at all? Excel or OpenOffice Calc will open the two CSV files for you, then you can just cut and paste the column from one to the other.

If the two lists of IDs are not exactly the same then a simple VB macro would do it for you.

Vicky
+1  A: 

Try this:

from __future__ import with_statement

import csv

def twiddle_csv(file1, file2):
    def mess_with_record(record):
        record['90mdist'] = 2 * int(record['90mdist']) + 30
    with open(file1, "r") as fin:
        with open(file2, "w") as fout:
            fields = ['ID', 'transect', '90mdist']
            reader = csv.DictReader(fin, fieldnames=fields)
            writer = csv.DictWriter(fout, fieldnames=fields)
            fout.write(",".join(fields) + '\n')
            reader.next()   # Skip the column header
            for record in reader:
                mess_with_record(record)
                writer.writerow(record)

if __name__ == '__main__':
    twiddle_csv('file1', 'file2')

A couple of caveats:

  • DictReader seems to use the first row as data, even if it matches the fields. Call reader.next() to skip.
  • Data rows cannot have trailing commas. They will be interpreted as empty columns.
  • DictWriter does not appear to write out the column headers. DIY.
hughdbrown