Hi all,

I have two tab-delimited .csv files. From one.csv I have created a dictionary which looks like:

'EB2430': ' "\t"idnD "\t"yjgV "\t"b4267 "\n',
'EB3128': ' "\t"yagE "\t\t"b0268 "\n',
'EB3945': ' "\t"maeB "\t"ypfF "\t"b2463 "\n',
'EB3944': ' "\t"eutS "\t"ypfE "\t"b2462 "\n',

I would like to insert the values of the dictionary into second.csv, which looks like:

"EB2430"    36.81   364 222 4   72  430 101 461 1.00E-063   237
"EB3128"    26.04   169 108 6   42  206 17  172 6.00E-006   45.8
"EB3945"    20.6    233 162 6   106 333 33  247 6.00E-005   42.4
"EB3944"    19.07   367 284 6   1   355 1   366 2.00E-023   103 

The resulting output should be tab-delimited:

'EB2430'   idnD   yjgV   b4267   36.81   364 222 4   72  430 101 461 1.00E-063   237
'EB3128'   yagE   b0268   26.04   169 108 6   42  206 17  172 6.00E-006   45.8
'EB3945'   maeB   ypfF   b2463   20.6    233 162 6   106 333 33  247 6.00E-005   42.4
'EB3944'   eutS   ypfE   b2462   19.07   367 284 6   1   355 1   366 2.00E-023   103

Here is my code for creating the dictionary:

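f = open("one.csv", "r")
g = open("second.csv", "r")
eb = []
desc = []
di = {}

# Read one.csv line by line (a single loop, so no line is skipped)
for row in f:
    eb.append(row[1:7])   # the id between the quotes, e.g. EB2430
    desc.append(row[7:])  # everything after the id

di = dict(zip(eb, desc))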

Sorry for it being so long-winded!! I've not been programming for long.

Cheers!

Sat

A:

It looks like you could more usefully use the Python standard library csv module here, rather than performing the text processing yourself "manually". E.g.:

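import csv
# Read both tab-delimited files into lists of rows (each row is a list of strings)
with open("one.csv", "r") as f:
    rows_one = list(csv.reader(f, delimiter='\t'))
with open("second.csv", "r") as g:
    rows_two = list(csv.reader(g, delimiter='\t'))
# Pair rows by position; drop the repeated id (first field) of each second-file row
rows_totl = [r + s[1:] for r, s in zip(rows_one, rows_two)]
with open("total.csv", "w") as h:
    csv.writer(h, delimiter='\t').writerows(rows_totl)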
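A self-contained sketch of the same positional join, assuming Python 3 and using io.StringIO with a couple of made-up rows in place of the real one.csv and second.csv, so you can see what r + s[1:] produces:

```python
import csv
import io

# Hypothetical in-memory stand-ins for one.csv and second.csv (tab-delimited).
one_csv = io.StringIO("EB2430\tidnD\tyjgV\tb4267\nEB3128\tyagE\tb0268\n")
second_csv = io.StringIO("EB2430\t36.81\t364\nEB3128\t26.04\t169\n")

rows_one = list(csv.reader(one_csv, delimiter='\t'))
rows_two = list(csv.reader(second_csv, delimiter='\t'))

# Pair rows by position: keep all of row r, drop the repeated id from row s.
rows_totl = [r + s[1:] for r, s in zip(rows_one, rows_two)]

# Write to a string instead of total.csv so the result is easy to inspect.
out = io.StringIO()
csv.writer(out, delimiter='\t', lineterminator='\n').writerows(rows_totl)
print(out.getvalue())
```

Positional pairing only works if both files list the same ids in the same order; zip also stops silently at the end of the shorter file.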

The with statement is one of the jewels of Python 2.6 (it's also usable in 2.5, but only if you do from __future__ import with_statement!-). As used here, it gives you an open file and ensures it gets closed as soon as the with body is done... plus, it has a zillion more uses, e.g. for locks and all sorts of your own custom-coded "context managers".
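A small sketch (Python 3, standard library only) of both points: the file object reports itself closed as soon as the with body ends, and contextlib.contextmanager is one way to write your own context manager. The logged helper and the temporary file are hypothetical, purely for illustration:

```python
import contextlib
import os
import tempfile

# Create a temporary file so the example does not touch real data.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

with open(path, "w") as h:
    h.write("EB2430\t36.81\n")
    closed_inside = h.closed   # False: the file is still open inside the body

closed_after = h.closed        # True: closed as soon as the body finished

events = []

# Hypothetical custom context manager: records when it is entered and exited.
@contextlib.contextmanager
def logged(name):
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)   # runs even if the body raises

with logged("demo"):
    events.append("body")

os.remove(path)
print(closed_inside, closed_after, events)
```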

Alex Martelli
Nice solution!!
systempuntoout
Alex, in addition to using the csv module, you are using the "with" statement. It might be nice to point out why you made that change as well.
John Mulder
@John, OK, editing to point that out.
Alex Martelli
A: 

May I suggest using the built-in csv module instead of hand-parsing the CSV file? It takes care of delimiters, character escaping, etc. Its API is simple, too:

import csv

# Auto-detect this file's CSV dialect (delimiter, quoting and such)
dialect = csv.Sniffer().sniff(open('one.csv').read())

# csv.reader yields every row found in the file using the given dialect
rows = csv.reader(open('one.csv'), dialect=dialect)

# Generator expression: map the first field of each row to the remaining fields
resulting_dict = dict((row[0], row[1:]) for row in rows)

You can refactor the code into a function and use it for both files (I'm coding from memory, though, so beware of errors).
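One way that refactor could look, as a sketch: read_keyed_rows is a hypothetical name, it assumes a fixed tab delimiter instead of the Sniffer, and it skips blank lines. It accepts any iterable of text lines, so you can pass an open file or, as here, an io.StringIO holding made-up sample data:

```python
import csv
import io

def read_keyed_rows(lines, delimiter='\t'):
    """Map the first field of each row to the list of its remaining fields.

    `lines` may be an open file or any iterable of text lines.
    """
    return {row[0]: row[1:] for row in csv.reader(lines, delimiter=delimiter) if row}

# Hypothetical sample data standing in for one.csv and second.csv.
dict1 = read_keyed_rows(io.StringIO("EB2430\tidnD\tyjgV\tb4267\n"))
dict2 = read_keyed_rows(io.StringIO("EB2430\t36.81\t364\n"))
print(dict1, dict2)
```

With real files the call would be something like `with open('one.csv', newline='') as f: dict1 = read_keyed_rows(f)` in Python 3.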

Once you've got two dicts for the two files, say dict1 and dict2, you can combine them:

combined_dict = dict((key, dict1[key] + dict2[key]) for key in dict2)
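The expression above raises KeyError if an id appears in second.csv but not in one.csv. A sketch of a more forgiving variant (a dict comprehension, Python 2.7+ or 3; the sample dicts are made up) that falls back to an empty list via dict.get:

```python
# Hypothetical sample dicts shaped like the ones read from one.csv and second.csv.
dict1 = {'EB2430': ['idnD', 'yjgV', 'b4267'], 'EB3128': ['yagE', 'b0268']}
dict2 = {'EB2430': ['36.81', '364'], 'EB3128': ['26.04', '169'], 'EB9999': ['1.0', '2']}

# dict1[key] would raise KeyError for 'EB9999'; dict1.get(key, []) adds nothing instead.
combined_dict = {key: dict1.get(key, []) + dict2[key] for key in dict2}
print(combined_dict)
```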

Writing it to a .csv file is also straightforward:

with open('second.csv', 'w') as out:
    writer = csv.writer(out, delimiter='\t')
    # items() yields (key, values) pairs; writerow takes a single sequence
    for key, values in combined_dict.items():
        writer.writerow([key] + values)

Definitely check out the csv module docs for a more detailed reference.

Edit: My solution doesn't take line ordering into account (a plain dict is unordered; only from Python 3.7 on do dicts guarantee insertion order). There are two options:

  • if you're running Python 3 or Python 2.7, use collections.OrderedDict
  • otherwise, you need to store the order of the lines yourself: for example, while reading the second file, replace the generator expression with a plain for loop and append each key (the first field) to a list.
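A sketch of the second option, assuming Python 3; the sample data and the names order, dict1 and dict2 are hypothetical. The keys are recorded in the order they appear in second.csv, and the output walks that list instead of iterating over the dict:

```python
import csv
import io

# Hypothetical stand-in for second.csv; the ids are deliberately not sorted.
second_csv = io.StringIO("EB3945\t20.6\nEB2430\t36.81\nEB3128\t26.04\n")
dict1 = {'EB2430': ['idnD'], 'EB3128': ['yagE'], 'EB3945': ['maeB']}

order = []   # ids in the order they appear in second.csv
dict2 = {}
for row in csv.reader(second_csv, delimiter='\t'):
    order.append(row[0])
    dict2[row[0]] = row[1:]

# Build output rows in file order: id, fields from one.csv, fields from second.csv.
merged_rows = [[key] + dict1[key] + dict2[key] for key in order]
print(merged_rows)
```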
Mike Hordecki
Just wondered what's up with those unicorns and saw my own one..
Mike Hordecki
A: 

Have a look at the csv module:

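import csv
reader1 = csv.reader(open('input1.csv'), delimiter='\t')
reader2 = csv.reader(open('input2.csv'), delimiter='\t')
csvwriter = csv.writer(open('output.csv', 'w'), delimiter='\t')
while True:
    # next(reader, None) returns None at end of file instead of raising StopIteration
    row1 = next(reader1, None)
    if not row1:
        break   # end of file (None) or a blank line ([])
    row2 = next(reader2)
    # row2's fields, then row1's fields without its leading id
    new_row = row2 + row1[1:]
    csvwriter.writerow(new_row)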
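Readers have no .next() method in Python 3; the built-in next() function (also available since Python 2.6) is the portable spelling. When a reader is exhausted, next(reader) raises StopIteration, while next(reader, default) returns the default, which is what makes a clean end-of-file test possible. A quick sketch of both behaviours with made-up data:

```python
import csv
import io

reader = csv.reader(io.StringIO("EB2430\t36.81\n"), delimiter='\t')

first = next(reader)            # the only row in the sample data
after_end = next(reader, None)  # the default (None) is returned instead of raising

try:
    next(reader)                # without a default, an exhausted reader raises
    raised = False
except StopIteration:
    raised = True

print(first, after_end, raised)
```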
systempuntoout