tags:

views:

62

answers:

6

Hey,

I've several .csv files (~10) and need to merge them together into a single file horizontally. Each file has the same number of rows (~300) and 4 header lines which are not necessarily identical, but should not be merged (only take the header lines from the first .csv file). The tokens in the lines are comma separated with no spaces in between.

As a python noob I've not come up with a solution, though I'm sure there's a simple solution to this problem. Any help is welcome.

A: 

You learn by doing (and trying, even). So, I'll just give you a few hints. Use the following functions:

If you really don't know what to do, I recommend you read the tutorial and Dive Into Python 3. (Depending on how much Python you know, you'll either have to read through the first few chapters or cut straight to the file IO chapters.)

Beau Martínez
A: 

If you don't necessarily have to use Python, you can use shell tools like paste/gawk etc

$ paste file1 file2 file3 file4 .. | awk 'NR>4'

The above will put them horizontally without the headers. If you want the headers, just get them from file1

$  ( head -4 file ; paste file[1-4] | awk 'NR>4' ) > output
ghostdog74
+2  A: 

The csv module is you friend.

Till Backhaus
+5  A: 

You can load the CSV files using the csv module in Python. Please refer to the documentation of this module for the loading code, I cannot remember it but it is really easy. Something like:

import csv
reader = csv.reader(open("some.csv", "rb"))
csvContent = list(reader)

After that, when you have the CSV files loaded in such form (a list of tuples):

[ ("header1", "header2", "header3", "header4"),
  ("value01", "value12", "value13", "value14"),
  ("value11", "value12", "value13", "value14"),
  ... 
]

You can merge two such lists line-by-line:

result = [a+b for (a,b) in zip(csvList1, csvList2)]

To save such a result, you can use:

writer = csv.writer(open("some.csv", "wb"))
writer.writerows(result)
Kos
perhaps you would need to slice before merge and then do something like this instead of the list comprehension.a.extend(b[4:])
anijhaw
A: 

Purely for learning purposes

A simple approach that does not take advantage of csv module:

# open file to write
file_to_write = open(filename, 'w')
# your list of csv files
csv_files = [file1, file2, ...] 

headers = True
# iterate through your list
for filex in csv_files:
    # mark the lines that are header lines
    header_count = 0
    # open the csv file and read line by line
    filex_f = open(filex, 'r')
    for line in filex_f:
        # write header only once
        if headers:
            file_to_write.write(line+"\n")
            if header_count > 3: headers = False
        # Write all other lines to the file
        if header_count > 3:
            file_to_write.write(line+"\n")
        # count lines
        header_count = header_count + 1
    # close file
    filex_f.close()
file_to_write.close()
pyfunc
+1  A: 

You dont need to use csv module for this. You can just use

file1 = open(file1)

After opening all your files you can do this

from itertools import izip_longest

foo=[]
for new_line in izip_longest(file1,fil2,file3....,fillvalue=''):
    foo.append(new_line)

This will give you this structure (which kon has already told you)..It will also work if you have different number of lines in each file

[ ("line10", "line20", "line30", "line40"),
  ("line11", "line21", "line31", "line41"),
  ... 
]

After this you can just write it to a new file taking 1 list at a time

for listx in foo:
    new_file.write(','.join(j for j in listx))

PS: more about izip_longest here

Rafi