Guys, I here have 200 separate csv files named from SH (1) to SH (200). I want to merge them into a single csv file. How can I do it?
If the merged CSV is going to be used in Python then just use glob
to get a list of the files to pass to fileinput.input()
via the files
argument, then use the csv
module to read it all in one go.
fout=open("out.csv","a")
for num in range(1,201):
for line in open("sh"+str(num)+".csv"):
fout.write(line)
fout.close()
It depends what you mean by "merging" -- do they have the same columns? Do they have headers? For example, if they all have the same columns, and no headers, simple concatenation is sufficient (open the destination file for writing, loop over the sources opening each for reading, use shutil.copyfileobj from the open-for-reading source into the open-for-writing destination, close the source, keep looping -- use the with
statement to do the closing on your behalf). If they have the same columns, but also headers, you'll need a readline
on each source file except the first, after you open it for reading before you copy it into the destination, to skip the headers line.
If the CSV files don't all have the same columns then you need to define in what sense you're "merging" them (like a SQL JOIN? or "horizontally" if they all have the same number of lines? etc, etc) -- it's hard for us to guess what you mean in that case.
You could import csv then loop through all the CSV files reading them into a list. Then write the list back out to disk.
import csv
rows = []
for f in (file1, file2, ...):
reader = csv.reader(open("f", "rb"))
for row in reader:
rows.append(row)
writer = csv.writer(open("some.csv", "wb"))
writer.writerows("\n".join(rows))
The above is not very robust as it has no error handling nor does it close any open files. This should work whether or not the the individual files have one or more rows of CSV data in them. Also I did not run this code, but it should give you an idea of what to do.
As ghostdog74 said, but this time with headers:
fout=open("out.csv","a")
# first file:
for line in open("sh1.csv"):
fout.write(line)
# now the rest:
for num in range(2,201):
f = open("sh"+str(num)+".csv")
f.next() # skip the header
for line in f:
fout.write(line)
f.close() # not really needed
fout.close()