I just found out that I can save space and speed up reads of CSV files by using the answer to my previous question, http://stackoverflow.com/questions/3710263/how-do-i-create-a-csv-file-from-database-in-python, and opening the output file with 'wb':

  w = csv.writer(open(Fn,'wb'),dialect='excel')

How can I open all the files in a directory, rewrite each one using 'wb', and save it under its original name? I guess this would convert all the CSVs to binary CSVs.

+3  A: 

You can't "overwrite a file on the fly". You have two options:

  1. if the files are small enough (smaller than the amount of available RAM by a comfortable margin), just loop over them (os.listdir makes that loop easy, or os.walk if you want to catch the whole tree of subdirectories, not just one directory), and for each, read it in memory first, then overwrite the on-disk copy.

  2. otherwise, loop over them, and each time write to a new file (e.g. by appending .new to the name), then move the new file over the old. This is safer (no risk of running out of memory, no risk of damaging a file if the computer crashes) but more complicated.
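The two options above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names `rewrite_in_memory` and `rewrite_via_new_file` are made up, and it assumes Python 3, where the `csv` module wants files opened in text mode with `newline=""` rather than the Python 2 `'wb'` mode used in the question.

```python
import csv
import os


def rewrite_in_memory(path):
    """Option 1: read the whole file into memory, then overwrite it."""
    with open(path, newline="") as src:
        rows = list(csv.reader(src))
    # The original is truncated here; a crash at this point loses data.
    with open(path, "w", newline="") as dst:
        csv.writer(dst, dialect="excel").writerows(rows)


def rewrite_via_new_file(path):
    """Option 2: stream rows to path + '.new', then move it over the old file."""
    new_path = path + ".new"
    with open(path, newline="") as src, open(new_path, "w", newline="") as dst:
        writer = csv.writer(dst, dialect="excel")
        for row in csv.reader(src):  # one row at a time, so memory use stays small
            writer.writerow(row)
    # Replace the original only after the new file is completely written.
    os.replace(new_path, path)
```

The second function never holds more than one row in memory, which is why it suits the multi-GB case.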

So, what is your situation: small-enough files (and backups for safeguard against computer and disk crashes), in which case I can if you wish show the simple code; or huge multi-GB files -- in which case it will have to be the complex code? Let us know!

Alex Martelli
My files fall into two groups. The first are 15GB files, which may get a performance boost from this process. The second are under 500KB, which is where I saw improvements. Reading your answer, maybe using C:\current and c:\newfiles is better than appending to the names. There are 200 files in each group. The 15GB files and the sub-500KB files are in two separate directories on the local machine.
@user, sure, changing the directory path is even better than any attempt to "rewrite in place" -- this way you'll most naturally keep the old file versions in the same place and get the new ones in a new place, and can toss the latter and try again if any bug caused issues with them!-). I would use the "read from X, write to Y" approach for both small and big files since this way you only have to write it once, and without the automatic renaming part it's not really any harder!
Alex Martelli
@alex the program logic seems simple enough: DirOld, DirNew, and the filename with csv.writer. How can I get a directory listing of the files so I can loop over them?
@user, `os.listdir` lists all entries in a directory (files and subdirectories alike), or you could use the `glob` module if you need to be a little more selective in your listing.
Alex Martelli
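A minimal sketch of the "read from X, write to Y" loop discussed in these comments, using `glob` to pick out the CSV files. The function name `convert_directory` and its parameters are hypothetical, and as before it assumes Python 3 (text mode with `newline=""` in place of `'wb'`).

```python
import csv
import glob
import os


def convert_directory(old_dir, new_dir, pattern="*.csv"):
    """Rewrite every file matching pattern in old_dir into new_dir, same name."""
    os.makedirs(new_dir, exist_ok=True)
    written = []
    for src_path in sorted(glob.glob(os.path.join(old_dir, pattern))):
        dst_path = os.path.join(new_dir, os.path.basename(src_path))
        with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
            writer = csv.writer(dst, dialect="excel")
            for row in csv.reader(src):  # streamed, so large files are fine
                writer.writerow(row)
        written.append(dst_path)
    return written
```

With the directories mentioned above, a call might look like `convert_directory(r"C:\current", r"C:\newfiles")`. The originals are left untouched, so the new directory can be deleted and the run repeated if something goes wrong.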
A: 

=== WARNING ===

Please explain your fundamental premise "I just found out that I can save space\ speed up reads of CVS [sic -- you mean CSV] files" evidenced by "the" answer (which answer??) to your previous question plus 'wb' on writes -- it is NOT immediately obvious (a) that there is a grain of truth in that at all (b) that a perilous venture like re-writing all your files should be contemplated.

John Machin