import collections
import csv

# Read every row into memory.
with open('thefile.csv', 'rb') as f:
    data = list(csv.reader(f))

# Count how often each value in the 11th column occurs.
counter = collections.defaultdict(int)
for row in data:
    counter[row[10]] += 1

# Keep only the rows whose 11th-column value occurs at least 504 times.
with open('/pythonwork/thefile_subset11.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    for row in data:
        if counter[row[10]] >= 504:
            writer.writerow(row)

This code reads thefile.csv, makes changes, and writes the results to thefile_subset11.csv.

When I open the resulting CSV in Excel, there is an extra blank line after each record!

Is there a way to make it not put an extra blank line?

Can I do this? with open('/pythonwork/thefile_subset11.csv', 'w'),lineterminator='\n' as outfile:

+2  A: 

Note: It seems this is not the preferred solution, given how the extra line was being added on a Windows system. As stated in the Python documentation:

If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.

Windows is one such platform. While changing the line terminator as I described below may have fixed the problem, the problem can be avoided altogether by opening the file in binary mode. One might say this solution is more "elegant". "Fiddling" with the line terminator would likely have made the code unportable between systems, whereas opening a file in binary mode on a Unix system has no effect, i.e. the binary-mode fix gives cross-platform code.

From Python Docs:

On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.

Original:

If you are getting extra blank lines, you may have to change lineterminator, one of the optional parameters of csv.writer (see the csv module docs). The example below is adapted from the Python csv docs page. Change it from '\n' to whatever it should be. As this is just a stab in the dark at the problem, it may or may not work, but it's my best guess.

>>> import csv
>>> spamWriter = csv.writer(open('eggs.csv', 'w'), lineterminator='\n')
>>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
>>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Derek Litz
I was about to post about this -- lineterminator='\n' worked for me in a simple test.
Dan Breslau
Can I do this? with open('/pythonwork/thefile_subset11.csv', 'w'),lineterminator='\n' as outfile:
I__
@I__ : You *really* should start perusing the Python docs. Derek gave you the link : http://docs.python.org/library/csv.html
Dan Breslau
-1 "stab in dark" == "wrong"
John Machin
Dan Breslau, that name was originally Breslov, right? Are your ancestors from Belarus or Ukraine?
I__
@I__ : Off-topic comments aren't welcomed here, it seems. But feel free to email me at dbreslau atsign geemail dot com (remove the 'e's from "geemail")
Dan Breslau
+3  A: 

Open outfile as 'wb' instead of 'w'. The csv.writer writes \r\n into the file directly. If you don't open the file in binary mode, it will write \r\r\n because text mode will turn the \n into \r\n.
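
The doubling described above can be reproduced on any platform with a minimal sketch. It assumes Python 3, captures what csv.writer emits in an in-memory buffer, and imitates the Windows text-mode translation with a plain string replacement. The helper name written_text is hypothetical.

```python
import csv
import io


def written_text(rows, windows_text_mode):
    """Return what csv.writer emits for rows (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # default lineterminator is '\r\n'
    text = buf.getvalue()
    if windows_text_mode:
        # Imitate Windows text mode, which turns every LF into CR LF on write.
        text = text.replace('\n', '\r\n')
    return text


rows = [['Spam', 'Eggs']]
print(repr(written_text(rows, False)))  # 'Spam,Eggs\r\n'
print(repr(written_text(rows, True)))   # 'Spam,Eggs\r\r\n'
```

The second result is the \r\r\n sequence that Excel shows as an extra blank line after each record.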

Mark Tolonen
you're the man thanks so much!
I__
+3  A: 

The simple answer is that csv files should always be opened in binary mode whether for input or output, as otherwise on Windows there are problems with the line ending. Specifically on output the csv module will write \r\n (the standard CSV row terminator) and then (in text mode) the runtime will replace the \n by \r\n (the Windows standard line terminator) giving a result of \r\r\n.

Fiddling with the lineterminator is NOT the solution.
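
The advice above targets Python 2, where 'rb' and 'wb' are the fix. As a hedged aside, assuming Python 3: there csv.writer needs a text-mode file, and the csv documentation recommends opening it with newline='' so that the runtime leaves the row terminator alone. The sketch below writes to a temporary file (the name demo.csv and the helper write_rows are illustrative) and reads the raw bytes back to check.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline='' disables newline translation; it plays the role that
    # 'wb' plays in Python 2.
    with open(path, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(rows)


path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_rows(path, [['Spam', 'Eggs'], ['Ham', 'Toast']])

# Read the file back as bytes to see exactly what landed on disk.
with open(path, 'rb') as f:
    raw = f.read()
print(raw)  # b'Spam,Eggs\r\nHam,Toast\r\n'
```

Each row ends in a single \r\n on every platform, with no \r\r\n.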

John Machin
What is this CSV "standard" of which you speak?
Dan Breslau
@Dan: I used "standard" as an adjective, not a noun, meaning "usual" or "commonplace". If you want an approximation to a (noun) standard, read http://tools.ietf.org/html/rfc4180
John Machin
@John Machin: Point is (as you imply) that there is no standard. That RFC is Informational. While \r\n may be "standard" on Windows, I'm sure Unix applications typically don't see it that way.
Dan Breslau
@Dan: That is correct -- there is no standard. Scripts should specify the lineterminator [should have been named ROWterminator] that they want (if not the default) and still use binary mode in case the script is run on Windows otherwise the "lineterminator" may be stuffed up.
John Machin
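
A minimal sketch of that last suggestion, assuming Python 3 (so newline='' stands in for binary mode): name the row terminator explicitly and still keep the runtime from translating it. The helper name write_with_terminator and the file name are illustrative.

```python
import csv
import os
import tempfile


def write_with_terminator(path, rows, terminator):
    # newline='' keeps the runtime from touching the terminator we chose.
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, lineterminator=terminator)
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), 'unix_style.csv')
write_with_terminator(path, [['a', 'b'], ['c', 'd']], '\n')

# Inspect the raw bytes: rows should end in a bare LF on every platform.
with open(path, 'rb') as f:
    raw_bytes = f.read()
print(raw_bytes)  # b'a,b\nc,d\n'
```

Without newline='', the same script run on Windows would have its '\n' expanded to '\r\n', which is the "stuffed up" terminator the comment warns about.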