Hi,

I'm wondering if anyone with a better understanding of Python and GAE can help me with this. I am uploading a CSV file from a form to the GAE datastore.

class CSVImport(webapp.RequestHandler):
    def post(self):
        csv_file = self.request.get('csv_import')
        fileReader = csv.reader(csv_file)
        for row in fileReader:
            self.response.out.write(row)

I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717

That is, csv.reader is iterating over each character rather than each line. A Google engineer left this explanation:

The call self.request.get('csv') returns a String. When you iterate over a string, you iterate over the characters, not the lines. You can see the difference here:

 class ProcessUpload(webapp.RequestHandler): 
   def post(self): 
     self.response.out.write(self.request.get('csv')) 
     file = open(os.path.join(os.path.dirname(__file__), 'sample.csv')) 
     self.response.out.write(file) 

     # Iterating over a file 
     fileReader = csv.reader(file) 
     for row in fileReader: 
       self.response.out.write(row) 

     # Iterating over a string 
     fileReader = csv.reader(self.request.get('csv')) 
     for row in fileReader: 
       self.response.out.write(row) 

I really don't follow the explanation, and was unsuccessful implementing it. Can anyone provide a clearer explanation of this and a proposed fix?

Thanks, August

+1  A: 

I can't think of a clearer explanation than what the Google engineer you mentioned said. So let's break it down a bit.

The Python csv module is designed to operate on file-like objects, that is, a file or something that behaves like a Python file. Strictly speaking, csv.reader() accepts any iterable that yields one line per iteration as its only required parameter; an open file object is the usual choice.

The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv'), this returns the value associated with the key csv as a Python string. A Python string is not a file-like object, but it is iterable, so the csv module does not complain: it simply iterates over the string and treats each item it receives as a line (in Python, strings are iterated over by character, e.g., for c in 'Test String': print c will print each character in the string on a separate line).
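To make the character-by-character behaviour concrete, here is a minimal sketch. It uses Python 3 syntax rather than the Python 2 of the original GAE runtime, and the value of text is a hypothetical stand-in for what self.request.get('csv') might return.

```python
import csv

# Hypothetical stand-in for the string returned by self.request.get('csv').
text = "a,b"

# Iterating over a string yields one character at a time.
chars = list(text)

# csv.reader treats every item it receives as a separate line,
# so each character is parsed as its own one-character "line".
rows_from_string = list(csv.reader(text))

# Wrapping the same text in a list gives the reader one whole line.
rows_from_lines = list(csv.reader([text]))

print(chars)
print(rows_from_string)
print(rows_from_lines)
```

The comma on its own parses as a row with two empty fields, which is why the output from the plain string looks so odd.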

Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:

import csv
import StringIO  # Python 2 standard library module

from google.appengine.ext import webapp

class ProcessUpload(webapp.RequestHandler):
    def post(self):
        self.response.out.write(self.request.get('csv'))

        # Iterating over a string as a file: StringIO yields whole lines
        stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
        for row in stringReader:
            self.response.out.write(row)

This should work as you expect it to.
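For anyone trying the same idea outside GAE, the sketch below shows the StringIO approach in isolation. It assumes Python 3, where the class lives in the io module rather than in a StringIO module; parse_csv_text and the sample data are hypothetical names made up for illustration.

```python
import csv
import io

def parse_csv_text(text):
    """Parse CSV held in a string by wrapping it in a file-like object."""
    # newline="" leaves line endings untouched so the csv module
    # can deal with them itself.
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer))

# Hypothetical posted form value with Windows-style line endings.
rows = parse_csv_text("name,city\r\nAugust,Portland\r\n")
for row in rows:
    print(row)
```

Because the StringIO object yields whole lines when iterated, the reader sees one record per iteration instead of one character.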

Edit: I'm assuming that you are using something like a <textarea/> to collect the CSV data. If you're uploading an attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).

ig0774
Thanks! This solved it, and your explanation was a lot easier for me to understand.
August Flanagan
+3  A: 

Short answer, try this:

fileReader = csv.reader(csv_file.split("\n"))

Long answer, consider the following:

for thing in stuff:
  print thing.strip().split(",")

If stuff is a file pointer, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.

Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.
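The three cases can be compared side by side with a short sketch. It is written in Python 3 and uses made-up sample data; it also uses str.splitlines() instead of split("\n"), which is a substitution on my part so that "\r\n" line endings are removed as well.

```python
import csv
import io

# Hypothetical uploaded text with Windows-style line endings.
data = "x,y\r\n1,2\r\n"

# First item produced by each kind of object:
first_from_file = next(iter(io.StringIO(data, newline="")))  # a whole line
first_from_list = next(iter(data.splitlines()))              # a list item
first_from_string = next(iter(data))                         # one character

# Passing the list of lines to csv.reader gives one parsed row per line.
rows = list(csv.reader(data.splitlines()))

print(repr(first_from_file), repr(first_from_list), repr(first_from_string))
print(rows)
```

One caveat with splitting by hand: a quoted field that itself contains a line break is cut into separate lines before the reader sees it, so the file-like approach may be the safer choice for such data.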

Drew Sears
Thanks for the explanation, it makes a lot more sense to me now.
August Flanagan