Hi,

I am trying to form-post a SQL file that consists of many INSERT statements, e.g.

INSERT INTO `TABLE` VALUES ('abcdé', 2759);

I then use re.search to parse it and extract the fields to put into my own datastore. The problem is that, although the file contains accented characters (note that the e is an é), once uploaded the accent is lost: the code either errors or stores a bytestring representation of the character.

Here's what I am currently using (and I have tried loads of alternatives):

import cgi

form = cgi.FieldStorage()
uFile = form['sql']        # the uploaded file field
uSql = uFile.file.read()   # raw bytes as sent by the browser, not yet decoded
lineX = uSql.split("\n")   # to get each line

and so on.

Has anyone got a robust way of making this work? Remember that I am on App Engine, so access to some libraries is restricted or forbidden.

A: 

You mention utf8 in the Q's title but then never again: what are you doing (in terms of setting headers and checking them) to verify what encoding is in use? There should be headers of the form

Content-Type: text/plain; charset=utf-8

and the charset= part is where the encoding is specified. So what are the values upon sending and receiving this? If charset is erroneous, you may have to manually perform some encoding and decoding. To help us gauge what the encoding seems to be, besides the headers, what's the ord value of that accented-e? E.g., if the encoding was actually iso-8859-1, that ord value would be 233 (in decimal; 0xE9 in hex).
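As a rough sketch of the checks described above (not a definitive fix), the snippet below reads the charset= parameter from a Content-Type value, lists the numeric value of each uploaded byte, and decodes the upload. The helper names `charset_from_content_type`, `byte_values` and `decode_upload` are made up for illustration, and the fallback order (declared charset, then UTF-8, then ISO-8859-1) is an assumption: ISO-8859-1 accepts any byte sequence, so reaching it is only a guess at the real encoding.

```python
# -*- coding: utf-8 -*-
import re


def charset_from_content_type(header_value, default=None):
    """Return the charset= parameter of a Content-Type value, or default."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', header_value, re.IGNORECASE)
    return match.group(1).lower() if match else default


def byte_values(raw):
    """Return the integer value of every byte in an uploaded byte string."""
    return list(bytearray(raw))


def decode_upload(raw, declared=None):
    """Decode uploaded bytes and report which encoding was used.

    Assumed order: the declared charset (if any), then UTF-8, then
    ISO-8859-1. The last one never fails, so it is only a best guess.
    """
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding name: try the next candidate.
            continue
    return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    declared = charset_from_content_type("text/plain; charset=utf-8")
    sample = u"('abcd\xe9', 2759)".encode("utf-8")
    print(byte_values(sample))
    text, used = decode_upload(sample, declared)
    print(used)
```

With this, an é sent as ISO-8859-1 shows up as the single value 233, while the same character sent as UTF-8 shows up as the two values 195 and 169. Once the text is decoded to unicode, splitting on "\n" and matching with re.search operate on characters rather than raw bytes.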

Alex Martelli
Hmm. Your analysis is detailed but too much for my level of understanding. I mentioned utf8 because all the tests I have tried up to now seem to include a reference to it or a related function. I have printed the headers you mention, but the browser still renders the black diamond character. I have set the form submit page with meta headers for charset and, to be fair, the ultimate aim is to send the data to the datastore, not to the browser. I would appreciate more low-level instructions, please.
khany
@khany, so why don't you **show** us the crucial information you now "have printed" but are still keeping to yourself?! The "low-level instructions" you crave are: show us **all the details** -- your code, the headers, the ord value for that accented-e character -- **don't** assume we can just diagnose your blessed code's bugs without ANY real information from you!!!
Alex Martelli