Hello,

Using Python, I need to send non-UTF-8 encoded data (specifically Shift-JIS) to a URL via the query string. How should I transfer the data? Quote it? Encode it as UTF-8?

Thanks

+1  A: 

I don't know what Unicode has to do with this, since the query string is a string of bytes. You can use the quoting functions in urllib to percent-encode byte strings so that they can be passed within query strings.
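A minimal sketch of that idea, assuming Python 3 (where the quoting functions live in urllib.parse) and using the sample text 日本語 purely as an illustration:

```python
from urllib.parse import quote, unquote_to_bytes

text = "日本語"                    # sample text, chosen for illustration
raw = text.encode("shift_jis")     # the Shift-JIS bytes to transfer

# quote() accepts bytes and percent-encodes every byte that is not URL-safe
quoted = quote(raw)

# unquote_to_bytes() reverses the quoting and returns the original bytes
restored = unquote_to_bytes(quoted)
```

The quoted value is plain ASCII, so it can be appended to a URL after `?name=` without further processing.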

Tuure Laurinolli
A: 

By »query string«, do you mean HTTP GET parameters, as in http://{URL}?data=XYZ?

One option is to encode whatever data you have with base64.b64encode, using -_ as the alternative characters so that the result is URL-safe. See here.
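A rough sketch of the Base64 route, assuming Python 3; the parameter name `data` and the example host are placeholders, not something the receiving server is known to expect:

```python
import base64

payload = "日本語".encode("cp932")   # arbitrary bytes; cp932 text as an example

# altchars=b"-_" replaces "+" and "/" with URL-safe characters
token = base64.b64encode(payload, altchars=b"-_").decode("ascii")

# hypothetical URL and parameter name, for illustration only
url = "http://www.example.jp/something?data=" + token

# the receiver must decode with the same alternative characters
restored = base64.b64decode(token, altchars=b"-_")
```

Note that the output can still end in `=` padding for other input lengths, which some receivers may want percent-encoded as well.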

mkluwe
yeah - the GET parameters
Plumo
Base64encode? You should URLEncode it! http://en.wikipedia.org/wiki/Percent-encoding
BalusC
A matter of taste, isn't it? For arbitrary data I'd stay with Base64. For text content it's more obfuscating but not more confusing than my wrongly edited answer above, which I'm now reading again…
mkluwe
+4  A: 

Query string parameters are byte-based. Whilst IRI-to-URI conversion and non-ASCII characters typed into the address bar will typically use UTF-8, there is nothing forcing you to send or receive your own parameters in that encoding.

So for Shift-JIS (actually typically cp932, the Windows extension of that encoding), in Python 2:

import urllib

foo = u'\u65E5\u672C\u8A9E'  # 日本語
url = 'http://www.example.jp/something?foo=' + urllib.quote(foo.encode('cp932'))

In Python 3, the quote function takes the encoding itself:

import urllib.parse

foo = '\u65E5\u672C\u8A9E'  # 日本語
url = 'http://www.example.jp/something?foo=' + urllib.parse.quote(foo, encoding='cp932')
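For completeness, here is a sketch of what the receiving side might do, assuming it is also Python 3 and knows the parameter was sent as cp932 (nothing in the URL itself records the encoding, so both ends have to agree on it):

```python
from urllib.parse import quote, urlsplit, parse_qs

foo = "\u65E5\u672C\u8A9E"  # 日本語
url = "http://www.example.jp/something?foo=" + quote(foo, encoding="cp932")

# Split off the query string and decode the percent-escapes as cp932
query = urlsplit(url).query
params = parse_qs(query, encoding="cp932")

received = params["foo"][0]
```

If the receiver used the default `encoding="utf-8"` instead, the bytes would not decode to the original text, which is why the encoding must be agreed on out of band.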
bobince