Hello,

I'm writing a small Bittorrent tracker on top of the Django framework, as part of a larger project. However, I'm having problems with decoding the "info_hash" parameter of the announce request.

Basically, uTorrent takes the SHA1 hash of the torrent in question and URL-encodes its raw 20 bytes (not the hex representation), which is then sent to the tracker in a GET request as the info_hash parameter.

The info_hash

A44B44B0EE8D85A9F7135489D522A19DA2C87C91

gets encoded as:

%a4KD%b0%ee%8d%85%a9%f7%13T%89%d5%22%a1%9d%a2%c8%7c%91

However, Django decodes this to the Unicode string:

u'\ufffdKD\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\x13T\ufffd\ufffd"\ufffd\ufffd\ufffd\ufffd|\ufffd'

instead of a string literal like this:

\xa4KD\xb0\xee\x8d\x85\xa9\xf7\x13T\x89\xd5"\xa1\x9d\xa2\xc8|\x91

How can I stop Django from trying to translate the info_hash to Unicode, so I can then unquote it? My goal is to get a string literal that I can then encode to a hex string.

Any thoughts? Apologies if there's some concept about encoding that I'm missing. Thanks!

A: 

Django decodes all GET data using the default encoding. You'll need to get the raw query string yourself, possibly from os.environ['QUERY_STRING'], request.environ['QUERY_STRING'], or request.META['QUERY_STRING'], and unquote the info_hash value on your own.
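A minimal sketch of that approach, written for Python 3's urllib.parse (an assumption; the thread itself uses Python 2's urllib). The helper name extract_info_hash and the sample port parameter are hypothetical. In a Django view the input would presumably be the raw query string from request.META['QUERY_STRING'].

```python
from urllib.parse import unquote_to_bytes


def extract_info_hash(query_string):
    """Return the raw info_hash bytes from an undecoded query string.

    Hypothetical helper: it splits the query string by hand so that no
    text decoding is ever applied to the percent-encoded digest.
    """
    for pair in query_string.split("&"):
        name, _, value = pair.partition("=")
        if name == "info_hash":
            # unquote_to_bytes turns each %XX into the byte it denotes
            # and leaves the result as bytes, never as text.
            return unquote_to_bytes(value)
    return None


# Sample query string using the encoded value from the question.
sample = "info_hash=%a4KD%b0%ee%8d%85%a9%f7%13T%89%d5%22%a1%9d%a2%c8%7c%91&port=6881"
raw = extract_info_hash(sample)
hex_hash = raw.hex().upper()
print(hex_hash)
```

The resulting bytes object is 20 bytes long, and its hex form can be compared directly against the hash stored for the torrent.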

Ignacio Vazquez-Abrams
Thanks, I wasn't sure if there was a more elegant way to do it.
Alex Kloss
+1  A: 

What is your settings.DEFAULT_CHARSET? Also, what does the hash look like in the HTTP request? A hex string shouldn't be modified at all during URL encoding, as shown below:

>>> import urllib
>>> urllib.urlencode({'hash':"A44B44B0EE8D85A9F7135489D522A19DA2C87C91"})
'hash=A44B44B0EE8D85A9F7135489D522A19DA2C87C91'

Since:

>>> urllib.quote('A44B44B0EE8D85A9F7135489D522A19DA2C87C91') == 'A44B44B0EE8D85A9F7135489D522A19DA2C87C91'
True

And therefore:

>>> urllib.unquote('%a4KD%b0%ee%8d%85%a9%f7%13T%89%d5%22%a1%9d%a2%c8%7c%91') == 'A44B44B0EE8D85A9F7135489D522A19DA2C87C91'
False
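The last comparison is False because the percent-encoded value carries the raw 20-byte digest, not its hex text. A short sketch of the round trip, assuming Python 3's urllib.parse rather than the Python 2 urllib used above (the variable names are illustrative only):

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

hex_hash = "A44B44B0EE8D85A9F7135489D522A19DA2C87C91"
client_encoded = "%a4KD%b0%ee%8d%85%a9%f7%13T%89%d5%22%a1%9d%a2%c8%7c%91"

# Hex text -> raw 20-byte digest.
raw_digest = bytes.fromhex(hex_hash)

# Percent-encoding the raw digest yields a string of the same shape
# as the one the client sent (the case of the hex digits may differ).
reencoded = quote_from_bytes(raw_digest)

# Unquoting the client's value recovers the raw digest, and
# hex-encoding that digest recovers the original hash string.
decoded = unquote_to_bytes(client_encoded)
print(decoded == raw_digest)
print(decoded.hex().upper() == hex_hash)
```

Bytes that happen to be unreserved ASCII characters (here K, D and T) are sent as-is, which is why the encoded value mixes letters with %XX escapes.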
muhuk
Good answer, good times!
jathanism
That particular hash gets encoded by uTorrent to be: "%a4KD%b0%ee%8d%85%a9%f7%13T%89%d5%22%a1%9d%a2%c8%7c%91". I don't understand why it's encoded like this, but the only way to decode it (that I've found) is manually via the query string.
Alex Kloss