ansaurus

Question

Python socket data returns <byte> object. How to regexp it?

Answer 1

+3 A:

You need to include an encoding when converting to a string, for example use:

>>> str(b'GET http://...', 'UTF-8')
'GET http://...'

If you don't use an encoding then, as you've discovered, you get something a little less helpful:

>>> str(b'GET http://...')
"b'GET http://...'"
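To tie this back to the regexp part of the question, here is a minimal sketch of two ways to match against received data in Python 3: decode and use a normal string pattern, or leave the data as bytes and use a bytes pattern. The request line and the pattern are made-up illustrations, not taken from the question.

```python
import re

# Hypothetical request line standing in for data read from a socket.
data = b'GET http://example.com/ HTTP/1.1\r\n'

# Option 1: decode to str first, then use an ordinary str pattern.
text = data.decode('utf-8')
m = re.match(r'(\w+) (\S+)', text)
method, url = m.group(1), m.group(2)

# Option 2: keep the data as bytes and use a bytes pattern (note the b prefix).
# Pattern and subject must be the same type, or re raises TypeError.
bm = re.match(rb'(\w+) (\S+)', data)
raw_method = bm.group(1)
```

The groups from the second match are bytes objects, so they still need decoding before being mixed with str values.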

Scott Griffiths 2010-02-26 14:01:37

That seems to work. Can I assume 'UTF-8' is the default encoding for HTTP requests?

Enrico Carlesso 2010-02-26 14:03:59

I don't think you can assume UTF-8; I think a request can indicate other charsets (I'm no HTTP expert though).

Scott Griffiths 2010-02-26 14:13:00

According to the standard, any non-ASCII characters in an HTTP header are ISO-8859-1. In practice, browsers differ. Firefox uses the low byte of the UTF-16 code unit, Opera and Chrome use UTF-8, Safari generally breaks, and IE will use the system default code page of the machine it's installed on (which will never be UTF-8). In summary, unencoded non-ASCII characters in headers are totally unreliable. Probably you don't care though, in which case you can just plump for ISO-8859-1.
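A small sketch of why ISO-8859-1 is the safe fallback: it maps every byte value to a character, so decoding cannot fail, whereas UTF-8 rejects byte sequences that are not valid UTF-8. The header name and value below are invented for illustration.

```python
# Hypothetical header line; 0xE9 is 'é' in ISO-8859-1 but is not valid UTF-8 here.
raw = b'X-Name: caf\xe9\r\n'

# ISO-8859-1 decoding always succeeds, whatever bytes arrive.
header = raw.decode('iso-8859-1')

# UTF-8 decoding of the same bytes raises UnicodeDecodeError.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Whether the decoded characters are the ones the sender meant is a separate matter; this only guarantees that decoding does not raise.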

bobince 2010-02-26 14:48:39

Answer 2

+1 A:

Also, you might want to check the *HTTPServer classes. They wrap the work of being an HTTP server and will also parse headers for you.

If you can't, well, at the very least they will provide source code examples on how to do it!
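As a rough sketch of what that looks like in Python 3, assuming the standard library `http.server` module is what is meant here: the handler below receives the path and headers already parsed into strings, so no manual decoding or regexp is needed. The handler name, the `X-Demo` header, and the `demo` helper are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path and self.headers arrive already parsed as str values.
        demo_value = self.headers.get('X-Demo', '')
        body = ('%s %s' % (self.path, demo_value)).encode('iso-8859-1')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def demo():
    # Port 0 asks the OS for any free port; serve exactly one request.
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
    conn.request('GET', '/hello', headers={'X-Demo': 'abc'})
    result = conn.getresponse().read()
    conn.close()
    thread.join()
    server.server_close()
    return result
```

Calling `demo()` starts the server on a local port, sends one request, and returns the response body built from the parsed path and header.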

Daren Thomas 2010-02-26 14:24:00

Yes, I've noticed it, and I've got some plans to use it in the future, but for now I do not need it.

Enrico Carlesso 2010-02-26 14:26:29
