The response data from an HTTPResponse object is of type bytes.

import http.client

conn = http.client.HTTPConnection("www.yahoo.com")  # host must be a quoted string
conn.request("GET", "/")
response = conn.getresponse()
data = response.read()
type(data)  # <class 'bytes'>

The data is of type bytes.

I would like to use the response with the built-in HTML parser of Python 3.1. However, I find that HTMLParser.feed() requires a string (of type str) and does not accept data, which is bytes, as its argument. To work around this, I have used data.decode() before parsing.

Questions:

  1. Is there a better way to accomplish this?
  2. Is there a reason why the HTTP response does not return a string?

I guess the reason is this: the server's response could be in any character set, so the library cannot assume that it is ASCII. But then, a string in Python is Unicode, so the HTTP library could just as well return a string. HTML tags are definitely in ASCII.

+2  A: 

Is there a reason why the HTTP response does not return a string?

You nailed it yourself. An HTTP response isn't necessarily a string.

It can be an image, for example, and even when it is text, the library can't know which encoding was used. If you know the encoding (or have an encoding detection library), it's very easy to convert a series of bytes to a string. In fact, the byte type is often used synonymously with the char type in C-based languages.
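As a rough sketch of the "if you know the encoding" case: one common place to look is the charset parameter of the Content-Type header. The names TagCollector, charset_from_content_type and parse_body below are made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption, not something the HTTP library guarantees.

```python
from email.message import Message
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset is declared.
    return msg.get_content_charset() or default


def parse_body(data, content_type):
    """Decode raw response bytes with the declared charset, then parse them."""
    encoding = charset_from_content_type(content_type)
    # errors="replace" avoids an exception if the declared charset is wrong.
    text = data.decode(encoding, errors="replace")
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return parser.tags
```

With http.client, the header value would come from something like response.getheader("Content-Type"), and data from response.read(). If the server declares no charset, or declares the wrong one, you are back to guessing or to an encoding detection library.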

HTML tags are definitely in ASCII.

And if HTML tags were always ASCII, XHTML (which is recommended to be delivered as UTF-8) would have serious issues!

Besides, HTTP != HTML.

Rushyo