The data read from an HTTPResponse object is of type bytes:
import http.client
conn = http.client.HTTPConnection("www.yahoo.com")
conn.request("GET", "/")
response = conn.getresponse()
data = response.read()
print(type(data))
This prints <class 'bytes'>.
I would like to feed the response to Python 3.1's built-in HTML parser. However, HTMLParser.feed() requires a str, so it does not accept data (a bytes object) as its argument. To work around this, I call data.decode() and parse the resulting string.
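One possible alternative to a bare data.decode(), sketched here under assumptions rather than as the definitive approach: read the charset from the Content-Type response header and decode with it, falling back to a default when none is declared. The helper names (charset_from_content_type, TagCollector, parse_html_bytes) are hypothetical, and the "utf-8" fallback is an assumption, since HTTP does not guarantee that a charset is sent. A page may also declare its encoding in a <meta> tag, which this sketch ignores. It was written against current Python 3.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(content_type, default="utf-8"):
    """Return the (lowercased) charset parameter of a Content-Type value.

    The "utf-8" default is an assumption, not something HTTP guarantees.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


class TagCollector(HTMLParser):
    """Hypothetical example parser that records start-tag names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_html_bytes(body, content_type):
    """Decode an HTTP body using its declared charset, then parse it."""
    charset = charset_from_content_type(content_type)
    parser = TagCollector()
    # HTMLParser.feed() only accepts str, so decode the bytes first;
    # errors="replace" keeps parsing going if the body has bad bytes.
    parser.feed(body.decode(charset, errors="replace"))
    parser.close()
    return parser.tags
```

With a live connection, the call would look like parse_html_bytes(response.read(), response.getheader("Content-Type", "text/html")). An unknown charset name in the header would still raise LookupError on decode, which a fuller version would need to handle.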
Questions:
- Is there a better way to accomplish this?
- Is there a reason why the HTTP response does not return a str?
My guess is that the server's response could be in any character set, so the library cannot assume it is ASCII. But strings in Python 3 are Unicode, so the HTTP library could just as well return a str. HTML tags, after all, are definitely ASCII.
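A small illustration of the character-set point, assuming a server that sends the word "café" encoded as UTF-8: the same bytes produce different strings depending on which charset is used to decode them. The ASCII markup comes out identical either way; only the non-ASCII text differs, so the library would have to pick an encoding for the non-ASCII parts before it could return a str.

```python
# Bytes a server might send for "<p>café</p>" encoded as UTF-8.
body = b"<p>caf\xc3\xa9</p>"

as_utf8 = body.decode("utf-8")      # '<p>café</p>'
as_latin1 = body.decode("latin-1")  # '<p>cafÃ©</p>'

# The ASCII tags survive either decoding; the word "café" does not.
print(as_utf8 == as_latin1)          # False
print(as_utf8[:3] == as_latin1[:3])  # True
```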