I'm doing some web scraping and sites frequently use HTML entities to represent non ascii characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?
For example:
I get back: & #x01ce; (There is no space. I put that so Markdown won't interpret it) which represents an "a" with a tone mark. In binary, this is represented as the 16 bit 01ce. I want to convert the html entity into the value u'\u01ce'