Hi,
I have a string that might include <br> or <span>...</span> tags, or HTML character entities. I want a robust way of stripping all of that and getting back only the remaining text as UTF-8 characters. This should be cross-platform, ideally.
Something like this would be ideal:
http://snipplr.com/view/15261/python-decode-and-strip-html-entites-to-unicode/
but one that also removes the tags.
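To make the goal concrete, here is a rough sketch of the behaviour I'm after, using only the standard library's html.parser module (Python 3). The names strip_html and _TextExtractor are my own placeholders, not from the linked snippet, and I haven't checked how well this copes with badly malformed markup:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; every tag is dropped."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities
        # such as &amp; or &#233; before handing text to handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags (already entity-decoded).
        self.parts.append(data)


def strip_html(text):
    # Hypothetical helper name: returns the input with tags removed
    # and entities decoded to Unicode characters.
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.parts)


print(strip_html("a<br>b <span>caf&eacute;</span> &amp; more"))
# prints: ab café & more
```

Since this relies only on the standard library, it should behave the same on any platform, but I'd welcome something more robust.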
Thanks!