I need to browse the DOM tree of a parsed HTML document.
I'm running the HTML through uTidyLib before parsing the resulting string with lxml:
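    import tidy
    from lxml import etree
    a = tidy.parseString(html_code, **options)  # uTidyLib takes Tidy options as keyword arguments
    dom = etree.fromstring(str(a))              # strict XML parse of Tidy's output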
Sometimes I get an error. It seems that tidylib is not able to repair all malformed HTML.
How can I parse every HTML file without getting an error, parsing only the salvageable parts of files that cannot be repaired?
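One possible approach is sketched below, assuming the lxml package is installed. It tries the strict XML parse from the code above first and, if that raises `etree.XMLSyntaxError`, re-parses the same string with lxml's HTML parser (`etree.HTMLParser`), which recovers from malformed markup instead of failing. The helper name `parse_html` and the sample markup are hypothetical, chosen only for illustration.

```python
from lxml import etree


def parse_html(markup):
    """Return the root element of `markup`.

    Tries a strict XML parse first; if that raises XMLSyntaxError,
    re-parses with lxml's HTML parser, which recovers from broken
    markup instead of raising.
    """
    try:
        return etree.fromstring(markup)
    except etree.XMLSyntaxError:
        # recover=True is already HTMLParser's default; spelled out for clarity.
        parser = etree.HTMLParser(recover=True)
        return etree.fromstring(markup, parser)


# Hypothetical malformed input: unclosed <p> and <br> tags.
broken = "<html><body><p>One<p>Two<br></body>"
root = parse_html(broken)
print(root.tag, [p.text for p in root.iter("p")])
```

With the code in the question, `dom = parse_html(str(a))` would take the place of the direct `etree.fromstring` call. Recovery is best-effort: libxml2 may drop or restructure badly broken fragments rather than keep them exactly as written.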