I'm trying to extract some data from various HTML pages using a Python program. Unfortunately, some of these pages contain user-entered data which occasionally has "slight" errors, mainly mismatched tags.
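To illustrate the kind of failure I mean, here is a minimal sketch. The markup is a made-up stand-in for the real user-entered data, and `try_parse` is just a hypothetical helper for this example. It feeds a fragment with wrongly nested closing tags to `xml.dom.minidom`, which rejects it because the input is not well-formed XML.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical fragment: <b> and <i> are closed in the wrong order.
broken_html = "<div><b><i>some text</b></i></div>"


def try_parse(markup):
    """Return the parsed DOM, or None if the markup is not well-formed XML."""
    try:
        return minidom.parseString(markup)
    except ExpatError as exc:
        # minidom is a strict XML parser, so mismatched tags end up here.
        print("parse failed:", exc)
        return None


result = try_parse(broken_html)
```

With correctly nested tags the same call returns a document object, so the problem is specifically the mismatched tags rather than the parsing code.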
Is there a good way to have Python's xml.dom try to correct such errors, or at least tolerate them? Alternatively, is there a better way to extract data from HTML pages that may contain errors?