I would like to be able to parse RSS and Atom feeds that contain
XML that is not well-formed. The errors I have encountered and would
like to fix include "simple" things such as an entity reference like
&gt where the closing ; is missing, missing closing tags, and closing
tags that appear in the wrong order.
I would like to set aside the question of whether, in theory, it makes any sense to attempt parsing malformed XML documents at all. One "technical" term that seems to come rather close to what I want to do is "tag soup". Which existing CPAN modules should I use to build a parser that can tolerate or correct simple errors like those described above?