In another thread I was persuaded to use an HTML parser instead of regexps for parsing HTML. I thought of using libxml (it has an HTML parser built in), but I failed to find any useful tutorial. I also found this site, which says it should do fine even with severely broken HTML.
Could you give me some examples of HTML parsing with libxml, or maybe recommend a different free library for Linux? I'm using C++.
I just thought someone would have some example code, so that I don't have to analyze the headers. ;)