views:

993

answers:

2

How would I go about extracting the "href" attribute of every "a" tag on a page full of BAD HTML, in Qt?

+5  A: 

I would use the built-in QtWebKit. I don't know how it does in terms of performance, but I think it should cope with all the "bad" HTML. Something like:

QWebView* view = new QWebView(parent);
view->load(QUrl("http://www.example.com"));
// view is a pointer, and page() and mainFrame() return pointers too, so use ->
QWebElementCollection elements = view->page()->mainFrame()->findAllElements("a");

and then do whatever you like with the collection ;) (not tested code!)

Jaro
I cleaned this up and it didn't work... do I have to wait for the page to load or something?
Joshua
@JOSHUA: I'd recommend waiting until you get the loadFinished(bool) signal, yes. (http://doc.trolltech.com/4.6/qwebview.html#loadFinished)
Bill