How would I go about extracting the "href" attribute of every "a" tag on a page full of BAD HTML, in Qt?
A:
I would use the built-in QtWebKit. I don't know how it performs, but I think it should cope with all "bad" HTML. Something like:
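QWebView* view = new QWebView(parent);
// load() is asynchronous: it returns before the page has been fetched and parsed.
view->load(QUrl("http://www.example.com"));
// view, page() and mainFrame() are all pointers, so use -> rather than .
QWebElementCollection elements = view->page()->mainFrame()->findAllElements("a");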
and then do whatever you like with the collection ;) (not tested code!)
Jaro
2010-02-01 19:51:09
I cleaned this up and it didn't work... do I have to wait for the page to load or something?
Joshua
2010-02-01 21:10:59
@JOSHUA: I'd recommend waiting until you get the loadFinished(bool) signal, yes. (http://doc.trolltech.com/4.6/qwebview.html#loadFinished)
Bill
2010-02-01 21:31:55