Hello, I'm trying to get a web page's data as a string so that I can then parse it. I didn't find any suitable methods in QWebView, QUrl, or elsewhere. Could you help me? Linux, C++, Qt.

EDIT:

Thanks for the help. The code is working, but some pages have a broken charset after downloading. I tried something like this to repair it:

QNetworkRequest *request = new QNetworkRequest(QUrl("http://ru.wiktionary.org/wiki/bovo"));

request->setRawHeader( "User-Agent", "Mozilla/5.0 (X11; U; Linux i686 (x86_64); "
                       "en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1" );
request->setRawHeader( "Accept-Charset", "win1251,utf-8;q=0.7,*;q=0.7" );
request->setRawHeader( "charset", "utf-8" );
request->setRawHeader( "Connection", "keep-alive" );

manager->get(*request);

No results =(.

+9  A: 

Have you looked at QNetworkAccessManager? Here's a rough and ready sample illustrating usage:

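#include <QObject>
#include <QByteArray>
#include <QString>
#include <QUrl>
#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>

class MyClass : public QObject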
{
Q_OBJECT

public:
    MyClass();
    void fetch(); 

public slots:
    void replyFinished(QNetworkReply*);

private:
    QNetworkAccessManager* m_manager;
};


MyClass::MyClass()
{
    m_manager = new QNetworkAccessManager(this);

    connect(m_manager, SIGNAL(finished(QNetworkReply*)),
         this, SLOT(replyFinished(QNetworkReply*)));

}

void MyClass::fetch()
{
    m_manager->get(QNetworkRequest(QUrl("http://stackoverflow.com")));
}

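void MyClass::replyFinished(QNetworkReply* pReply)
{
    // The reply is a sequential device, so the body can only be read once.
    QByteArray data = pReply->readAll();
    // Assumes the page is UTF-8 encoded; use a matching codec otherwise.
    QString str = QString::fromUtf8(data);

    // process str any way you like!

    // The receiver is responsible for deleting the reply.
    pReply->deleteLater();
}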

In your handler for the finished signal you will be passed a QNetworkReply object, which you can read the response from, as it inherits from QIODevice. A simple way to do this is to call readAll to get a QByteArray. You can construct a QString from that QByteArray and do whatever you want to do with it.
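
If the downloaded text comes out with a broken charset, the bytes are probably being decoded with the wrong encoding when the QString is built. Below is a minimal sketch, in plain C++, of pulling the charset name out of a Content-Type header value. The helper name charsetFromContentType is hypothetical, and the sketch assumes the server declares the encoding in that header (many pages declare it only in an HTML meta tag, which this does not handle).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the lower-cased charset named in a
// Content-Type header value, or an empty string if none is declared.
std::string charsetFromContentType(const std::string& contentType)
{
    // Header parameter names are case-insensitive, so work on a lower-cased copy.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    const std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();

    // The value runs up to the next parameter separator or whitespace.
    const std::size_t begin = pos + key.size();
    const std::size_t end = lower.find_first_of("; \t", begin);
    std::string value = (end == std::string::npos)
                            ? lower.substr(begin)
                            : lower.substr(begin, end - begin);

    // The value may be quoted, e.g. charset="windows-1251".
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);

    return value;
}
```

In Qt 4 and 5, the header value should be available inside the slot via pReply->header(QNetworkRequest::ContentTypeHeader).toString(), and the resulting name can be passed to QTextCodec::codecForName() to decode the QByteArray with toUnicode(). Check that codecForName() did not return a null pointer before using it.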

Paul Dixon
Thanks for answering. But I got an error: Object::connect: No such slot MainWindow::replyFinished(QNetworkReply*)
Ockonal
You need to add a slot to the receiving class with the signature void replyFinished(QNetworkReply*).
Idan K
Sorry, I understand now. But I still don't know how to read the data. Help me, please :)
Ockonal
Inside your replyFinished slot, call readAll() on the QNetworkReply argument; you'll get back a QByteArray.
Idan K
I tried this: manager->get(QNetworkRequest(QUrl("http:/stackoverflow.com")))->readAll().constData(); It always returns an empty string. Why?
Ockonal
It's no use reading from the QNetworkReply object until the QNetworkAccessManager emits the finished signal, which the sample connects to the replyFinished slot. If you're not familiar with signal and slot handling, look it up in the Qt manual.
Paul Dixon
I have expanded the code sample to illustrate.
Paul Dixon
Thanks, your example helped me a lot.
Night Walker
This won't work in general because "QNetworkReply is a sequential-access QIODevice, which means that once data is read from the object, it no longer kept by the device. It is therefore the application's responsibility to keep this data if it needs to." Often you will find readAll() returns nothing because the content has already been read.
Plumo
The code I posted only reads it once - you'll get a fresh QNetworkReply whenever that fetch method is called.
Paul Dixon
Yes it works in isolation, but not if you try combining with a QWebView to render the webpage: http://stackoverflow.com/questions/2968482/qt-jambi-accessing-the-content-of-qnetworkreply
Plumo
Rather than "it won't work in general", it's that it won't work if you're using the QNetworkAccessManager in conjunction with another class which might be trying to consume the data as it loads. That is of course correct, but my sample doesn't do that, and I believe the OP was only interested in obtaining the response to parse it, not rendering it in a QWebView. It's a perfectly valid technique. However, it's been a year since I wrote much Qt code, so there might be some more concise techniques in recent releases...
Paul Dixon
A: 

Have you looked into lynx, curl, or wget? In the past I have needed to grab and parse info from a website, sans db access, and if you are trying to get dynamically formatted data, I believe this would be the quickest way. I'm not a C guy, but I assume there is a way to run shell scripts and grab the data, or at least get the script running and grab the output from a file after writing to it. Worst case scenario, you could run a cron job and check from C for a "finished" line at the end of the written file, but I doubt that will be necessary. I suppose it depends on what you need it for, but if you just want the output HTML of a page, something as easy as a wget piped to awk or grep can work wonders.
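
For what it's worth, running a shell command and grabbing its output from C++ might look something like the sketch below. It assumes a POSIX system (the question mentions Linux), where popen() and pclose() are available; the helper name runCommand is made up for illustration.

```cpp
#include <stdio.h>   // popen, pclose, fread (POSIX)
#include <array>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs `command` through the shell and returns
// everything the command wrote to its standard output.
std::string runCommand(const std::string& command)
{
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen() failed");

    std::string output;
    std::array<char, 4096> buffer;
    std::size_t bytesRead;
    // Read in chunks until the command closes its end of the pipe.
    while ((bytesRead = fread(buffer.data(), 1, buffer.size(), pipe)) > 0)
        output.append(buffer.data(), bytesRead);

    pclose(pipe);
    return output;
}
```

Something like runCommand("wget -q -O - http://stackoverflow.com") would then return the page as a string, provided wget is installed. The command goes through the shell, so never build it from untrusted input.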

Jesse
A: 

Paul Dixon's answer is probably the best approach, but Jesse's answer does touch on something worth mentioning.

cURL, or more precisely libcurl, is a wonderfully powerful library. There's no need to execute shell scripts and parse their output: libcurl is available for C, C++ and more languages than you can shake a URL at. It might be useful if you are doing some weird operation (like HTTP POST over SSL?) that Qt doesn't support.

C-o-r-E
Can anyone confirm that Qt can't handle POST through SSL?
Andrioid