I'm trying to use Perl's WWW::Mechanize to download a file. I have to log in to the website first and then, once the login form has been submitted, download the file.

The thing is, after hours of trying, I still haven't managed to do what I want. In the end, the script saves a file that is not a zip file but an HTML file with nothing interesting in it.

Here is the script I've written:

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use Crypt::SSLeay;

my $login = "MyMail";
my $password = "MyLogin";
my $url = 'http://www.lemonde.fr/journalelectronique/donnees/protege/20101002/Le_Monde_20101002.zip';

my $bot = WWW::Mechanize->new();
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);

# First request: expected to land on the login form.
my $response = $bot->get($url);

# Fill in and submit the login form.
$bot->form_name("formulaire");
$bot->field('login', $login);
$bot->field('password', $password);
$bot->submit();

# Request the file again now that the login has been submitted.
$response = $bot->get($url);
my $filename = $response->filename;

# Write the body in binary mode, since a zip file is expected.
open( my $fout, '>', $filename )
    or die("Could not create file: $!");
binmode($fout);
print {$fout} $response->content();
close($fout);

Could you help me find the mistakes I've made?

A: 

Translated from French:

Hi Cyril,

I'm taking the liberty of answering in French, because I don't think you're bilingual.

The problem you describe is a well-known one with Mechanize. The simplest solution is to use the RASPO library.

Good luck, and don't hesitate to work on your English ;)

perlman
Oh come on, we've seen worse English than that!
Benoit
Hi. I can't find anything about your library...
This library does not exist.
leo
@perlman The poster posted in English. See also http://meta.stackoverflow.com/q/13676/130608
Sinan Ünür
Posting in French would be acceptable if an English translation were also included for the benefit of everyone else who wishes to learn from this problem.
Ether
If it's a choice between a French answer and no answer, I'll take the French answer. The wiki wonders of Stack Overflow allow you to edit the answers to provide the translation. It took me longer to type this comment than it did to translate the post for everyone. Unfortunately, the answer is still useless.
brian d foy
+2  A: 

There are some hidden input fields which I assume are filled in when you navigate to the download using a browser rather than using a URL directly.

In addition, they are setting some cookies via JavaScript and those would not be picked up by Mechanize. However, there is a plugin WWW::Mechanize::Plugin::JavaScript which might be able to help you with that (I have no experience with it).
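
If the cookies set by the page's JavaScript can be read off in the browser, one option is to add them to the cookie jar by hand before requesting the file. What follows is only a minimal sketch: the cookie name `js_session` and the value `abc123` are made up for illustration, and the real ones would have to come from the browser. It only builds a request offline to check that the jar would send the cookie for the download URL.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

# Hypothetical cookie name and value: replace with what the browser shows.
my $jar = HTTP::Cookies->new;
$jar->set_cookie(
    0,                 # cookie version
    'js_session',      # name (assumption)
    'abc123',          # value (assumption)
    '/',               # path
    'www.lemonde.fr',  # domain
    undef,             # port (any)
    1,                 # path_spec
    0,                 # secure
    3600,              # max age in seconds
    0,                 # discard
);

# Build a request for the download URL without sending it, then let
# the jar attach whatever cookies apply to that host and path.
my $request = HTTP::Request->new(
    GET => 'http://www.lemonde.fr/journalelectronique/donnees/protege/20101002/Le_Monde_20101002.zip'
);
$jar->add_cookie_header($request);

my $cookie_header = $request->header('Cookie');
print "Cookie header: $cookie_header\n";
```

With Mechanize, the same `set_cookie` call could be made on the object returned by `$bot->cookie_jar` before calling `$bot->get($url)`.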

Use LiveHTTPHeaders to see what gets submitted by the browser and replicate that (assuming you are not violating their TOS).
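
To compare against what the browser submits, it can help to list the hidden inputs that the login form actually contains. The sketch below parses a stand-in HTML page with HTML::Form, the module Mechanize uses for its forms. The form name `formulaire` and the `login` and `password` fields come from the question; the hidden field names `token` and `url` and their values are invented for illustration.

```perl
use strict;
use warnings;
use HTML::Form;

# Stand-in for the real login page; the hidden fields are made up.
my $html = <<'HTML';
<html><body>
<form name="formulaire" method="post" action="/login">
  <input type="text"     name="login">
  <input type="password" name="password">
  <input type="hidden"   name="token" value="abc">
  <input type="hidden"   name="url"   value="/download">
</form>
</body></html>
HTML

# Pick the form called "formulaire" out of all forms on the page.
my ($form) = grep { ( $_->attr('name') // '' ) eq 'formulaire' }
             HTML::Form->parse( $html, 'http://www.lemonde.fr/' );
die "form not found" unless $form;

# Collect the name and value of every hidden input.
my %hidden;
for my $input ( $form->inputs ) {
    next unless $input->type eq 'hidden';
    next unless defined $input->name;
    $hidden{ $input->name } = $input->value;
}

print "$_ = $hidden{$_}\n" for sort keys %hidden;
```

In the script from the question, the same loop could be run over `$bot->current_form->inputs` after `$bot->form_name("formulaire")`, since Mechanize hands back an HTML::Form object there.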

Sinan Ünür
Thank you very much for your answer, I'll try to find out what's going on with these fields.
I'm not violating anything since I'm a registered user of this website! :)
@user467954 Well, their TOS might still prohibit using bots to access content. I am not implying you are doing anything wrong. You just never know with newspapers these days.
Sinan Ünür