The site I was screen scraping (which I have credentials for) recently changed their server and blocked port 80. I thought I could just switch to https on port 443, but now I get a timeout error. I'm just creating a new WWW::Mechanize object and using get() to scrape the site.

My question is, do I need to add the cookie now that they use https?

Is this the correct way to add the cookie jar?

    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTTP::Cookies;

    my $agent = WWW::Mechanize->new();

    $agent->agent('Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100407 Ubuntu/9.10 (karmic) Firefox/3.6.3');

    # we need cookies
    $agent->cookie_jar(HTTP::Cookies->new);

This is the error:

    Trying to log in... 2010-04-22 14:00:08 Error POSTing https://theURL/j_security_check: The time allowed for the login process has been exceeded. If you wish to continue you must either click back twice and re-click the link you requested or close and re-open your browser at lib/mypackage.pm line 40

Is this even a cookie issue?

Is there a way to increase the login time? Even when I log into the site through a browser, it feels like it takes a good 60 to 90 seconds before I'm logged in.

A:

WWW::Mechanize is built on top of LWP::UserAgent, so you can use the LWP::UserAgent methods. The default timeout is 180 seconds, which is already extremely long, but you can change it to be any value that you like by using the timeout method:

 $mech->timeout( $really_long_value );

This timeout is not the total request time, but the idle time on the socket that the user-agent will tolerate. If it receives no interaction within that time, the request should fail.
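A minimal sketch, assuming WWW::Mechanize is installed: the timeout can also be given to the constructor, because Mech passes any constructor arguments it does not handle itself through to LWP::UserAgent->new, and calling timeout with no argument reads back the current value. The helper name build_agent and the 300 and 600 second values are arbitrary placeholders for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: build a Mech object whose socket-idle timeout
# differs from LWP::UserAgent's default of 180 seconds. The timeout
# argument is handed through to LWP::UserAgent->new.
sub build_agent {
    my ($timeout) = @_;
    return WWW::Mechanize->new( timeout => $timeout );
}

my $mech = build_agent(300);
print "constructor timeout: ", $mech->timeout, "\n";

# The same method also works as a setter, so the value can be changed
# before the next request (for example, just before the slow login POST).
$mech->timeout(600);
print "updated timeout: ", $mech->timeout, "\n";
```

Raising the value only helps if the server eventually answers; it will not help if the server itself is rejecting the login as too slow, which is what the quoted message seems to suggest.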

However, it sounds like the server probably has a problem, since it takes so long even when you do it manually. You might mention this to whoever runs that site. That error message is very suspicious. Without more details about the server, etc., it's very difficult to tell you what's going on.

As for the cookie issue, just watch the HTTP conversation when you try it manually. Do whatever your interactive browser does. If it sends cookies, do that. If it uses a different form of authentication, do that, and so on. They might have changed more than the scheme when they turned off port 80.
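To compare the script's side of the conversation with what the browser does, one option is to use LWP::UserAgent's handler hooks, which WWW::Mechanize inherits, to dump each request as it is sent and each response when it completes, and then print the cookie jar after logging in. This is a sketch: attach_tracing is a hypothetical helper name, and the URL in the usage comment is the placeholder from the question, not something to run as-is.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: dump every outgoing request and every finished
# response to STDOUT so they can be compared with the browser's traffic.
# Returning nothing from the handler lets processing continue normally.
sub attach_tracing {
    my ($mech) = @_;
    $mech->add_handler( request_send  => sub { shift->dump; return } );
    $mech->add_handler( response_done => sub { shift->dump; return } );
    return $mech;
}

my $mech = attach_tracing( WWW::Mechanize->new );

# WWW::Mechanize sets up an in-memory HTTP::Cookies jar by default, so
# cookies the server sets during login are stored and sent back.
print "cookie jar class: ", ref( $mech->cookie_jar ), "\n";

# Usage (placeholder URL, not executed here):
#   $mech->get('https://theURL/');
#   print $mech->cookie_jar->as_string;
```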

brian d foy