www-mechanize

How can I extract all links from the page excluding one using Perl's WWW::Mechanize?

I'm trying to use WWW::Mechanize to extract some links from an HTML page using the find_all_links() method. It supports matching on these criteria: text, text_regex, url, url_regex, url_abs, url_abs_regex ... How can I extract all links except the one whose text is "xyz"? ...
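find_all_links() has no "exclude" criterion, so one common approach is to grab every link and filter the list with grep. A minimal sketch, assuming a hypothetical start URL:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://example.com/');    # hypothetical URL

    # Keep every link except the one whose text is exactly "xyz"
    # (some links, e.g. images, have no text, hence the // '').
    my @links = grep { ( $_->text // '' ) ne 'xyz' } $mech->find_all_links();

    print $_->url_abs, "\n" for @links;

Alternatively, a negative lookahead works with the text_regex criterion itself: find_all_links( text_regex => qr/^(?!xyz$)/ ).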

WWW::Mechanize Perl login only works after relaunch

Hello, I'm trying to log in to a website automatically using Perl with WWW::Mechanize. What I do is: $bot = WWW::Mechanize->new(); $bot->cookie_jar( HTTP::Cookies->new( file => "cookies.txt", autosave => 1, ignore_discard => 1, ) ); $response = $bot->get( 'http://blah...
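For reference, a complete login flow along those lines, with a file-backed jar so the session survives between runs (the URL and form field names are hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTTP::Cookies;

    my $bot = WWW::Mechanize->new(
        cookie_jar => HTTP::Cookies->new(
            file           => 'cookies.txt',
            autosave       => 1,
            ignore_discard => 1,    # also save session cookies marked "discard"
        ),
    );

    # Hypothetical login form; the real field names depend on the site.
    $bot->get('http://example.com/login');
    $bot->submit_form(
        form_number => 1,
        fields      => { username => 'user', password => 'secret' },
    );
    die "Login failed\n" unless $bot->success;

The ignore_discard => 1 flag matters here: most login session cookies are flagged to be discarded when the session ends, and without it they never reach cookies.txt, which would explain a login that only "takes" after a relaunch.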

What is the preferred method of accessing WWW::Mechanize responses?

Hello! Are both of these versions OK, or is one of them preferable? #!/usr/bin/env perl use strict; use warnings; use WWW::Mechanize; my $mech = WWW::Mechanize->new(); my $content; # 1 $mech->get( 'http://www.kernel.org' ); $content = $mech->content; print $content; # 2 my $res = $mech->get( 'http://www.kernel.org' ); $content = $res->...
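Both forms work because get() returns an HTTP::Response object and the Mech object also keeps the last response internally (reachable via $mech->res). A sketch of the two styles side by side:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();

    # Style 1: ask the object after the request.
    $mech->get('http://www.kernel.org');
    print $mech->content;

    # Style 2: capture the HTTP::Response that get() returns.
    my $res = $mech->get('http://www.kernel.org');
    print $res->decoded_content;

One practical difference: $mech->content offers Mechanize-specific options such as content( format => 'text' ), while the response object exposes raw HTTP details like status_line and the headers.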

Why does my WWW::Mechanize program time-out when it tries to login?

The site I was screen scraping (which I have credentials for) recently changed their server and blocked port 80. I thought I could just use port 443 for HTTPS, but I get a timeout error now. I'm just creating a new WWW::Mechanize object and using get() to scrape the site. My question is, do I need to add the cookie now that they use https...
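A timeout on port 443 usually isn't about cookies (WWW::Mechanize enables an in-memory cookie jar by default); more often the HTTPS support module is missing or a firewall is blocking the port. A quick check, with a hypothetical URL (with LWP 6+, HTTPS support lives in LWP::Protocol::https):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use LWP::Protocol::https;    # fails at compile time if HTTPS support is absent

    my $mech = WWW::Mechanize->new( timeout => 30 );

    my $resp = $mech->get('https://example.com/login');    # hypothetical URL
    die 'GET failed: ' . $resp->status_line . "\n" unless $resp->is_success;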

Why can't Perl's WWW::Mechanize find the form by field names?

#!/usr/bin/perl use WWW::Mechanize; use Compress::Zlib; my $mech = WWW::Mechanize->new(); my $username = ""; #fill in username here my $keyword = ""; #fill in password here my $mobile = $ARGV[0]; my $text = $ARGV[1]; $deb = 1; print length($text)."\n" if($deb); $text = $text."\n\n\n\n\n" if(length($text) < 135); $mech->get("http...
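When the form has no usable name or id, form_with_fields() can select it by the fields it contains. A sketch continuing from the variables above (the field names are guesses; check the actual HTML):

    # Select the form by its field names, then fill and submit it.
    $mech->form_with_fields( 'username', 'password' )
        or die "No form with those fields found\n";
    $mech->field( username => $username );
    $mech->field( password => $keyword );
    $mech->submit();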

Why am I getting a new session ID on every page fetch in my Perl WWW::Mechanize script?

So I'm scraping a site that I have access to via HTTPS. I can log in and start the process, but each time I hit a new page (URL) the session ID cookie changes. How do I keep the logged-in session ID cookie? #!/usr/bin/perl -w use strict; use warnings; use WWW::Mechanize; use HTTP::Cookies; use LWP::Debug qw(+); use HTTP::Request; use LWP:...
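WWW::Mechanize already keeps an in-memory cookie jar by default, so the session cookie persists automatically as long as every request goes through the same object. A fresh session ID on each page usually means a new Mech object is being created per fetch (or the server deliberately rotates IDs). A sketch with hypothetical URLs and field names:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    # One object for the whole session: its default cookie jar
    # carries the session cookie from request to request.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    $mech->get('https://example.com/login');
    $mech->submit_form(
        form_number => 1,
        fields      => { user => 'me', pass => 'secret' },
    );
    $mech->get('https://example.com/account');    # still logged in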

How can I get WWW-Mechanize to login to Wells Fargo's website?

I am trying to use Perl's WWW::Mechanize to log in to my bank and pull transaction information. After I log in to my bank (Wells Fargo) through a browser, it briefly displays a temporary web page saying something along the lines of "please wait while we verify your identity". After a few seconds it proceeds to the bank's webpage where I...
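WWW::Mechanize cannot run JavaScript, so if that interstitial redirects via script there is no way through with Mech alone. If it uses a <meta http-equiv="refresh"> tag instead, you can follow it by hand. A rough sketch (the regex is an assumption about the page's markup):

    # Follow a meta-refresh interstitial manually; this only helps
    # if the redirect is in the HTML, not generated by JavaScript.
    if ( $mech->content =~ /http-equiv=["']?refresh["']?[^>]*url=([^"'>\s]+)/i ) {
        $mech->get($1);
    }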

WWW::Mechanize::Plugin::Display - Always open a new window

How can I configure WWW::Mechanize::Plugin::Display so that the plug-in always opens a new window rather than just a new tab? ...

How can I download link targets from a web site using Perl?

I just made a script to grab links from a website, which in turn saves them into a text file. Now I'm working on my regexes so it will grab links which contain php?dl= in the URL from the text file. E.g.: www.example.com/site/admin/a_files.php?dl=33931 It's pretty much the address you get when you hover over the dl button on the site...
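Rather than writing all links to a text file and regexing them afterwards, find_all_links() can filter by URL directly; since the criterion is a regex, the ? must be escaped. A sketch with a hypothetical page:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.example.com/site/admin/a_files.php');    # hypothetical

    # '?' is a regex metacharacter, so escape it to match literally.
    my @dl_links = $mech->find_all_links( url_regex => qr/php\?dl=/ );

    print $_->url_abs, "\n" for @dl_links;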

Trouble with downloading files

I am trying to download a file from a site using Perl. I chose not to use wget so that I can learn how to do it this way. I am not sure whether my page is failing to connect or whether something is wrong in my syntax somewhere. Also, what is the best way to check whether you are getting a connection to the page? #!/usr/bin/perl -w use strict; use LWP; use ...
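To see whether the connection itself succeeded, inspect the response rather than guessing. With WWW::Mechanize and autocheck turned off, a failed GET won't die, so you can report the HTTP status yourself (URL hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 0 );

    my $resp = $mech->get('http://example.com/file.zip');    # hypothetical URL
    if ( $mech->success ) {
        print 'Connected: ', $resp->status_line, "\n";
    }
    else {
        die 'Request failed: ', $resp->status_line, "\n";
    }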

How can I get the contents of a followed link in WWW::Mechanize?

This is my last question on this, I hope. I am using $mech->follow_link to try to download a file. For some reason, though, the file saved is just the page I first pull up, not the target of the link I want to follow. Is this the correct way to download the file from the link? I do not want to use wget. #!/usr/bin/perl -w use strict; ...
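After follow_link() succeeds, the followed document becomes the current content, and save_content() writes that to disk; saving before following (or re-saving the content from the first get) yields the original page instead. A sketch, with hypothetical link text and filename:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://example.com/downloads');    # hypothetical page

    # follow_link() performs the GET; only afterwards does
    # save_content() see the linked file rather than this page.
    $mech->follow_link( text_regex => qr/download/i );
    $mech->save_content('file.zip');

For large binaries, $mech->get( $url, ':content_file' => 'file.zip' ) streams straight to disk instead of holding the whole file in memory.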

How can I get links that match a regex using WWW::Mechanize?

I'm trying to use regular expressions to catch certain links, but I cannot get it to work: I can grab all the links, but many of them are links I do not want. I want only the links that comply with this pattern: http://valeptr.com/scripts/runner.php?IM= Here is the script I'm working on: use warnings; use strict; use WWW::Mechanize; use WWW::Mechanize::Sleepy...
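Since the pattern passed to url_regex is a regex, the ? in the target URL must be quoted; \Q...\E handles that (and the . as well). A sketch, assuming the links appear on the site's front page:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://valeptr.com/');    # assumed start page

    # \Q...\E makes '?' and '.' match literally inside the regex.
    my @runner_links = $mech->find_all_links(
        url_regex => qr{\Qscripts/runner.php?IM=\E}
    );

    print $_->url, "\n" for @runner_links;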

Saving a file with WWW::Selenium

I'm trying to download a file with Perl's WWW::Selenium. I get a popup box asking me if I want to save/open the file; I want to tell it to 'save' at some given location, but I'm not sure how this can be done. Please help. P.S.: I could not use WWW::Mechanize for this page, so I have to use Selenium. Thanks a lot! ...

How do I find "wide characters" printed by perl?

A Perl script that scrapes static HTML pages from a website and writes them to individual files appears to work, but it also prints many instances of "Wide character in print at ./script.pl line n" to the console: one for each page scraped. However, a brief glance at the generated HTML files does not reveal any obvious mistakes in the scraping. ...
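That warning means the script is printing characters above code point 255 to a filehandle that has no encoding layer, so Perl guesses UTF-8 and complains. Declaring the encoding on each output handle silences it correctly (the filename variable here is hypothetical):

    # Write decoded text through an explicit UTF-8 layer.
    open my $fh, '>:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    print {$fh} $mech->content;    # assuming $mech holds the scraped page

    # Likewise for anything printed to the terminal:
    binmode STDOUT, ':encoding(UTF-8)';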

Visit Half a Million Pages with Perl

Currently I'm using Mechanize and the get() method to fetch each site, checking each main page for something with the content() method. I have a very fast computer and a 10Mbit connection, and it still took 9 hours to check 11K sites, which is not acceptable. The problem is the speed of the get() function, which, obviously, needs to get the pag...
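The fetches are almost certainly network-bound, so the usual fix is to run many of them in parallel rather than trying to speed up get() itself. One way (an assumption, not from the original) is Parallel::ForkManager:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use Parallel::ForkManager;

    my @sites = @ARGV;    # or read the URL list from a file
    my $pm    = Parallel::ForkManager->new(20);    # 20 concurrent workers

    for my $url (@sites) {
        $pm->start and next;    # parent loops on; child does the fetch
        my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 10 );
        $mech->get($url);
        print "$url matched\n"
            if $mech->success and $mech->content =~ /something/;    # placeholder pattern
        $pm->finish;
    }
    $pm->wait_all_children;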

Why does WWW::Mechanize GET certain pages but not others?

I'm new to Perl/HTML things. I'm trying to use $mech->get($url) to get something from a periodic table on http://en.wikipedia.org/wiki/Periodic_table, but it kept returning an error message like this: Error GETing http://en.wikipedia.org/wiki/Periodic_table: Forbidden at PeriodicTable.pl line 13 But $mech->get($url) works fine if ...
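Wikipedia returns 403 Forbidden for the default WWW-Mechanize User-Agent string. Identifying as a browser with agent_alias() gets past it (check the site's terms of use, and consider the API or database dumps for bulk access):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->agent_alias('Windows Mozilla');    # send a browser-like User-Agent

    $mech->get('http://en.wikipedia.org/wiki/Periodic_table');
    print $mech->title, "\n";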

Extract all links from an HTML page, excluding links from a specific table

Hi experts, I'm pretty new to Perl/HTML. Here is what I'm trying to do with WWW::Mechanize and HTML::TreeBuilder: For each chemical element page on Wikipedia, I need to extract all hyperlinks that point to the other chemical elements' pages on wiki and print each unique pair in this format: Atomic_Number1 (Chemical Element Title1) -> ...
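With HTML::TreeBuilder, one clean way is to delete the unwanted table from the parse tree before collecting links, so nothing inside it can match. A sketch assuming the table to skip is Wikipedia's infobox (the class name and element page are guesses):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTML::TreeBuilder;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://en.wikipedia.org/wiki/Oxygen');    # example element page

    my $tree = HTML::TreeBuilder->new_from_content( $mech->content );

    # Remove the excluded table before harvesting links.
    for my $table ( $tree->look_down( _tag => 'table', class => qr/infobox/ ) ) {
        $table->delete;
    }

    for my $a ( $tree->look_down( _tag => 'a', href => qr{^/wiki/} ) ) {
        print $a->attr('href'), "\n";
    }
    $tree->delete;    # free the parse tree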

Why can't WWW::Mechanize find the right form?

I'm using WWW::Mechanize to retrieve a form from a webpage: #!/usr/bin/perl use WWW::Mechanize; my $mechanize = WWW::Mechanize->new(); $mechanize->proxy(['http', 'ftp'], 'http://proxy/'); $mechanize->get("http://www.temp.com/"); $mechanize->form_id('signin'); The website's HTML contains the following: <form action="https://www.temp.co...
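form_id() only matches an id attribute present in the HTML the server actually sent, which can differ from what a browser shows after JavaScript runs. Printing $mechanize->content reveals what Mechanize saw; failing that, the form can be selected by position or by its fields (the names below are guesses):

    print $mechanize->content;    # inspect the HTML Mechanize received

    $mechanize->form_number(1);   # select the first form on the page, or:
    $mechanize->form_with_fields( 'username', 'password' );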

How do I use a web proxy with Perl's WWW::Mechanize?

I'm trying to use WWW::Mechanize with a proxy server, but it seems like I can't get it to work. Since Mechanize is a subclass of LWP::UserAgent, I've been reading about proxy support over at link text. I have a list of proxies, for example: 74.87.151.157:8000 69.171.152.25:3128 190.253.82.253:8080 189.11.196.221:3128 41.234.205.201:808...
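Since WWW::Mechanize inherits from LWP::UserAgent, proxy() is all that's needed for HTTP; the proxy itself is addressed with an http:// URL even when it relays other schemes. A sketch using the first proxy from the list (note that relaying https additionally requires CONNECT support on the proxy side):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->proxy( [ 'http', 'https' ], 'http://74.87.151.157:8000' );

    $mech->get('http://example.com/');    # hypothetical target
    print $mech->status, "\n";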

How can I use Perl's WWW::Mechanize to check if a web page has been updated?

I'm using WWW::Mechanize to retrieve a webpage. I need to check whether the page has been updated and retrieve information from it. How can I do this? ...
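Two common approaches: check the Last-Modified header when the server sends one, or hash the content and compare the digest between runs. A sketch with a hypothetical URL:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use Digest::MD5 qw(md5_hex);
    use Encode qw(encode_utf8);

    my $mech = WWW::Mechanize->new();
    $mech->get('http://example.com/page');    # hypothetical URL

    # Cheap check when the server cooperates:
    my $last_mod = $mech->response->last_modified;
    print 'Last-Modified: ', ( $last_mod // 'not sent' ), "\n";

    # Robust fallback: fingerprint the content and compare it
    # with the digest stored from the previous run.
    my $digest = md5_hex( encode_utf8( $mech->content ) );
    print "Content digest: $digest\n";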