I'm trying to use WWW::Mechanize to extract some links from an HTML page with the find_all_links() method. It supports matching on these criteria:
text
text_regex
url
url_regex
url_abs
url_abs_regex
...
How can I extract all links except the one whose text is "xyz"?
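find_all_links() has no "not" criterion, so one workaround (a minimal sketch; the URL is a placeholder) is to fetch every link and filter with grep, or to invert the match with a negative lookahead in text_regex:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/');   # placeholder URL

# Option 1: take every link, then drop the one whose text is exactly "xyz"
my @links = grep { ( $_->text // '' ) ne 'xyz' } $mech->find_all_links();

# Option 2: let text_regex do the exclusion with a negative lookahead
my @links_too = $mech->find_all_links( text_regex => qr/^(?!xyz$)/ );

print $_->url, "\n" for @links;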
...
Hello,
I'm trying to log in to a website automatically using Perl with WWW::Mechanize.
What I do is:
use WWW::Mechanize;
use HTTP::Cookies;

my $bot = WWW::Mechanize->new();
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);
my $response = $bot->get( 'http://blah...
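A minimal self-contained variant, for reference: the cookie jar can also be handed straight to the constructor, and submit_form() does the actual login. The login URL and form field names below are hypothetical placeholders.

#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $bot = WWW::Mechanize->new(
    cookie_jar => HTTP::Cookies->new(
        file           => 'cookies.txt',
        autosave       => 1,
        ignore_discard => 1,
    ),
);

$bot->get('http://example.com/login');   # placeholder URL
$bot->submit_form(
    form_number => 1,                    # first form on the page
    fields      => { username => 'user', password => 'secret' },  # hypothetical names
);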
Hello! Are both of these versions OK, or is one of them preferable?
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
my $content;
# Version 1: read the content via the Mechanize object
$mech->get( 'http://www.kernel.org' );
$content = $mech->content;
print $content;
# Version 2: read the content via the returned HTTP::Response
my $res = $mech->get( 'http://www.kernel.org' );
$content = $res->...
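For what it's worth, both versions end up reading the same HTTP::Response, which get() both returns and stores on the object, so the choice is mostly style. A minimal sketch of the accessors involved; the only real caveat is that $res->content is raw bytes, while decoded_content and $mech->content are decoded text:

#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $res  = $mech->get('http://www.kernel.org');

print $mech->content;                     # decoded page text, convenience accessor
print $res->decoded_content;              # essentially the same text, via the response
print $mech->response->decoded_content;   # the last response is also kept on the object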
The site I was screen scraping (which I have credentials for) recently changed their server and blocked port 80. I thought I could just use port 443 for HTTPS, but now I get a timeout error. I'm just creating a new WWW::Mechanize object and using get() to scrape the site.
My question is, do I need to add the cookie now that they use https...
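A timeout usually points at connectivity rather than cookies. Assuming the timeout happens on the TLS connection itself, the first thing to check is that LWP::Protocol::https (and its IO::Socket::SSL dependency) is installed, since WWW::Mechanize needs it to speak HTTPS at all. A minimal sketch with a placeholder URL:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(
    ssl_opts => { verify_hostname => 1 },   # passed through to LWP::UserAgent
);
$mech->get('https://example.com/');         # note the https:// scheme
print $mech->status, "\n";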
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Compress::Zlib;

my $mech = WWW::Mechanize->new();
my $username = "";       # fill in username here
my $keyword  = "";       # fill in password here
my $mobile   = $ARGV[0]; # destination number from the command line
my $text     = $ARGV[1]; # message body from the command line

my $deb = 1;             # debug flag
print length($text) . "\n" if $deb;

# pad short messages with blank lines
$text = $text . "\n\n\n\n\n" if length($text) < 135;
$mech->get("http...
So I'm scraping a site that I have access to via HTTPS. I can log in and start the process, but each time I hit a new page (URL) the session ID cookie changes. How do I keep the logged-in session cookie?
#!/usr/bin/perl -w
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::Debug qw(+);
use HTTP::Request;
use LWP:...
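WWW::Mechanize already keeps an in-memory cookie jar by default, so the usual fix is simply to reuse one object for every request in the session; the cookie set at login is then sent back automatically. A minimal sketch with hypothetical URLs and form field names (if the session ID still changes on every page, the site may be tracking the session in the URL rather than in a cookie):

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $mech = WWW::Mechanize->new(
    cookie_jar => HTTP::Cookies->new(),   # one shared jar for the whole session
);

$mech->get('https://example.com/login');  # hypothetical URL
$mech->submit_form(
    form_number => 1,
    fields      => { user => 'me', pass => 'secret' },  # hypothetical field names
);

# later requests through the SAME $mech reuse the session cookie
$mech->get('https://example.com/account');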
I am trying to use Perl's WWW::Mechanize to log in to my bank and pull transaction information. After logging in to my bank (Wells Fargo) through a browser, it briefly displays a temporary web page saying something along the lines of "please wait while we verify your identity". After a few seconds it proceeds to the bank's webpage where I...
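One common approach, assuming the interstitial page is a plain meta-refresh redirect (if it relies on JavaScript, Mechanize cannot run it and a browser-driving tool is needed instead): pull the redirect target out of the page yourself and follow it manually, since Mechanize does not follow meta refreshes on its own. A minimal sketch with a placeholder URL:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('https://example.com/login');   # placeholder URL

# look for <meta http-equiv="refresh" content="5;url=..."> and follow it
if ( $mech->content =~ /http-equiv=["']?refresh["']?[^>]*?url=([^"'>\s]+)/i ) {
    $mech->get($1);
}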
How can I configure WWW::Mechanize::Plugin::Display so that the plugin always opens a new window and not just a new tab?
...
I just made a script to grab links from a website, which in turn saves them to a text file.
Now I'm working on my regexes so it will grab links which contain php?dl= in the URL from the text file:
E.g.: www.example.com/site/admin/a_files.php?dl=33931
It's pretty much the address you get when you hover over the dl button on the site...
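A minimal sketch, assuming the links sit one per line in the text file; the main regex gotcha is that ? must be escaped to match literally (links.txt is a placeholder filename):

use strict;
use warnings;

open my $fh, '<', 'links.txt' or die "links.txt: $!";
while ( my $line = <$fh> ) {
    print $line if $line =~ /php\?dl=\d+/;   # \? matches a literal question mark
}
close $fh;

If you are still at the scraping stage, the same pattern works directly as a criterion: $mech->find_all_links( url_regex => qr/php\?dl=/ ).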
I am trying to download a file from a site using Perl. I chose not to use wget so that I can learn how to do it this way. I am not sure whether my page is failing to connect or whether something is wrong in my syntax somewhere. Also, what is the best way to check whether you are actually getting a connection to the page?
#!/usr/bin/perl -w
use strict;
use LWP;
use ...
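A minimal sketch of a download with an explicit success check, assuming LWP as in the snippet above; is_success() and status_line() on the response tell you whether the request worked, and the URL and filename are placeholders:

use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new( timeout => 30 );
my $url = 'http://example.com/file.zip';            # placeholder URL

# stream the body straight to disk instead of holding it in memory
my $res = $ua->get( $url, ':content_file' => 'file.zip' );

if ( $res->is_success ) {
    print "saved file.zip\n";
}
else {
    die "GET $url failed: ", $res->status_line, "\n";
}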
This is my last question for this, I hope. I am using $mech->follow_link to try to download a file. For some reason, though, the file saved is just the page I first pull up, not the file behind the link I want to follow. Is this the correct way to download the file from the link? I do not want to use wget.
#!/usr/bin/perl -w
use strict;
...
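A minimal sketch of the usual ordering fix, assuming the link is identified by its text: save_content() writes whatever the last request fetched, so the save must come after follow_link(), not before (URL and link text are placeholders):

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/downloads');          # placeholder URL

# follow the link FIRST, then save; saving before following writes the
# original page instead of the file behind the link
$mech->follow_link( text_regex => qr/download/i );   # placeholder link text
$mech->save_content('downloaded_file.bin');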
I'm trying to use regular expressions to capture a link, but I can't get it to work.
I can get all the links, but there are many links I don't want.
What I want to do is grab only the links that match this pattern:
http://valeptr.com/scripts/runner.php?IM=
Here is the script I'm working on:
use warnings;
use strict;
use WWW::Mechanize;
use WWW::Mechanize::Sleepy...
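A minimal sketch of the filtering step, assuming the page is already fetched; \Q...\E quotes the ? and . so they match literally, and the start page is a placeholder:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://valeptr.com/');   # placeholder start page

# keep only links whose URL contains the literal runner.php?IM= pattern
my @links = $mech->find_all_links(
    url_regex => qr{\Qvaleptr.com/scripts/runner.php?IM=\E},
);
print $_->url, "\n" for @links;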
I'm trying to download a file with Perl's WWW::Selenium. I get a popup box asking me if I want to save/open the file. I want to manipulate it and choose 'save' at some given location. I'm not sure how this can be done. Please help.
P.S.: I could not use WWW::Mechanize for this page, and I have to use Selenium.
Thanks a lot!
...
A Perl script that scrapes static HTML pages from a website and writes them to individual files appears to work, but also prints many instances of "Wide character in print at ./script.pl line n" to the console: one for each page scraped.
However, a brief glance at the html files generated does not reveal any obvious mistakes in the scraping. ...
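The warning means decoded Unicode strings are being printed to a filehandle without an encoding layer. A minimal sketch of the usual fix, with a placeholder URL and filename: open the output file with an explicit UTF-8 layer before printing the scraped content.

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/page.html');   # placeholder URL

# the :encoding(UTF-8) layer serializes wide characters, silencing the warning
open my $out, '>:encoding(UTF-8)', 'page.html' or die $!;
print {$out} $mech->content;                  # content() returns a decoded string
close $out;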
Currently I'm using Mechanize with the get() method to fetch each site, and checking each main page for something with the content() method.
I have a very fast computer and a 10 Mbit connection, and still it took 9 hours to check 11K sites, which is not acceptable. The problem is the speed of the get() function, which, obviously, needs to get the pag...
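Since the wall-clock time is dominated by network latency rather than CPU, one common speed-up is to fetch in parallel instead of serially. A minimal sketch using Parallel::ForkManager (the URL list, worker count, and timeout are placeholders to tune):

use strict;
use warnings;
use WWW::Mechanize;
use Parallel::ForkManager;

my @sites = ( 'http://example.com', 'http://example.org' );  # your 11K URLs
my $pm    = Parallel::ForkManager->new(20);                  # 20 parallel workers

for my $url (@sites) {
    $pm->start and next;   # fork a child; the parent moves on to the next URL
    my $mech = WWW::Mechanize->new( timeout => 10, autocheck => 0 );
    my $res  = $mech->get($url);
    print "$url: ", $res->code, "\n";
    $pm->finish;
}
$pm->wait_all_children;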
I'm new to Perl/HTML things. I'm trying to use $mech->get($url) to get something from a periodic table on http://en.wikipedia.org/wiki/Periodic_table but it kept returning an error message like this:
Error GETing
http://en.wikipedia.org/wiki/Periodic_table:
Forbidden at PeriodicTable.pl line 13
But $mech->get($url) works fine if ...
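A 403 from Wikipedia is usually the server rejecting the default WWW::Mechanize User-Agent string. A minimal sketch of the usual fix: send a browser-like agent string (the one below is a placeholder; Wikipedia's bot policy asks for an identifying agent):

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(
    agent => 'Mozilla/5.0 (compatible; PeriodicTableBot/0.1)',  # placeholder UA
);
$mech->get('http://en.wikipedia.org/wiki/Periodic_table');
print $mech->status, "\n";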
Hi experts,
I'm pretty new to Perl/HTML. Here is what I'm trying to do with WWW::Mechanize and HTML::TreeBuilder:
For each chemical element page on Wikipedia, I need to extract all hyperlinks that point to the other chemical elements' pages on the wiki and print each unique pair in this format:
Atomic_Number1 (Chemical Element Title1) -> ...
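A minimal sketch of the link-extraction half, assuming a lookup table mapping element page titles to atomic numbers is available (the %atomic hash below is a hypothetical stub covering three elements; the output line mirrors the format in the question):

use strict;
use warnings;
use WWW::Mechanize;

my %atomic = ( Hydrogen => 1, Helium => 2, Lithium => 3 );   # hypothetical lookup

my $mech = WWW::Mechanize->new();
$mech->get('http://en.wikipedia.org/wiki/Hydrogen');

my %seen;
for my $link ( $mech->find_all_links( url_regex => qr{/wiki/} ) ) {
    my ($title) = $link->url =~ m{/wiki/([^/#?]+)$};
    next unless defined $title && exists $atomic{$title};
    next if $seen{$title}++;   # keep each pair unique
    print "$atomic{Hydrogen} (Hydrogen) -> $atomic{$title} ($title)\n";
}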
I'm using WWW::Mechanize to retrieve a form from a webpage:
#!/usr/bin/perl
use WWW::Mechanize;
my $mechanize = WWW::Mechanize->new();
$mechanize->proxy(['http', 'ftp'], 'http://proxy/');
$mechanize->get("http://www.temp.com/");
$mechanize->form_id('signin');
The website's HTML contains the following:
<form action="https://www.temp.co...
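If the form tag has no id="signin" attribute, form_id() will find nothing. A minimal sketch of two more robust alternatives, selecting the form by the fields it contains or by position (the field names below are hypothetical):

use strict;
use warnings;
use WWW::Mechanize;

my $mechanize = WWW::Mechanize->new();
$mechanize->get('http://www.temp.com/');

# select the form by fields it must contain (hypothetical field names) ...
$mechanize->form_with_fields( 'username', 'password' );

# ... or simply by its position on the page
# $mechanize->form_number(1);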
I'm trying to use WWW::Mechanize with a proxy server, but I can't seem to get it to work.
Since Mechanize is a subclass of LWP::UserAgent, I've been reading about proxy support there.
I have a list of proxies, for example:
74.87.151.157:8000
69.171.152.25:3128
190.253.82.253:8080
189.11.196.221:3128
41.234.205.201:808...
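A minimal sketch, assuming these are plain HTTP proxies: the common gotcha is that proxy() wants a full URL with the http:// scheme, not a bare host:port pair (the target URL is a placeholder):

use strict;
use warnings;
use WWW::Mechanize;

my $mech  = WWW::Mechanize->new();
my $proxy = '74.87.151.157:8000';            # one entry from the list above

$mech->proxy( ['http'], "http://$proxy" );   # the scheme prefix is required
$mech->get('http://example.com/');           # placeholder URL
print $mech->status, "\n";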
I'm using WWW::Mechanize to retrieve a webpage. I need to check whether the page has been updated and retrieve information from it. How can I do this?
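One common approach, assuming the server honors conditional requests: send an If-Modified-Since header so the server can answer 304 Not Modified when nothing has changed. A minimal sketch with a placeholder URL (if the server ignores conditional GETs, comparing a hash of the content between runs is the fallback):

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Date qw(time2str);

my $mech = WWW::Mechanize->new( autocheck => 0 );   # 304 is not an error here
my $last_check = time() - 24 * 60 * 60;             # e.g. last checked a day ago

my $res = $mech->get(
    'http://example.com/page.html',                 # placeholder URL
    'If-Modified-Since' => time2str($last_check),
);

if ( $res->code == 304 ) {
    print "page unchanged since last check\n";
}
else {
    print "page updated; new content follows:\n", $mech->content;
}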
...