I'm trying to use WWW::Mechanize to extract some links from an HTML page with the find_all_links() method. It supports matching on these criteria:
text
text_regex
url
url_regex
url_abs
url_abs_regex
...
How can I extract all links except the one whose text is "xyz"?
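find_all_links() has no "not" criterion, so one workaround (a minimal sketch; the URL is a placeholder) is to fetch every link and filter with grep, or to invert the match with a negative lookahead in text_regex:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/');   # placeholder URL

# Option 1: take every link, then drop the one whose text is exactly "xyz"
my @links = grep { ( $_->text // '' ) ne 'xyz' } $mech->find_all_links();

# Option 2: let text_regex do the exclusion with a negative lookahead
my @links_too = $mech->find_all_links( text_regex => qr/^(?!xyz$)/ );

print $_->url, "\n" for @links;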
...
Hello,
I'm trying to log in to a website automatically using Perl with WWW::Mechanize.
What I do is:
use WWW::Mechanize;
use HTTP::Cookies;

my $bot = WWW::Mechanize->new();
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);
my $response = $bot->get( 'http://blah...
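A minimal self-contained variant, for reference: the cookie jar can also be handed straight to the constructor, and submit_form() does the actual login. The login URL and form field names below are hypothetical placeholders.

#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $bot = WWW::Mechanize->new(
    cookie_jar => HTTP::Cookies->new(
        file           => 'cookies.txt',
        autosave       => 1,
        ignore_discard => 1,
    ),
);

$bot->get('http://example.com/login');   # placeholder URL
$bot->submit_form(
    form_number => 1,                    # first form on the page
    fields      => { username => 'user', password => 'secret' },  # hypothetical names
);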
Hello! Are both of these versions OK, or is one of them preferable?
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
my $content;
# Version 1: read the content via the Mechanize object
$mech->get( 'http://www.kernel.org' );
$content = $mech->content;
print $content;
# Version 2: read the content via the returned HTTP::Response
my $res = $mech->get( 'http://www.kernel.org' );
$content = $res->...
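For what it's worth, both versions end up reading the same HTTP::Response, which get() both returns and stores on the object, so the choice is mostly style. A minimal sketch of the accessors involved; the only real caveat is that $res->content is raw bytes, while decoded_content and $mech->content are decoded text:

#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $res  = $mech->get('http://www.kernel.org');

print $mech->content;                     # decoded page text, convenience accessor
print $res->decoded_content;              # essentially the same text, via the response
print $mech->response->decoded_content;   # the last response is also kept on the object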
The site I was screen scraping (which I have credentials for) recently changed their server and blocked port 80. I thought I could just use port 443 for HTTPS, but now I get a timeout error. I'm just creating a new WWW::Mechanize object and using get() to scrape the site.
My question is, do I need to add the cookie now that they use https...
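A timeout usually points at connectivity rather than cookies. Assuming the timeout happens on the TLS connection itself, the first thing to check is that LWP::Protocol::https (and its IO::Socket::SSL dependency) is installed, since WWW::Mechanize needs it to speak HTTPS at all. A minimal sketch with a placeholder URL:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(
    ssl_opts => { verify_hostname => 1 },   # passed through to LWP::UserAgent
);
$mech->get('https://example.com/');         # note the https:// scheme
print $mech->status, "\n";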
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Compress::Zlib;

my $mech = WWW::Mechanize->new();
my $username = "";       # fill in username here
my $keyword  = "";       # fill in password here
my $mobile   = $ARGV[0]; # destination number from the command line
my $text     = $ARGV[1]; # message body from the command line

my $deb = 1;             # debug flag
print length($text) . "\n" if $deb;

# pad short messages with blank lines
$text = $text . "\n\n\n\n\n" if length($text) < 135;
$mech->get("http...
So I'm scraping a site that I have access to via HTTPS. I can log in and start the process, but each time I hit a new page (URL) the session ID cookie changes. How do I keep the logged-in session cookie?
#!/usr/bin/perl -w
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::Debug qw(+);
use HTTP::Request;
use LWP:...
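WWW::Mechanize already keeps an in-memory cookie jar by default, so the usual fix is simply to reuse one object for every request in the session; the cookie set at login is then sent back automatically. A minimal sketch with hypothetical URLs and form field names (if the session ID still changes on every page, the site may be tracking the session in the URL rather than in a cookie):

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $mech = WWW::Mechanize->new(
    cookie_jar => HTTP::Cookies->new(),   # one shared jar for the whole session
);

$mech->get('https://example.com/login');  # hypothetical URL
$mech->submit_form(
    form_number => 1,
    fields      => { user => 'me', pass => 'secret' },  # hypothetical field names
);

# later requests through the SAME $mech reuse the session cookie
$mech->get('https://example.com/account');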
I am trying to use Perl's WWW::Mechanize to log in to my bank and pull transaction information. After logging in to my bank (Wells Fargo) through a browser, it briefly displays a temporary web page saying something along the lines of "please wait while we verify your identity". After a few seconds it proceeds to the bank's webpage where I...
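One common approach, assuming the interstitial page is a plain meta-refresh redirect (if it relies on JavaScript, Mechanize cannot run it and a browser-driving tool is needed instead): pull the redirect target out of the page yourself and follow it manually, since Mechanize does not follow meta refreshes on its own. A minimal sketch with a placeholder URL:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('https://example.com/login');   # placeholder URL

# look for <meta http-equiv="refresh" content="5;url=..."> and follow it
if ( $mech->content =~ /http-equiv=["']?refresh["']?[^>]*?url=([^"'>\s]+)/i ) {
    $mech->get($1);
}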
How can I configure WWW::Mechanize::Plugin::Display so that the plugin always opens a new window and not just a new tab?
...
I just made a script to grab links from a website, which in turn saves them to a text file.
Now I'm working on my regexes so it will grab links which contain php?dl= in the URL from the text file:
E.g.: www.example.com/site/admin/a_files.php?dl=33931
It's pretty much the address you get when you hover over the dl button on the site...
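A minimal sketch, assuming the links sit one per line in the text file; the main regex gotcha is that ? must be escaped to match literally (links.txt is a placeholder filename):

use strict;
use warnings;

open my $fh, '<', 'links.txt' or die "links.txt: $!";
while ( my $line = <$fh> ) {
    print $line if $line =~ /php\?dl=\d+/;   # \? matches a literal question mark
}
close $fh;

If you are still at the scraping stage, the same pattern works directly as a criterion: $mech->find_all_links( url_regex => qr/php\?dl=/ ).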
I am trying to download a file from a site using Perl. I chose not to use wget so that I can learn how to do it this way. I am not sure whether my page is failing to connect or whether something is wrong in my syntax somewhere. Also, what is the best way to check whether you are actually getting a connection to the page?
#!/usr/bin/perl -w
use strict;
use LWP;
use ...
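A minimal sketch of a download with an explicit success check, assuming LWP as in the snippet above; is_success() and status_line() on the response tell you whether the request worked, and the URL and filename are placeholders:

use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new( timeout => 30 );
my $url = 'http://example.com/file.zip';            # placeholder URL

# stream the body straight to disk instead of holding it in memory
my $res = $ua->get( $url, ':content_file' => 'file.zip' );

if ( $res->is_success ) {
    print "saved file.zip\n";
}
else {
    die "GET $url failed: ", $res->status_line, "\n";
}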
This is my last question for this, I hope. I am using $mech->follow_link to try to download a file. For some reason, though, the file saved is just the page I first pull up, not the file behind the link I want to follow. Is this the correct way to download the file from the link? I do not want to use wget.
#!/usr/bin/perl -w
use strict;
...
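A minimal sketch of the usual ordering fix, assuming the link is identified by its text: save_content() writes whatever the last request fetched, so the save must come after follow_link(), not before (URL and link text are placeholders):

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/downloads');          # placeholder URL

# follow the link FIRST, then save; saving before following writes the
# original page instead of the file behind the link
$mech->follow_link( text_regex => qr/download/i );   # placeholder link text
$mech->save_content('downloaded_file.bin');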
I'm trying to use regular expressions to capture a link, but I can't get it to work.
I can get all the links, but there are many links I don't want.
What I want to do is grab only the links that match this pattern:
http://valeptr.com/scripts/runner.php?IM=
Here is the script I'm working on:
use warnings;
use strict;
use WWW::Mechanize;
use WWW::Mechanize::Sleepy...
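A minimal sketch of the filtering step, assuming the page is already fetched; \Q...\E quotes the ? and . so they match literally, and the start page is a placeholder:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://valeptr.com/');   # placeholder start page

# keep only links whose URL contains the literal runner.php?IM= pattern
my @links = $mech->find_all_links(
    url_regex => qr{\Qvaleptr.com/scripts/runner.php?IM=\E},
);
print $_->url, "\n" for @links;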
I'm trying to download a file with Perl's WWW::Selenium. I get a popup box asking me if I want to save/open the file. I want to manipulate it and choose 'save' at some given location. I'm not sure how this can be done. Please help.
P.S.: I could not use WWW::Mechanize for this page, and I have to use Selenium.
Thanks a lot!
...
A Perl script that scrapes static HTML pages from a website and writes them to individual files appears to work, but also prints many instances of "Wide character in print at ./script.pl line n" to the console: one for each page scraped.
However, a brief glance at the html files generated does not reveal any obvious mistakes in the scraping. ...
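The warning means decoded Unicode strings are being printed to a filehandle without an encoding layer. A minimal sketch of the usual fix, with a placeholder URL and filename: open the output file with an explicit UTF-8 layer before printing the scraped content.

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/page.html');   # placeholder URL

# the :encoding(UTF-8) layer serializes wide characters, silencing the warning
open my $out, '>:encoding(UTF-8)', 'page.html' or die $!;
print {$out} $mech->content;                  # content() returns a decoded string
close $out;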
Currently I'm using Mechanize with the get() method to fetch each site, and checking each main page for something with the content() method.
I have a very fast computer and a 10 Mbit connection, and still it took 9 hours to check 11K sites, which is not acceptable. The problem is the speed of the get() function, which, obviously, needs to get the pag...
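Since the wall-clock time is dominated by network latency rather than CPU, one common speed-up is to fetch in parallel instead of serially. A minimal sketch using Parallel::ForkManager (the URL list, worker count, and timeout are placeholders to tune):

use strict;
use warnings;
use WWW::Mechanize;
use Parallel::ForkManager;

my @sites = ( 'http://example.com', 'http://example.org' );  # your 11K URLs
my $pm    = Parallel::ForkManager->new(20);                  # 20 parallel workers

for my $url (@sites) {
    $pm->start and next;   # fork a child; the parent moves on to the next URL
    my $mech = WWW::Mechanize->new( timeout => 10, autocheck => 0 );
    my $res  = $mech->get($url);
    print "$url: ", $res->code, "\n";
    $pm->finish;
}
$pm->wait_all_children;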
I'm new to Perl/HTML things. I'm trying to use $mech->get($url) to get something from a periodic table on http://en.wikipedia.org/wiki/Periodic_table but it kept returning an error message like this:
Error GETing
http://en.wikipedia.org/wiki/Periodic_table:
Forbidden at PeriodicTable.pl line 13
But $mech->get($url) works fine if ...
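A 403 from Wikipedia is usually the server rejecting the default WWW::Mechanize User-Agent string. A minimal sketch of the usual fix: send a browser-like agent string (the one below is a placeholder; Wikipedia's bot policy asks for an identifying agent):

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(
    agent => 'Mozilla/5.0 (compatible; PeriodicTableBot/0.1)',  # placeholder UA
);
$mech->get('http://en.wikipedia.org/wiki/Periodic_table');
print $mech->status, "\n";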
Hi experts,
I'm pretty new to Perl/HTML. Here is what I'm trying to do with WWW::Mechanize and HTML::TreeBuilder:
For each chemical element page on Wikipedia, I need to extract all hyperlinks that point to the other chemical elements' pages on the wiki and print each unique pair in this format:
Atomic_Number1 (Chemical Element Title1) -> ...
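A minimal sketch of the link-extraction half, assuming a lookup table mapping element page titles to atomic numbers is available (the %atomic hash below is a hypothetical stub covering three elements; the output line mirrors the format in the question):

use strict;
use warnings;
use WWW::Mechanize;

my %atomic = ( Hydrogen => 1, Helium => 2, Lithium => 3 );   # hypothetical lookup

my $mech = WWW::Mechanize->new();
$mech->get('http://en.wikipedia.org/wiki/Hydrogen');

my %seen;
for my $link ( $mech->find_all_links( url_regex => qr{/wiki/} ) ) {
    my ($title) = $link->url =~ m{/wiki/([^/#?]+)$};
    next unless defined $title && exists $atomic{$title};
    next if $seen{$title}++;   # keep each pair unique
    print "$atomic{Hydrogen} (Hydrogen) -> $atomic{$title} ($title)\n";
}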
I'm using WWW::Mechanize to retrieve a form from a webpage:
#!/usr/bin/perl
use WWW::Mechanize;
my $mechanize = WWW::Mechanize->new();
$mechanize->proxy(['http', 'ftp'], 'http://proxy/');
$mechanize->get("http://www.temp.com/");
$mechanize->form_id('signin');
The website's HTML contains the following:
<form action="https://www.temp.co...
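If the form tag has no id="signin" attribute, form_id() will find nothing. A minimal sketch of two more robust alternatives, selecting the form by the fields it contains or by position (the field names below are hypothetical):

use strict;
use warnings;
use WWW::Mechanize;

my $mechanize = WWW::Mechanize->new();
$mechanize->get('http://www.temp.com/');

# select the form by fields it must contain (hypothetical field names) ...
$mechanize->form_with_fields( 'username', 'password' );

# ... or simply by its position on the page
# $mechanize->form_number(1);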
I'm trying to use WWW::Mechanize with a proxy server, but I can't seem to get it to work.
Since Mechanize is a subclass of LWP::UserAgent, I've been reading about proxy support there.
I have a list of proxies, for example:
74.87.151.157:8000
69.171.152.25:3128
190.253.82.253:8080
189.11.196.221:3128
41.234.205.201:808...
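A minimal sketch, assuming these are plain HTTP proxies: the common gotcha is that proxy() wants a full URL with the http:// scheme, not a bare host:port pair (the target URL is a placeholder):

use strict;
use warnings;
use WWW::Mechanize;

my $mech  = WWW::Mechanize->new();
my $proxy = '74.87.151.157:8000';            # one entry from the list above

$mech->proxy( ['http'], "http://$proxy" );   # the scheme prefix is required
$mech->get('http://example.com/');           # placeholder URL
print $mech->status, "\n";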
I'm using WWW::Mechanize to retrieve a webpage. I need to check whether the page has been updated and retrieve information from it. How can I do this?
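One common approach, assuming the server honors conditional requests: send an If-Modified-Since header so the server can answer 304 Not Modified when nothing has changed. A minimal sketch with a placeholder URL (if the server ignores conditional GETs, comparing a hash of the content between runs is the fallback):

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Date qw(time2str);

my $mech = WWW::Mechanize->new( autocheck => 0 );   # 304 is not an error here
my $last_check = time() - 24 * 60 * 60;             # e.g. last checked a day ago

my $res = $mech->get(
    'http://example.com/page.html',                 # placeholder URL
    'If-Modified-Since' => time2str($last_check),
);

if ( $res->code == 304 ) {
    print "page unchanged since last check\n";
}
else {
    print "page updated; new content follows:\n", $mech->content;
}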
...