I'm playing around with Win32::IE::Mechanize to try to access some authentication-required sites automatically. So far I've had moderate success; for example, I can automatically log in to my Yahoo mailbox. But I find that many sites use some kind of image verification mechanism, possibly called CAPTCHA, and I can't do anything about those. One of the sites I'm trying to access automatically, however, uses a plain-text verification code. It is composed of four digits that are selectable and copyable. But the digits are not in the source that can be fetched using

$mech->content;

I searched all the files in Temporary Internet Files for the code that appears on the web page but not in the source, and I still can't find it.

Any idea what's going on? I was suspecting that the verification code was somehow hidden in some cookie file but I can't seem to find it :(

The following code fills in all the required fields except for the verification code:

use strict;
use warnings;
use Win32::IE::Mechanize;

my $url = "http://www.zjsmap.com/smap/smap_login.jsp";
my $eccode = "myeccode";
my $username = "myaccountname";
my $password = "mypassword";
my $verify = "I can't figure out how to let the script get the code yet";

my $mech = Win32::IE::Mechanize->new(visible=>1);
$mech->get($url);
sleep(1); #avoids undefined value error
$mech->form_name("BaseForm");
$mech->field(ECCODE => $eccode);
$mech->field(MEMBERACCOUNT => $username);
$mech->field(PASSWORD => $password);
$mech->field(verify => $verify);
$mech->click();

Like always any suggestions/comments would be greatly appreciated :)

UPDATE

I've figured out a not-so-smart way to solve this problem. Please comment on my own answer posted below. Thanks like always :)

+3  A: 

This is exactly why they are there: to stop programs like yours from doing automated stuff ;-)

A CAPTCHA or Captcha is a type of challenge-response test used in computing to ensure that the response is not generated by a computer.

Shoban
@Shoban, the very site I'm talking about is probably not using CAPTCHA, because the verification code there is selectable and copyable digits. That's why I'm hoping to find a way to deal with it.
Mike
Captcha is a generic term covering all measures that prevent a website from being used by a robot. The thing you’re talking about is considered a captcha.
zoul
@zoul, according to Wiki, "a captcha is a means of automatically generating new challenges which current software is unable to solve accurately", but I'm pretty sure the problem I'm facing is far from being that hard.
Mike
There are good captchas, and there are bad captchas. The thing you are facing *is* a captcha, please trust me :) It does not seem to be very strong, but I’m not going through that JS spaghetti code right in the morning.
zoul
A: 

The code is inserted by JavaScript – disable JS, reload the page and see it disappear. You have to hunt through the JS code to get an idea where it comes from and how to replicate it.

zoul
@zoul, yes, I'm actually trying to do the thing in this direction. But I'm kind of stuck.
Mike
@zoul, there seem to be 3 places in the source file relevant to the problem, but I still can't figure out how it actually works. Maybe the actual verification code is stored in memory? But then how do I replicate it? Anyway, here's what I see in the source file. The line "var random_number = rand(1000,10000);" generates the random number. The line "document.write(random_number);" displays it, and "if (document.BaseForm.verify.value != random_number) {...); return false;}" validates the code.
Mike
+2  A: 

This appears to be an irrelevant number. The page uses it in 3 places: generating it; displaying it on the form next to its input field; and checking that the input value equals the random number chosen. That is, it is a client-only check. Still, if you disable JavaScript, my guess is that important cookies don't get set. If you can execute JavaScript in the context of the page (you should be able to with a get method call and a javascript: URI), you could change the value of random_number to, e.g., 42 and fill that in on the form.

MkV
@james2vegas, thanks for the pointer. I'll see what I can do. I know nothing about JavaScript, but yes, there seem to be 3 places in the source file relevant to the problem. The line "var random_number = rand(1000,10000);" generates the random number. The line "document.write(random_number);" displays it, and "if (document.BaseForm.verify.value != random_number) {...); return false;}" validates the code.
Mike
A: 

Thanks to james2vegas, zoul and Shoban.

I've finally figured out on my own a not-so-smart but at-least-workable way to solve the problem I described here. I'd like to share it here. I think the approach suggested by @james2vegas is probably much better...but anyway I'm learning along the way.

My approach is this:

Although the verification code is not in the source file, it is still selectable and copyable, so I can let my script copy everything on the login page and then extract the verification code.

To do this, I use the SendKeys function in the Win32::GuiTest module to "Select All" and "Copy" on the login page.

Then I use Win32::Clipboard to get the clipboard content and a regexp to extract the code. Something like this:

$verify = Win32::Clipboard::GetText();
$verify =~ s/.* (\d{4}).*/$1/msg;
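
A slightly more defensive variant of the extraction step might look like the sketch below. It uses a capturing match instead of a substitution, so the script can tell when no code was found. The helper name extract_verification_code and the sample page text are made up for illustration, and it assumes the code is the first standalone four-digit number in the copied text, which may not hold if the page also shows a year or similar.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first standalone four-digit number in
# $text, or undef if there is none.
sub extract_verification_code {
    my ($text) = @_;
    return undef unless defined $text;
    # (?<!\d) and (?!\d) stop the match from landing inside a longer digit run.
    return $text =~ /(?<!\d)(\d{4})(?!\d)/ ? $1 : undef;
}

# Stand-in for the text copied from the login page (invented sample).
my $page_text = "Account: \nPassword: \nVerification code: 4827\n";
my $verify = extract_verification_code($page_text);
print defined $verify ? "code: $verify\n" : "no code found\n";
```

In the real script, $page_text would be the string returned by Win32::Clipboard::GetText().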

A few thoughts:

The random number is generated by something like this in Perl:

my $random_number = int(rand(8999)) + 1000; # var random_number = rand(1000,10000);

The page then checks whether $verify == $random_number. I don't know how to capture the value of $random_number, which exists for one session only. I think it is stored somewhere in memory. If I could capture the value directly, I wouldn't have to go to the trouble of using all these extra modules.
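
As a side note on the range: int(rand(8999)) + 1000 tops out at 9998. The page's own rand(1000,10000) helper isn't shown, so its exact bounds are unknown. The sketch below assumes the upper bound is exclusive and uses a hypothetical helper, rand_int_range, to produce every four-digit value from 1000 to 9999.

```perl
use strict;
use warnings;

# Hypothetical helper: random integer in the half-open range [$min, $max).
# Assumes the page's rand(1000,10000) treats 10000 as exclusive.
sub rand_int_range {
    my ($min, $max) = @_;
    # rand($n) returns a float in [0, $n); int() truncates it to 0 .. $n-1.
    return $min + int(rand($max - $min));
}

my $random_number = rand_int_range(1000, 10000);  # somewhere in 1000 .. 9999
print "$random_number\n";
```

This only mirrors how such a number could be generated. It does not recover the value the page picked, which is created fresh in the browser on every load.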

Mike