views: 300
answers: 1
$page = $curl->post($baseUrl.'/submit.php', array('url'=>$address,'phase'=>'1','randkey'=>$randKey[0],'id'=>'c_1'));
$exp = explode('recaptcha_image',$page);

The id recaptcha_image is not found, although if I echo $page; the webpage is displayed, and surprisingly even the reCAPTCHA div (with the image itself). cURL shouldn't load the reCAPTCHA image, yet somehow it appears; but when I try to find the div, it is not there. Is there a way to capture the URL of the reCAPTCHA image?

A: 

You'll want to use an HTML parser such as PHP Simple HTML DOM Parser. Something like this should then work:

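<?php
require_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser

$page = $curl->post($baseUrl.'/submit.php', array('url'=>$address,'phase'=>'1','randkey'=>$randKey[0],'id'=>'c_1'));
$html = str_get_html($page); // returns false on empty or oversized input
// Find the <script> tag that loads the reCAPTCHA challenge JavaScript
$ret = $html ? $html->find('script[src^=http://api.recaptcha.net/]', 0) : null;
if ($ret === null) exit('reCAPTCHA script tag not found');
$src = $ret->src;
// I'm not sure how you get a URL with your library, so this might or might not work
$page = $curl->get($src);
// The challenge JS contains something like: challenge : '...',
if (!preg_match("%challenge\ :\ '([a-zA-Z0-9-_]*)',%", $page, $matches)) exit('challenge not found');
$img = "http://api.recaptcha.net/image?c=".$matches[1];
?>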

This first fetches the page and parses it for the reCAPTCHA script URL, then fetches that script and extracts the challenge from it, which is appended to the image URL. The image URL will be in the $img variable.

Arda Xi
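If you would rather not depend on a third-party parser, the "find the script URL" step can also be done with PHP's built-in DOM extension. This is a minimal sketch, not the answer's method: `findRecaptchaScriptSrc` is a hypothetical helper name, and the `challenge?k=PUBLIC_KEY` part of the sample URL is an assumed shape of what the page contains.

```php
<?php
// Hypothetical helper: returns the src of the first <script> element whose
// src attribute starts with $prefix, or null if there is none.
function findRecaptchaScriptSrc(string $html, string $prefix = 'http://api.recaptcha.net/'): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src'); // '' when the attribute is absent
        if (strpos($src, $prefix) === 0) {
            return $src;
        }
    }
    return null;
}

// Example input: the URL after the prefix is an assumed shape, not real output.
$page = '<html><body><form>'
      . '<script type="text/javascript" src="http://api.recaptcha.net/challenge?k=PUBLIC_KEY"></script>'
      . '</form></body></html>';
echo findRecaptchaScriptSrc($page), "\n"; // http://api.recaptcha.net/challenge?k=PUBLIC_KEY
```

Because DOMDocument only parses the markup cURL returned, it can find the `<script>` tag but never the image element that the script would create in a browser, which is why the challenge has to be fetched in a second request.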
The problem is not finding the correct div but that cURL doesn't execute the JavaScript. Since reCAPTCHA uses JS to load, cURL gets the HTML but not the image generated by the JS; the image is only displayed in the browser. I hope you understand what I'm trying to explain.
I get it; I've edited the answer to reflect this.
Arda Xi
Thank you, I marked it as the accepted answer because I did pretty much the same thing. cURL doesn't parse JavaScript, so I had to make another request to api.recaptcha.net using the same cookie to get the same image. In the end I managed to get the URL and the image. I don't know who downvoted your answer, but that was lame. Thanks for the help!
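Reusing "the same cookie" across requests can be done with PHP's cURL extension by pointing `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR` at one file, so every request reads and writes the same session cookies. This is a minimal sketch that assumes direct use of the cURL extension rather than the asker's `$curl` wrapper; `fetchWithCookies` and the jar path are hypothetical.

```php
<?php
// Hypothetical helper: sends a GET request (or a POST when $postFields is
// given) using one cookie jar file, so consecutive calls share a session.
function fetchWithCookies(string $url, string $cookieJar, ?array $postFields = null): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_COOKIEFILE     => $cookieJar, // send cookies stored by earlier calls
        CURLOPT_COOKIEJAR      => $cookieJar, // store cookies when the handle is released
    ]);
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $body = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch); // releasing the handle writes the cookie jar to disk
    if ($body === false) {
        throw new RuntimeException("Request to $url failed: $error");
    }
    return $body;
}
```

For example, calling `fetchWithCookies($scriptUrl, $jar)` and then `fetchWithCookies($imageUrl, $jar)` with the same `$jar` path (say `sys_get_temp_dir() . '/recaptcha-cookies.txt'`) sends the cookies set by the first response along with the second request.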
You don't have to use the cookie though. If you simply fetch the challenge I'm sure it'll be more precise. That is, unless that cookie contains the challenge, in which case it's pretty much the same thing.
Arda Xi
Well, here is the catch: the challenge isn't in $page (the result from cURL), only the key. So step 1: access the URL with the unique key using GET. It returns JS code containing the image location (again as JS code). Access that URL using POST (with the same cookie, because we need the exact same image for the form we already have in $page), and from this point onward things are pretty simple. So far I have the URL + challenge key, and the JPEG saved on my hard drive. The challenge key is not there if you get the page with cURL. Try it; you'll see that cURL doesn't understand JS code at all.
Well, that's what I've included this line for: `preg_match("%challenge\ :\ '([a-zA-Z0-9-_]*)',%", $page, $matches)`. It parses the JS code and extracts the challenge. Then "http://api.recaptcha.net/image?c=" is prepended to it to form the URL.
Arda Xi
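The challenge-extraction step can be checked offline on a plain string. This sketch wraps a slightly relaxed variant of the answer's regex (any whitespace around the colon, hyphen moved to the end of the character class) in a hypothetical `recaptchaImageUrl` function; the sample JavaScript is an assumed shape of the challenge script, not captured output.

```php
<?php
// Hypothetical helper: extracts the challenge token from the reCAPTCHA
// challenge JavaScript and prepends the image endpoint used in the answer.
// Returns null when no challenge is found.
function recaptchaImageUrl(string $js, string $imageBase = 'http://api.recaptcha.net/image?c='): ?string
{
    // Relaxed variant of the answer's pattern: any whitespace around the
    // colon, hyphen placed last in the character class.
    if (!preg_match("%challenge\\s*:\\s*'([A-Za-z0-9_-]*)'%", $js, $matches)) {
        return null;
    }
    return $imageBase . $matches[1];
}

// Assumed shape of the challenge script, not captured output.
$js = "var RecaptchaState = {\n    site : 'PUBLIC_KEY',\n    challenge : '03AHJ_Vuv-xyz',\n    timeout : 18000\n};";
echo recaptchaImageUrl($js), "\n"; // http://api.recaptcha.net/image?c=03AHJ_Vuv-xyz
```

Returning null instead of reading `$matches[1]` unconditionally avoids an "undefined offset" notice when the script format differs from what the pattern expects.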