Hi all

I am a newbie who tries different things every day, and I always come here when I am stuck on something.

I want to write a script using cURL and PHP that goes to this link: http://tools.cisco.com/WWChannels/LOCATR/openBasicSearch.do and then goes through each page for each country, capturing a list of every partner in every country and saving it to a database.

I have no idea how the script can select the countries one by one from the select box and get redirected to each country's page, which is the very first thing to do. Once we are on that page, pattern matching comes into play for storing the names and addresses in the database, which I can manage.

The problem is that before we select any country the URL is http://tools.cisco.com/WWChannels/LOCATR/BasicSearch.do and after we select a country, say 'India', the URL is http://tools.cisco.com/WWChannels/LOCATR/performBasicSearch.do, so there is no reference to the selected country in it.

My idea was to traverse the HTML page, put all the countries in an array, and then write a recursive function that requests the page for each specific country. But for that we need something different in the URL for each country, right?

Please help

+1  A: 

Your URL is messed up, so I can't see the exact page you are talking about. However, what is most likely happening is that when you change the country, the website makes a POST request to the same page with a variable such as country (although it may be named something else) whose value is the name or ID of the country you selected. If you View Source on the page, you will be able to see the name of the input field that is being submitted. Once you know that, you can set the cURL option CURLOPT_POSTFIELDS when making your request, which reads like so:

The full data to post in a HTTP "POST" operation. To post a file, prepend a filename with @ and use the full path. This can either be passed as a urlencoded string like 'para1=val1&para2=val2&...' or as an array with the field name as key and field data as value.

So, keeping that in mind you would do something like this:

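$ch = curl_init('http://tools.cisco.com/WWChannels/LO...BasicSearch.do');
// curl_setopt() returns a boolean, so do not assign its result back to $ch.
curl_setopt($ch, CURLOPT_POSTFIELDS, array('country' => 'India'));
// Return the response as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$contents = curl_exec($ch);
curl_close($ch);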

As I said, though, the country => India part is an educated guess at what the form might be submitting. You have to inspect the HTML to find out for yourself.
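
If it helps, here is a rough sketch of how that inspection could be automated: parse the search page, collect the option values of the country select box into an array, and then POST each value in turn. The field name country, the sample markup, and the helper names extractSelectOptions and postField are all assumptions made up for illustration; the real page may use different names and may need extra hidden fields or cookies.

```php
<?php
// Sketch only: the field name "country" and the sample markup below are
// assumptions, not taken from the real page.

/**
 * Return value => label pairs for every <option> of the named <select>.
 */
function extractSelectOptions(string $html, string $selectName): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $options = [];
    foreach ($xpath->query("//select[@name='$selectName']/option") as $option) {
        $value = $option->getAttribute('value');
        if ($value !== '') { // skip placeholders such as "Select a country"
            $options[$value] = trim($option->textContent);
        }
    }
    return $options;
}

/**
 * POST a single form field to $url and return the response body
 * (false on failure). Hypothetical helper, not called in this sketch.
 */
function postField(string $url, string $field, string $value)
{
    $ch = curl_init($url);
    // Passing a urlencoded string sends application/x-www-form-urlencoded,
    // like an ordinary browser form. Passing an array sends multipart/form-data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([$field => $value]));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}

// Made-up markup standing in for the downloaded search page.
$sampleHtml = '<form><select name="country">'
    . '<option value="">Select a country</option>'
    . '<option value="IN">India</option>'
    . '<option value="US">United States</option>'
    . '</select></form>';

$countries = extractSelectOptions($sampleHtml, 'country');
```

With the array in hand, a plain foreach over $countries that calls postField() with the form's target URL for each value should be enough. No recursion is needed, and the URL can stay the same for every country because the selection travels in the POST body.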

Paolo Bergantino
+1  A: 

For automation/scraping, I would recommend that you use a virtual browser, such as SimpleBrowser. It's part of SimpleTest, but you can use it on its own.

troelskn
A: 

Thanks for the help.

يوتيوب