ansaurus

Question

Process pages for certain information using PHP

Answer 1

+2 A:

You'll need to log in (through PHP) to see the relevant information. This isn't very straightforward and will require some work.
You can use *shrugs* regex to parse the data, or use an HTML parser like PHP Simple HTML DOM Parser. With regex:
```
// $contents holds the fetched page HTML; capture the text inside the reputation <div>.
if (preg_match('!<div class="summarycount al">(.+?)</div>!', $contents, $matches)) {
    $rep = $matches[1];
}
```
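If you'd rather skip regex, here is a minimal sketch of the parser route. It uses PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) rather than PHP Simple HTML DOM Parser, and the sample markup in `$contents` is made up for illustration; in practice it would be the page you fetched.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$contents = '<html><body><div class="summarycount al">1,234</div></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($contents);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select <div> elements whose class attribute is exactly "summarycount al".
$nodes = $xpath->query('//div[@class="summarycount al"]');

$rep = null;
if ($nodes !== false && $nodes->length > 0) {
    // Take the text of the first match.
    $rep = trim($nodes->item(0)->textContent);
}

echo $rep, "\n";
```

Unlike the regex, this does not depend on the exact spacing of the tag, although the XPath still matches the class attribute as a literal string.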
If you are scraping SO, you can use the SO API instead.

Code:

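```
$url = 'http://api.stackoverflow.com/1.0/users/3';

$tuCurl = curl_init();
curl_setopt($tuCurl, CURLOPT_URL, $url);
// Return the response as a string instead of printing it.
curl_setopt($tuCurl, CURLOPT_RETURNTRANSFER, 1);
// Ask for a gzip-compressed response and let cURL decompress it.
curl_setopt($tuCurl, CURLOPT_ENCODING, 'gzip');

$data = curl_exec($tuCurl);
// Decode the JSON response into an associative array.
$parse = json_decode($data, true);
$rep = $parse['users'][0]['reputation'];

echo $rep;
```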

Rogue 2010-10-07 16:32:49

Thanks for the attempt. I am really bad at regex, but I will go through it. The current page does not need a login, so no worries. This was a generic question with SO as an example. The code works! Thanks.

abel 2010-10-07 16:36:01

Time taken: 2.11 seconds. At that rate, getting 10,000 users will take about 5.9 hours (10,000 × 2.11 s = 21,100 s). Can I complete the entire thing in one script without timeouts?

abel 2010-10-07 16:42:32

@abel Yes, you can change the `max_execution_time` setting. I would strongly recommend using the SO API though, or downloading a [data-dump](http://blog.stackoverflow.com/2010/10/creative-commons-data-dump-oct-10/) and getting info from there.
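A rough sketch of what that could look like in a long-running script, assuming your host allows `set_time_limit()`. The `fetchAll()` helper and the stub fetcher are hypothetical and only stand in for the real per-user request:

```php
<?php
// Lift PHP's execution time limit for this script only (0 means no limit).
// This overrides max_execution_time at runtime without editing php.ini.
set_time_limit(0);

/**
 * Hypothetical helper: calls $fetch for each user id and collects the results.
 * $fetch stands in for the per-user HTTP request.
 */
function fetchAll(array $userIds, callable $fetch): array
{
    $results = [];
    foreach ($userIds as $id) {
        $results[$id] = $fetch($id);
    }
    return $results;
}

// Stub fetcher so the sketch runs without network access.
$reps = fetchAll([1, 2, 3], function (int $id): int {
    return $id * 100;
});

print_r($reps);
```

Keep in mind that this only lifts PHP's own limit. A web server or proxy in front of PHP may still enforce its own timeout, so running a job like this from the command line is usually safer.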

Rogue 2010-10-07 16:46:05

@Rogue This isn't about SO per se. I have played with the execution time setting. Can I get burstable output? More here: http://stackoverflow.com/questions/3884008/burstable-output-to-long-running-scripts

abel 2010-10-08 08:59:52
