script

<?php
include('time.php'); // timer script
echo "First 100 users of SO<br/>";
for ($i = 0; $i < 100; $i++) {
    $contents = file_get_contents("http://stackoverflow.com/privileges/user/" . $i);
    preg_match('!<div class="summarycount al">(.+?)</div>!', $contents, $matches);
    $rep = $matches[1];
    echo "<br/>" . $i . ") " . $rep . "<br/>";

    include('timetaken.php'); // script which outputs the time difference
}
?>

output

First 100 users of SO

0) 0
2.3584280014038
1) 14,436
4.469074010849
2) 875
10.651238918304
3) 2,431
12.991086959839
4) 8,611
15.451638936996
5) 14,988
17.535580873489
6) 0
19.686461925507
7) 0
21.796992063522
8) 218
23.931365013123
9) 2,569
26.419286966324
10) 101
28.540382862091
11) 232
30.755586862564
12) 0
32.960548877716
13) 33,898
35.163224935532
14) 0
37.280658006668
15) 6,388
39.425274848938
16) 143
41.541421890259
17) 14,366
43.655340909958
18) 0
45.771246910095
19) 99
47.882269859314
20) 4,204
49.993322849274
21) 0
52.108762979507
22) 1,517
54.221307039261
23) 411
56.345490932465
24) 103
58.892389059067
Fatal error: Maximum execution time of 60 seconds exceeded in C:\test.php on line 5

Problem with this script: the page only loads after 60 seconds, when the script times out.

I know I can add a

set_time_limit(500);

to the code and get the first 100 reputations, but then the page will only load after, say, 120 seconds.

How can I get the results in short bursts, as the data is gathered, using PHP or any other language (Python, Java) or anything else? Before someone says it, I have read http://stackoverflow.com/questions/2212635/best-way-to-manage-long-running-php-script, which may be a possible duplicate, but it does not answer my question. My question is not about completing the entire job, but about displaying the results while the job is running.

(Please deal with the tags for me)

+1  A: 

If the sample is really what you're doing, how about harvesting data, and storing it locally? That'd speed things up a lot. You can periodically reharvest, but unless the data's really time sensitive, this probably isn't something you want to do on every page load.

Alternatively, I'd seriously consider moving some logic over to the client. Here are two approaches:

1) Have a PHP process which takes a GET parameter for the record to start with. Make AJAX calls to that process, with each call grabbing the next set of records and adding the result to the DOM. (jQuery is one good way to do this.) Depending on your needs, and what testing shows, you could have the callback for each GET launch the next GET request, or you could launch several GETs at a time.

2) Skip PHP entirely, and do everything with JavaScript in the browser. After all, you're just loading and parsing HTML (although you might have to deal with some cross-domain issues).

Sid_M
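
Approach 1 above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the file name batch.php, the batch size, the JSON shape, and the helper names extract_reputation and build_batch are all assumptions made for the example. The client would request batch.php?start=0, append the rows to the page, then request again with the returned "next" value.

```php
<?php
// Hypothetical endpoint, e.g. batch.php?start=0: returns one small batch as JSON.
// BATCH_SIZE and the JSON layout are illustrative assumptions.

const BATCH_SIZE = 5;

/** Pull the reputation out of a page, using the pattern from the question. */
function extract_reputation(string $html): ?string
{
    if (preg_match('!<div class="summarycount al">(.+?)</div>!', $html, $m)) {
        return $m[1];
    }
    return null; // pattern not found
}

/**
 * Build one batch of $count users starting at $start.
 * $fetch is any callable mapping a user id to HTML (or false on failure),
 * so the network call can be swapped out.
 */
function build_batch(int $start, int $count, callable $fetch): array
{
    $rows = [];
    for ($id = $start; $id < $start + $count; $id++) {
        $html = $fetch($id);
        $rows[] = [
            'id'  => $id,
            'rep' => is_string($html) ? extract_reputation($html) : null,
        ];
    }
    // 'next' tells the client which start value to request afterwards.
    return ['next' => $start + $count, 'rows' => $rows];
}

// Only answer a request when run by a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    $start = max(0, (int) ($_GET['start'] ?? 0));
    $fetch = function (int $id) {
        return @file_get_contents('http://stackoverflow.com/privileges/user/' . $id);
    };
    header('Content-Type: application/json');
    echo json_encode(build_batch($start, BATCH_SIZE, $fetch));
}
```

Because each request handles only a few users, no single request should come near the 60 second limit, and the page can show each batch as it arrives.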
The sample is just an example; I was looking to display output as the script is processed, rather than at the end of it. What exactly do you mean by harvesting data? To query the entire SO user list this way, I would have to set a really big time limit and then store the results in a local db. Is there any way to do this in the background on the server?
abel
A simple version of harvesting might work something like this. Store a variable for the last harvested id. Have a script which looks up that variable, then starts importing records. For each imported record, extract and locally store (presumably in a db) the information you want. Before exiting (or upon completion of each record), update the stored last harvested id variable. Set the script to harvest only as many records as it can in, say, 5 minutes. Have cron regularly call the script until all records are harvested. Then occasionally get any new records created since the last harvest.
Sid_M
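
The harvesting steps in the comment above might look something like the sketch below. It is only an illustration under stated assumptions: the state file holding the last harvested id, the $fetch and $store callables (standing in for the HTTP request and the db insert), and the function names are all hypothetical.

```php
<?php
// Sketch of a resumable harvester, meant to be run repeatedly (for example by cron).
// The state file and the $fetch/$store callables are hypothetical stand-ins.

/** Read the last harvested id, or -1 if nothing has been harvested yet. */
function load_last_id(string $stateFile): int
{
    return is_file($stateFile) ? (int) file_get_contents($stateFile) : -1;
}

/**
 * Harvest ids after the stored one, up to and including $maxId,
 * stopping early once $budgetSeconds have elapsed.
 * Returns the last id that was harvested.
 */
function harvest(string $stateFile, int $maxId, float $budgetSeconds, callable $fetch, callable $store): int
{
    $deadline = microtime(true) + $budgetSeconds;
    $lastId = load_last_id($stateFile);

    while ($lastId < $maxId && microtime(true) < $deadline) {
        $id = $lastId + 1;
        $store($id, $fetch($id));                        // extract and store locally
        $lastId = $id;
        file_put_contents($stateFile, (string) $lastId); // remember progress after each record
    }
    return $lastId;
}

// Example call with a 5 minute budget (the callables are left to the reader):
// harvest('/tmp/last_id.txt', 99, 300.0, $fetch, $store);
```

Saving the id after every record means an interrupted run loses at most the record in progress, and the next scheduled run picks up where the previous one stopped.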
Just to be clear: on a Linux box, you use cron to regularly run scheduled events. On IIS, you use the scheduler (I think that's what it's called). In either case, that's the standard way to run a regularly occurring background PHP process on your server.
Sid_M
Although I stand by my earlier suggestions, here's another way to go at it: http://stackoverflow.com/questions/3893724/php-output-values-to-screen-mid-loop
Sid_M
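
For output in the middle of a loop, one commonly used technique is to flush after every result. The sketch below assumes that no layer between PHP and the browser (gzip compression, web server or proxy buffering) holds the output back; whether each line really appears immediately depends on that setup. The function names emit_line and stream_results are made up for the example.

```php
<?php
// Sketch: send each result to the browser as soon as it is ready.
// Assumption: no server or proxy layer (gzip, FastCGI buffering, and so on)
// holds the output back; PHP cannot control those from here.

/** Print one line and flush it toward the client. */
function emit_line(string $line): void
{
    echo $line, "<br/>\n";
    if (ob_get_level() > 0) {
        ob_flush(); // hand PHP's active output buffer to the next layer
    }
    flush();        // ask PHP to push its output on to the web server
}

/** Look up each id with $lookup (a hypothetical callable) and emit it at once. */
function stream_results(iterable $ids, callable $lookup): void
{
    foreach ($ids as $id) {
        emit_line($id . ') ' . $lookup($id));
    }
}

// Under a web server, stream the question's 100 lookups one by one.
if (PHP_SAPI !== 'cli') {
    set_time_limit(500); // the limit mentioned in the question
    stream_results(range(0, 99), function (int $i): string {
        $html = @file_get_contents('http://stackoverflow.com/privileges/user/' . $i);
        $found = is_string($html)
            && preg_match('!<div class="summarycount al">(.+?)</div>!', $html, $m);
        return $found ? $m[1] : 'n/a';
    });
}
```

The total run time stays the same with this approach; only the moment at which each line becomes visible changes.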
+1  A: 

When looking at something like this, I like to think of it this way: you're not retrieving a hundred users, you're retrieving a single arbitrary user a hundred times. I would look to break the retrieval of the data you're looking at (stepping away from the example in question) into a simple function, then call that function via JavaScript (whether the JS does the work itself or calls a PHP page which returns the results is up to you), and then update the results on the page as they come in.

In this way, you don't need to set a huge timeout, and you can update the page as the results propagate, instead of trying to do it all in one big chunk.

EricBoersma
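
The "single user, called a hundred times" idea could be sketched as one small function plus a page that answers for exactly one id. The names user_reputation and user.php, the injectable $fetch callable, and the plain-text response are assumptions made for illustration; the pattern is the one from the question.

```php
<?php
// Sketch: one function that handles a single user. The names, the injectable
// $fetch callable, and the plain-text response are illustrative assumptions.

/** Return the reputation string for one user, or null if it cannot be read. */
function user_reputation(int $id, callable $fetch): ?string
{
    $html = $fetch('http://stackoverflow.com/privileges/user/' . $id);
    if (!is_string($html)) {
        return null; // request failed
    }
    // Same pattern as in the question.
    return preg_match('!<div class="summarycount al">(.+?)</div>!', $html, $m) ? $m[1] : null;
}

// When served over HTTP (e.g. user.php?id=7), answer for exactly one user.
if (PHP_SAPI !== 'cli') {
    $id = (int) ($_GET['id'] ?? 0);
    $rep = user_reputation($id, function (string $url) {
        return @file_get_contents($url);
    });
    header('Content-Type: text/plain');
    echo $rep ?? 'unavailable';
}
```

Each request then takes about as long as a single lookup, so the JavaScript side can fill in results one at a time as they come back.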