views:

802

answers:

5

In web applications, how do I handle processing of large amounts of data that usually causes timeouts, e.g. processing a database to output a large report? I usually use PHP/Zend Framework/MySQL.

+2  A: 

Asynchronous mechanisms of various sorts. For example, queue up the job in a job table and have an automated task in the background poll the table on a fixed schedule for newly-added jobs, process them, and somehow notify the user it has been done.

In one recent app, the document a user must generate is requested on one page; that request simply inserts a row into a job table. An event runs periodically, performing all the SQL necessary to "compose" the document's data into a second table, and a third piece is a PHP page that shows pending/completed documents. In this case there is no real turnaround requirement, so people aren't actively notified via email or any such mechanism, but it would be easy to layer that on top.
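The job-table flow above can be sketched in a few lines. This is a minimal, self-contained sketch, not the app's actual schema: table and column names are illustrative, and an in-memory SQLite database stands in for the real MySQL connection so it runs anywhere PDO is available.

```php
<?php
// Illustrative job-table pattern; SQLite in memory stands in for MySQL.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    requested_at TEXT NOT NULL
)");

// Page 1: the "request report" page only inserts a row and returns at once.
$stmt = $db->prepare("INSERT INTO job (type, requested_at) VALUES (?, datetime('now'))");
$stmt->execute(array('monthly_report'));
$jobId = $db->lastInsertId();

// Background worker (event/cron): grab one pending job, process, mark done.
$job = $db->query("SELECT id, type FROM job WHERE status = 'pending' LIMIT 1")
          ->fetch(PDO::FETCH_ASSOC);
if ($job) {
    // ... the long-running report generation would happen here ...
    $db->prepare("UPDATE job SET status = 'done' WHERE id = ?")
       ->execute(array($job['id']));
}

// Page 2: the "view progress" page just reads the status column.
$status = $db->query("SELECT status FROM job WHERE id = " . (int) $jobId)
             ->fetchColumn();
echo $status, "\n"; // prints "done"
```

The key point is that the page the user hits never does the heavy work itself; it only writes and reads rows, so it always returns well inside the timeout.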

There's a million ways to skin this cat...

alphadogg
I know .NET has MSMQ/WCF for queueing, but I don't think there is an equivalent for PHP, is there? As for "and somehow notify the user it has been done": any ideas how that part can be done? And this job you are talking about: how can I run it so that it isn't affected by the timeout? By not running it in PHP?
iceangel89
Actually, I keep things simpler. A job request is a single INSERT into a job table in the db. MySQL has Events, which are like Jobs in SQL Server, and I schedule with those. Or, for old versions of MySQL, you'd use cron. One or more SQL statements (wrapped in an event, job or other mechanism) create relevant data in an output table. And, a second PHP page allows users to view progress of those jobs.
alphadogg
Oh, and "notification" can take many forms. Commonly, it's email, although I have jacked into a corporate IM once. Again, it's usually another scheduled task. For example, a cron job could run a PHP script that emails out documents that are marked as complete but not yet sent.
alphadogg
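That last scheduled task could be a single crontab entry; the interval and script path here are purely illustrative:

```
# every 5 minutes: mail out documents marked complete but not yet sent
*/5 * * * * /usr/local/bin/php /path/to/email_completed_documents.php
```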
A: 

set_time_limit(int $seconds) lets you raise the execution time limit for a script that needs longer to run, and you can set it back to normal once the long-running section is done.
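A minimal sketch of that raise-then-restore pattern; the 300-second value is illustrative:

```php
<?php
// Raise the limit only around the heavy section, then restore it.
$original = (int) ini_get('max_execution_time');

set_time_limit(300);                        // allow up to 5 minutes
$raised = ini_get('max_execution_time');    // "300" while the report runs

// ... generate the large report here ...

set_time_limit($original);                  // back to the normal limit
echo $raised, "\n";
```

Note that each call to set_time_limit() also restarts the timeout counter from zero, so calling it periodically inside a long loop has a similar effect.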

Pradeep
Be careful. I use this idea too, but don't just jack it up high, because users will only wait so long before they get frustrated. The PHP default is already a lengthy (according to the typical business user) 30 secs. Beyond 2 minutes, and you have to ask yourself if making users wait that long is the right way to design for the issue...
alphadogg
Hmm, but if this report needs to be done on demand, I would need to set that to unlimited, which may not be a good thing, right? Because it may "hang up" the system for too long?
iceangel89
Depends on the script. If the script asks the DB to reduce a large amount of data down to a small amount for display, then generally no. But if you have a complex script, or a large amount of return data, then there are many potential issues. There are very few times, in a business-line app, where a user simply cannot do any other work at all until an intensive report is completed by the server...
alphadogg
+1  A: 

While the server-side code is processing the data, the client will need to poll the server periodically to keep its session alive. If you do not poll the server periodically, your HTTP session will eventually expire and cause a timeout error.
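One way to sketch such a keep-alive ping, assuming a hypothetical ping.php endpoint (the name and interval are illustrative, not part of any framework):

```php
<?php
// ping.php -- hypothetical keep-alive endpoint. Its only job is to open
// the session so its idle timeout is pushed back while the long request
// runs elsewhere. The client would hit it every minute or so, e.g.:
//   setInterval(function () {
//       new Image().src = 'ping.php?' + new Date().getTime();
//   }, 60000);
session_start();                  // re-opens the session, refreshing its TTL
$_SESSION['last_ping'] = time();  // any access keeps the session warm
echo 'ok';
```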

Flynn81
Oh, does polling the server periodically prevent the timeout? How do I do that?
iceangel89
+1  A: 

One thing you can do is flush the output buffer, then send some javascript to update a progress bar during your processing. Here's an example:

<html>
<body>
<div id="progressBar" style="width: 0px; height: 20px; background: #900; color: #fff;">Progress</div>
</body>
</html>
<?php

// discard any buffered output so the echoes below reach the browser at once
while (@ob_end_flush());

for ($i = 0; $i < 10; $i++) {
    echo "<script type=\"text/javascript\">var pb = document.getElementById('progressBar'); pb.style.width = '" . ($i * 20) . "px';</script>";
    flush(); // push this chunk to the client before sleeping
    sleep(1);
}

?>
Matt Bridges
This works well if, generally, a) the DB is responsive, and b) if the overall job doesn't take more than 1-3 minutes. (That's my own rule-of-thumb based on interaction with various users.) If the data is excessive, the script is complex and/or the overall response time exceeds my personal rule, I divide the interface into a start-job/get-result pair of pages.
alphadogg
+1  A: 

I'd typically execute the task asynchronously - I think you can use something like this in PHP:

exec("/usr/local/bin/php build_report.php > /dev/null 2>&1 &")
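Note that exec() blocks until the command finishes unless the command's output is redirected and it is backgrounded with &. A self-contained sketch of the fire-and-forget pattern (Unix shells only; sleep stands in for the report script here):

```php
<?php
// Fire-and-forget helper: redirecting output and appending '&' lets
// exec() return without waiting for the command (Unix shells only).
function run_in_background($cmd)
{
    exec($cmd . ' > /dev/null 2>&1 &');
}

$start = microtime(true);
run_in_background('sleep 3');   // stand-in for: /usr/local/bin/php build_report.php
$elapsed = microtime(true) - $start;

echo ($elapsed < 1.0) ? "returned immediately\n" : "blocked for the full sleep\n";
```

Because the call returns immediately, the page can redirect the user to a status page right away, while the backgrounded script keeps running past the web request's timeout.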
Mr. Matt
Hmm, I just looked at the PHP exec docs, and they're still not really clear to me. Will this avoid the PHP timeout? And to be asynchronous, must I redirect the user to some other page, like exec(...); header("Location: ..."); ? Or will it hang there till it completes? And what if it hangs, as in never completes, given that the timeout does not apply?
iceangel89