I'm working on an online PHP application that needs delayed PHP events. Basically, I need to be able to execute arbitrary PHP code x seconds (but it could be days) after the initial hit to a URL. I need fairly precise execution of these PHP events, and I also want it to be fairly scalable. I'm trying to avoid the need to schedule a cron job to run every second. I was looking into Gearman, but it doesn't seem to provide any ability to schedule events, and as I understand it, PHP isn't really meant to run as a daemon.

It would be ideal if I could tell some external process to poll an "event checker" URL on the PHP server at the exact time that the next event should be run. This poll time would need to be able to be decreased or increased at will, since events can be removed from and added to the queue. Any ideas on an elegant way to accomplish this? There is simply too much overhead in calling PHP externally (having to parse an HTTP request, or calling via the CLI) to make this idea feasible for my needs.

My current plan is to write a PHP daemon that will run the events, and interface with it from the PHP server via Gearman. The PHP daemon would be built around SplMinHeap, so hopefully the performance wouldn't be too bad. This idea leaves a bad taste in my mouth, and I was wondering if anyone had a better one? My ideas have changed slightly; read Edit 2.

EDIT:

I'm creating an online game that involves players taking turns with a variable time limit. I'm using XMPP and BOSH to push messages to and from my clients, and I've got that part done and working. Now I'm trying to add an arbitrary event that triggers some time after a client's play, to let that client (and the other people in the game) know that they took too long. I can't use a timed trigger on the client side because that would be exploitable (since the client is in the player's hands). Hope that helps.

EDIT 2:

Thank you all for your feedback. While I think most of your ideas would work well on a small scale, I have a feeling they wouldn't scale very well (external event manager) or lack the exactness this project requires (cron). Also, in both of those cases they are external pieces which could fail and which add complexity to an already complex system.

I personally feel that the only clean solution that meets the requirements of this project is to write a PHP daemon that handles the delayed events. I've begun writing what I think is the first PHP runloop. It handles watching sockets and executing delayed PHP events. Hopefully, when I'm closer to being done with this project, I can post the source, if any of you are interested in it. So far in testing it has shown itself to be a promising solution (no problems with memory leaks or instability).
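
For anyone curious what I mean by a runloop, here is a rough sketch of the core idea (illustrative only, not the actual library code): a timer queue built on SplMinHeap, with stream_select() used as the wait primitive so the loop wakes up either on socket activity or exactly when the next event is due. The class, key, and callback names are just placeholders.

<?php
// Illustrative sketch of an SplMinHeap-based timer loop (not the real LooPHP code).
class TimerHeap extends SplMinHeap
{
    // Order timers by due time; SplMinHeap::compare() returns a positive
    // value when $a should come out of the heap before $b.
    protected function compare($a, $b)
    {
        if ($a['at'] == $b['at']) return 0;
        return ($a['at'] < $b['at']) ? 1 : -1;
    }
}

$timers = new TimerHeap();
$timers->insert(array('at' => microtime(true) + 5.0, 'callback' => function () {
    echo "5 seconds elapsed\n";
}));

$watched = array(/* streams to watch, e.g. the XMPP/BOSH socket */);

while (true) {
    // Never wait longer than the time until the next timer is due.
    $timeout = null;
    if (!$timers->isEmpty()) {
        $next = $timers->top();
        $timeout = max(0, $next['at'] - microtime(true));
    }

    if ($watched) {
        $read = $watched;
        $write = $except = array();
        $sec  = ($timeout === null) ? null : (int) $timeout;
        $usec = ($timeout === null) ? null : (int) (($timeout - $sec) * 1000000);
        stream_select($read, $write, $except, $sec, $usec);
        // ...handle whatever is readable in $read here...
    } elseif ($timeout !== null) {
        usleep((int) ($timeout * 1000000));
    } else {
        usleep(100000); // nothing to watch and nothing scheduled yet
    }

    // Fire every timer that is now due.
    while (!$timers->isEmpty()) {
        $next = $timers->top();
        if ($next['at'] > microtime(true)) break;
        $timer = $timers->extract();
        call_user_func($timer['callback']);
    }
}

The point is that the loop never polls on a fixed interval: it always sleeps exactly until the next timer fires or until a watched socket becomes readable.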

EDIT 3: Here is a link to the PHP event loop library called LooPHP for those who are interested.

TL;DR Requirements

  • Call (preferably natively) PHP at a delayed time (ranging from seconds to days)
  • Handle creation/updating/deletion of events arbitrarily (I'm expecting a high number of canceled calls)
  • Handle a high load of scheduled events (100-1000 a second per server)
  • Calls should be within one second of their scheduled time
  • At this point I'm not open to rewriting the code base in another language (maybe some day I will)
A: 

What about this:

http://www.phpjobscheduler.co.uk/

Matthew
I will need to be able to handle a very high load in real time. This appears to be a cron manager, and an ugly one at that.
Kendall Hopkins
...and a bad one, since it only checks for events on PHP requests; it's probably not precise enough.
Alix Axel
A: 

Use the sleep function: http://php.net/sleep

andufo
Sleeping your script for several days is probably not a good idea. :)
deceze
@deceze I believe that PHP used to have a lot of problems running from the CLI. It leaked memory big time. But I believe they have fixed all these problems in PHP 5 and that you can sleep your scripts for days without any problem.
Alfred
@Alfred Maybe so, but you still shouldn't. Not least because all your "queued" events will silently drop if the threads terminate for whatever reason.
deceze
LOL, come on, enough down-votes!!! @andufo maybe you should delete the post to not get more downvotes.
Alfred
A: 

I am not sure why you are trying to avoid cron. You could create a queue of requests in a table, and have cron fire up a process to check for current jobs.

There are a few issues, depending on your exact requirements. For instance:

  • How precise must the call be?
  • How long would each call take?
  • What is the normal and peak load in any given period?

So if you want precise execution, or execution takes longer than a second, or there is the potential for a heavy load, then the cron approach can run into problems.

I have many daemons that run PHP (using daemontools). With this approach you could hold the requests in memory and perform whatever timing you wanted internally.

However, if exact and reliable timing is what you want, you should probably move away from PHP altogether.

Phil Wallach
Calls need to be within the second. Calls are likely to be small, but under high load they could take longer than 1 second and cause cron to back up and bottleneck the database. I'm trying to build the system to be able to handle *hundreds* of requests a second per server (which it currently can). I regret building the server in PHP, and probably should have chosen node.js or Java, but at this point it's unlikely that a rewrite is an option.
Kendall Hopkins
I think you need to bite the bullet and move away from PHP. I have been caught out like this before. If you have precise timing requirements and a heavy load, then you simply cannot trust that to a language as opaque as PHP. I would recommend C or C++ (not Java, due to JVM and garbage collection issues) and writing a daemon that handles this correctly. That could then drive something like Gearman which, if distributed, would help with scaling.
Phil Wallach
A: 

I can't think of anything that does everything you asked for:

  • has to be very precise
  • delay for long periods of time
  • ability to remove/change the time of the event

The trivial way would be to use a combination of the following functions:

set_time_limit(0);
ignore_user_abort(true);
time_sleep_until(strtotime('next Friday'));
// execute code

However, like @deceze said, it's probably not a very good idea, since if you set up a long delay Apache could eventually kill the child process (unless you're using the PHP CLI, which would make it easier). It also doesn't allow you to change / delete the event unless you set up more complex logic and a database to hold the events. Also, register_shutdown_function() might be useful if you want to go down this road.

A better approach would be to set up a CRON job in my opinion.

Alix Axel
Your trivial solution is not possible since it would eat up too much RAM in PHP processes. Cron jobs firing every second give me the chills since they could back up under high load.
Kendall Hopkins
@Kendall: Yeah, it ain't perfect in many ways. Why do you need to fire a CRON job every second? Just have the code create a new CRON job for the next event and when that's done (or when any event is changed / deleted) add the next event in queue.
Alix Axel
+3  A: 

This seems like the perfect place for an event queue in a database.

Have your user-created events (triggered by visiting the web page) create an entry in the DB that includes the instructions for the action to take place and the timestamp for when it should happen. Your daemon (either a persistent application or one triggered by cron) checks the DB for events that should have happened ($TriggerTime <= time()) and that have not been flagged as "processed" yet. If you find one or more of these events, execute the instructions, and finally mark the event as "processed" in the DB or simply delete the entry.

The bonus of using the DB to store the events (and not something resident in the RAM of an application) is that you can recover from a crash without data loss, you can have more than one worker reading events at a time, and you can modify events easily.

Also, there are lots of folks who use PHP as a general daemon scripting language on servers, etc. Cron can execute a PHP script (and check whether an instance of that "app" is already running) that checks the event queue every so often. You can have a little app that dies after a minute of inactivity and then gets restarted by cron. The app can check the DB for entries at a fast frequency of your choosing (like 1s). Normally cron cannot fire a timed event more often than once per minute.
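
A rough sketch of what one pass of such a worker could look like, assuming a MySQL table event_queue(id, instruction, trigger_time, processed) accessed through PDO; handle_event() stands in for whatever dispatches your stored instructions:

<?php
// Sketch of one worker pass over the queue (table and helper names are placeholders).
$db = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');

// Claim the due, unprocessed events inside a transaction so two workers
// don't grab the same rows.
$db->beginTransaction();
$stmt = $db->prepare(
    'SELECT id, instruction FROM event_queue
     WHERE trigger_time <= :now AND processed = 0
     ORDER BY trigger_time LIMIT 50 FOR UPDATE');
$stmt->execute(array(':now' => time()));
$events = $stmt->fetchAll(PDO::FETCH_ASSOC);

$mark = $db->prepare('UPDATE event_queue SET processed = 1 WHERE id = :id');
foreach ($events as $event) {
    $mark->execute(array(':id' => $event['id']));
}
$db->commit();

// Execute after committing so slow handlers don't hold row locks.
foreach ($events as $event) {
    handle_event($event['instruction']);
}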

Evan
I'm pretty sure this would put too high a load on the database, as it would require all events being "processed" to be locked during this action, and if every page load is trying to get the lock it could block very easily.
Kendall Hopkins
I also think this could put a high load on your server, but I would benchmark the solution first before doing too much premature optimization. Some of these solutions could scale for you (try the simplest solution first).
Alfred
A: 

I would just use cron to run a PHP file every so often (e.g. every 5 minutes). The PHP file would check if there are any events that need to be fired within the next interval, grab the list of those events, and sleep until the next one. Wake up, fire the next event(s) in the list, sleep until the next one, and repeat until done.

You could even scale it by forking or launching another PHP file to actually fire the event. Then you could fire more than one event at the same time.
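
A sketch of what the cron-launched PHP file could look like, assuming the events sit in a database table events(id, fire_at, payload, fired) and that fire_event() is whatever actually performs the work:

<?php
// Sketch: launched by cron every 5 minutes; handles everything due in the next window.
$db = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');

$stmt = $db->prepare(
    'SELECT id, fire_at, payload FROM events
     WHERE fire_at < :end AND fired = 0
     ORDER BY fire_at');
$stmt->execute(array(':end' => time() + 300));

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $event) {
    // Sleep until this event is due (skip the sleep if it is already due).
    if ($event['fire_at'] > time()) {
        time_sleep_until((float) $event['fire_at']);
    }
    fire_event($event['payload']);
    $db->prepare('UPDATE events SET fired = 1 WHERE id = :id')
       ->execute(array(':id' => $event['id']));
}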

Brent Baisley
No idea why this was downvoted. The only drawback is that there is a 5-minute minimum event trigger time. Upvoting for a totally reasonable solution.
Zak
Thank you, I couldn't figure out the downvote either. Although not optimal, it's an extremely simple solution using simple tools.
Brent Baisley
+3  A: 

I think a PHP-only solution will be hard (almost impossible) to implement. I came up with two solutions to your problem.

PHP/Redis solution

Questions asked by Kendall:

  • How stable is Redis:

Redis is very stable. The developer really writes some clean C code. You should check it out on GitHub ;). Also, a lot of big sites are using Redis. For example GitHub; they had a really interesting blog post about how they made GitHub fast :). Superfeedr also uses Redis. There are a lot more big companies using Redis ;). I would advise you to Google for it ;).

  • How PHP-friendly is Redis:

Redis is very PHP-friendly. A lot of users are writing PHP libraries for Redis. The protocol is really simple; you can debug it with telnet ;). Looking quickly, Predis, for example, has blocking pop implemented.

  • How would I remove events:

I think you should use something like the ZREM command.

Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, difference between sets, and so forth. Redis supports different kinds of sorting abilities.
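
For concreteness, scheduling and cancelling with a sorted set looks roughly like this; I'm assuming the Predis client here, and the key and payload names are just examples:

<?php
// Sketch using Predis; 'schedule' is a sorted set scored by the Unix time to fire.
require 'vendor/autoload.php';   // assumption: Predis is installed and autoloadable
$redis = new Predis\Client();

$event = json_encode(array('game' => 42, 'action' => 'turn_timeout'));

// Schedule the event to fire 30 seconds from now.
$redis->zadd('schedule', time() + 30, $event);

// Cancel it again (for example, because the player moved in time).
$redis->zrem('schedule', $event);

// Everything that is due right now:
$due = $redis->zrangebyscore('schedule', 0, time());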

What I came up with (pseudo-code):

processor.php:

<?php
######----processor.php
######Start something like `nohup php processor.php &` enough times to have workers free to run events.
#$key: the list wakeup.php pushes due events onto (must match wakeup.php)
#$key1: the sorted set of scheduled events (must match client.php)
while (true) {
    $event = blpop($key);  #one of the available blocking workers wakes up and receives the event
    process($event);       #you write process(); it may take a while, so this worker is busy meanwhile
    zrem($key1, $event);   #remove the event from the sorted set once it has been processed (added later)
}

client.php:

######----client.php
######The user/browser requests generate these events.
#$key1: the sorted set of scheduled events, scored by the time they should run
#$key2: the list used to notify wakeup.php that the earliest event changed
#$millis: when the event should run
#$event: the event payload to work on

if ("add event") {
    zadd($key1, $millis, $event);
} else if ("delete event") {
    zrem($key1, $event);
}

#Get the event which has to be scheduled first
$first = zrange($key1, 0, 0);

if ($oldfirst != $first) { #the earliest event changed => notify wakeup.php
    lpush($key2, $first);
}

$oldfirst = $first;

wakeup.php:

####wakeup.php
####Start once, something like `nohup php wakeup.php &`
#http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes => read the sorted set part.
while (true) {
    $first = zrange($key1, 0, 0);        #earliest scheduled event
    $timeout = score($first) - time();   #pseudo: seconds until that event is due
    $event = blpop($key2, $timeout);     #block until notified of a new earliest event, or until it is due

    if ($event == nil) {
        #The blocking pop timed out, which means $first is due now:
        #hand it to one of the blocking processor.php workers.
        lpush($key, $first);
    }
    #otherwise the earliest event changed; loop around and recompute the timeout.
}

With something along these lines you could write a pretty efficient scheduler using PHP only (okay, Redis is C, so kick-ass fast :)). I would also like to code this solution, so stay tuned ;). I think I could write a usable prototype in a day....

My java solution

This morning I created a Java program which I think you can use for your problem.

  1. download:

    Visit the project's GitHub download page to get the jar file (with all dependencies included).

  2. install:

    java -jar schedule-broadcaster-1.0-SNAPSHOT-jar-with-dependencies-1277709762.jar

  3. Run simple PHP snippets

    1. First php -f scheduler.php
    2. Next php -f receiver.php
  4. Questions

    I created these little snippets so that hopefully you will understand how to use my program. There is also a little bit of documentation in the wiki.

App Engine's TaskQueue

A quick solution would be to use Google App Engine's Task Queue, which has a reasonable free quota. After that you pay for what you use.

Using this model, App Engine's Task Queue API allows you to specify tasks as HTTP Requests (both the contents of the request as its data, and the target URL of the request as its code reference). Programmatically referring to a bundled HTTP request in this fashion is sometimes called a "web hook."

Importantly, the offline nature of the Task Queue API allows you to specify web hooks ahead of time, without waiting for their actual execution. Thus, an application might create many web hooks at once and then hand them off to App Engine; the system will then process them asynchronously in the background (by 'invoking' the HTTP request). This web hook model enables efficient parallel processing - App Engine may invoke multiple tasks, or web hooks, simultaneously.

To summarize, the Task Queue API allows a developer to execute work in the background, asynchronously, by chunking that work into offline web hooks. The system will invoke those web hooks on the application's behalf, scheduling for optimal performance by possibly executing multiple webhooks in parallel. This model of granular units of work, based on the HTTP standard, allows App Engine to efficiently perform background processing in a way that works with any programming language or web application framework.

Alfred
Is it possible to have this running on the same server that my current PHP server is on? If not, this won't work.
Kendall Hopkins
Hi Kendall, the Task Queue runs in the cloud, but it uses webhooks, so if the URLs are accessible from the web (no firewall in the middle) then you can let Google call the URLs at the specified time. Otherwise I guess you can't use this solution. Hope this answers your question?
Alfred
@Alfred I don't really want to depend on an external server to trigger my internal services, seems even more backwards than using CRON.
Kendall Hopkins
+1, this is similar to http://www.webbasedcron.com/.
Alix Axel
@Alix it is something like that, but Google's Task Queue is better in terms of performance and pricing.
Alfred
@Kendall You are somewhat right about the external service triggering your internal service. But the taskqueue is pretty fast/stable/cheap/easy to implement. There are better solutions but they take more time.
Alfred
One very important feature I need to have in order to make it practical is the ability to remove tasks and/or change the times on them. The current project doesn't seem to have that feature.
Kendall Hopkins
Just wanted to ask what you think of the application :)? Have you used it already? Right now you are right that it can't remove tasks yet. It was a prototype to show you. I guess I could add that feature later. But I think you could also do these checks on your side (PHP): you just check whether the task is still valid from PHP, or add another task if necessary. But those were also features I wanted to add later :). I have also created a version which will call a specified URL when the event happens.
Alfred
@Alfred I really like the Redis solution. It's very simple and clean, and it sounds fast and scalable too. I've never used (or heard of) it before. How stable/usable/PHP-friendly is it? I'm planning on accepting (and implementing) this answer unless something else comes up before the bounty ends. One concern I had was: how would I remove events if they are in the daemon "cloud"? Would I have to broadcast the event cancellations to each daemon?
Kendall Hopkins
@Kendall Redis is very fast, stable C code. You should have a look at the C code he produces on GitHub ;). If you go to the site you'll see they have a lot of PHP-friendly libraries. Also, the Redis protocol is really simple; you could debug it using a telnet client ;). P.S.: events shouldn't live in the daemon processes but be fetched from redis-server.
Alfred
@Kendall P.S.: I think I made a little thinking error, but you could implement it really cleanly with Redis. I will try to update my post as soon as possible :)
Alfred
+2  A: 

I also recommend the queue strategy, but you seem to dislike using the database as a queue. You've got an XMPP infrastructure, so leverage it: use a pubsub node and post your events to that node. Pubsub can optionally be configured to store unfetched items in a persistent way.

Your daemon process (no matter what language) can fetch all stored items at startup time and subscribe to changes to get notified about incoming actions. This way you can solve your problem in an elegant, asynchronous way.

tweber
I only use XMPP as a transport for data between the server and clients (where the server is just a "special" client). Having it store data would probably lead to worse performance than just keeping it in the database and polling it with cron. I did find <http://xmpp.org/extensions/xep-0203.html>, which allows for delayed events, but it doesn't allow you to cancel events once they have been "queued", so every event "timeout" in my case would have to be sent out to all clients and then ignored (wasting loads of bandwidth).
Kendall Hopkins
I'm not giving up yet ;-) What's your XMPP server? If it's extensible, you could consider writing a plugin for it and implementing your scheduler/"watchdog" as an XMPP component. It knows through the presence services which clients are logged in, and you could also interface with your PHP business logic via BOSH or a REST API. How about that?
tweber
@tweber I think an XMPP extension would be optimal, but I don't know Erlang (ejabberd). I would also like to expand my user base to GTalk and Facebook users, whose XMPP servers I couldn't touch. For a while I was considering rewriting the whole app as an extension on an XMPP server, but decided against it because a large part of the server code already existed in PHP. I like your thinking, it's just not going to help me this late in the game. :\
Kendall Hopkins
We had a similar problem: a large PHP code base and synchronisation of external systems via XMPP. I solved this with an XMPP plugin (Openfire server) and a REST interface against the PHP business logic (so you can benefit from an HTTP cache for performance reasons). If you don't know Erlang: maybe you don't need to... it could be enough to write an external XMPP component (think "bot on steroids") instead of a plugin. I agree that this is a fundamental architectural decision and you can hardly implement it if it doesn't fit into existing code. Maybe you can use this in another project ;-)
tweber
There is an XMPP extension protocol that lets you add on to the protocol without modifying any server daemons. It's called the "Jabber Component Protocol." Configure your server to allow your component, and then implement your component in whatever language you like. GTalk and other servers will be able to use it via server-to-server communication to your server. http://xmpp.org/extensions/xep-0114.html
Sean Edwards
+6  A: 

Have your PHP script make an exec() call to schedule your PHP script to run at the time you need, using the at command:

exec("at 22:56 /usr/bin/php myscript.php");

at executes commands at a specified time.

from the man page:

At allows fairly complex time specifications, extending the POSIX.2 standard. It accepts times of the form HH:MM to run a job at a specific time of day. (If that time is already past, the next day is assumed.) You may also specify midnight, noon, or teatime (4pm) and you can have a time-of-day suffixed with AM or PM for running in the morning or the evening. You can also say what day the job will be run, by giving a date in the form month-name day with an optional year, or giving a date of the form MMDDYY or MM/DD/YY or DD.MM.YY. The specification of a date must follow the specification of the time of day. You can also give times like now + count time-units, where the time-units can be minutes, hours, days, or weeks and you can tell at to run the job today by suffixing the time with today and to run the job tomorrow by suffixing the time with tomorrow.

Further, if you need one second time resolution, have your script run at the start of the minute, then just sleep n seconds until it is time to execute.
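
A rough sketch of both halves of that (script names and paths are illustrative): the scheduling side pipes the command into at, and the runner sleeps out the remaining seconds.

schedule.php (sketch):

<?php
// Queue the runner via at for the right minute; at itself only has minute resolution.
$fireAt = time() + 90;                       // example: fire 90 seconds from now
$cmd = sprintf('echo %s | at %s 2>&1',
    escapeshellarg('/usr/bin/php /path/to/run_event.php ' . $fireAt),
    escapeshellarg(date('H:i', $fireAt)));
exec($cmd);

run_event.php (sketch):

<?php
// Started by at on the minute; sleep until the exact second, then execute.
$fireAt = (int) $argv[1];
if ($fireAt > time()) {
    time_sleep_until((float) $fireAt);
}
// ...execute the delayed event here...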

Zak
@Zak Is there a better way to interface with this event queue? System calls to PHP don't use the opcode cache and would be very slow. If I queued up curl hits to PHP URLs, it would still be a bit hacky because it depends on system calls. Also, can this handle thousands of requests a minute?
Kendall Hopkins
Seems like you need to break this into two scenarios. Scenario 1: few requests to start, need to execute delayed events hours or days away. Use at to schedule PHP to run your events. Don't worry about opcode optimization. Scenario 2: you are growing and are now getting 100s - 1000s of hits per minute. By hits we mean requests to add future "event triggers". In this case, I would just enter the jobs in the database, and have cron run (maybe run a curl request) every minute to process the jobs for that minute. Clearly identify whether you need sparse scheduling or continuous processing of events.
Zak
I need continuous processing of events, regardless. Or at least I need them to run within ~1 second of when they are scheduled.
Kendall Hopkins
I've done some scalability testing with at and it doesn't seem to work at the level I want. The overhead of having to non-natively call PHP is hurting the performance too much. I know it's what the question was initially asking for, but I didn't foresee the overhead that this sort of solution has.
Kendall Hopkins
Not to criticize, but lots of developers say you don't have a performance overhead problem until you measure your performance. Do you have the metrics for how long it takes to execute your script via the at command?
Zak
@Zak I've profiled the requests before and after daemonizing the workers. 1/2-2/3 of the time was spent just waiting on services (XMPP) to connect (not using encryption). I was able to go from 25 req/sec to over 100 req/sec. This overhead isn't really PHP's overhead, but comes from how non-persistent connections handle the XMPP connections and the cost of having to internally authenticate with the server on each PHP request. I know you provided the perfect answer to the question I asked, but it wasn't the best solution for my larger issue. Nonetheless, thanks for telling me about `at` :)
Kendall Hopkins
A: 

What about using cron to run a checker, which can then execute stuff from the DB, for example?

Or using the "at" linux command to schedule execution of some command?

Killer_X
A: 

Here's the correct answer, but you may not like it.

PHP is designed entirely around being used as a request-response (http) language, and thus doesn't support what you are looking for - it's great to hack and find ways around, but it will be just that, a hack, whatever 'solution' you end up getting.

What you really need is an event-driven language that supports XMPP, and for that you need look no further than Node.js/V8 and the supporting XMPP libraries - this natively supports, and is designed for, just what you need. You could also go down the Java route, but if you want to port quickly and get a whole host of new features and support for what you are doing, Node is the one.

If you insist on going with PHP (as I have many times over many years), the 'lightest' and most effective way to do this is a persistent PHP daemon with an event queue in a database - sadly!

nathan
Sad this answer is getting modded down so much. There are ways of accomplishing this task in PHP, yes, but it's just not that good at it. If you want to scale (without massive amounts of infrastructure like the Facebooks of the world that are scaling with PHP) you just don't use PHP. Despite what the marketing literature says, PHP is not designed to scale in any way. Want to scale like Facebook, etc.? Pass off your messages into a queue of some sort (ActiveMQ with its STOMP support is a good choice) or to a Scala or Erlang backend. Now back to my 4th of July drinking and BBQing...
Trey
Cheers Trey, and well noted about Erlang and Scala; many other languages will handle (and are designed to handle) this problem much better - PHP simply doesn't do it without hacking and relying on 'some other process'.
nathan
did you get it sorted?
nathan
I didn't vote you down. But I disagree with the notion that PHP cannot scale. It might scale a little worse than some other languages, but it is good enough to be used at some of the largest websites online (Facebook, Wikipedia, Flickr, Yahoo Answers, Yahoo Bookmarks, Delicious, Digg, Friendster, SourceForge, Photobucket and others). Plus, PHP's scaling deficit (which is not really big imho) is evened out by MySQL, which is a very fast database, and together with PHP they are scalable.
Richard Knop
Scalability is more about how you write your application than which technology you use; there are Java/ColdFusion/etc. applications that need huge infrastructure - similar to Facebook - to scale properly (MySpace, anyone? They had massive scalability problems in the past, if I remember correctly, and it's not written in PHP), so it has really nothing to do with the language.
Richard Knop
@Richard can't remember saying or implying PHP doesn't scale (and if I implied it then apologies!) I was saying that PHP isn't an event driven language, it's invoked on request to provide a response [end] that's it, typically designed for use on The Web when responding to HTTP requests. By nature of course, this makes it almost infinitely scalable because HTTP is a stateless universal protocol, and the web is built on the notion of universality. Since it doesn't have the notion of state it's more scalable than anything stateful. Again, sincerely hope I didn't imply PHP didn't scale well :)
nathan
@nathan Take a look at my GitHub project (EDIT 3). I actually completely agree with your answer. PHP is possibly the *worst* language to do event-driven programming in. But since PHP 5.3 fixed the huge issue PHP had with leaking memory, there is no reason you can't use PHP for a long-lived daemon. There is little-to-no support for writing an event-driven PHP program. Hopefully I can change that :)
Kendall Hopkins
A: 

Maybe your processing can be done in a lazy way. If your script does not produce any "output", like an email to the user, then you may be able to create a job queue in the database. The next time someone affected by these jobs loads a page, all these jobs are executed in order. Also (though it's not necessary), you could add a cron that does some of these jobs during off-peak hours.
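
A sketch of that lazy approach, with placeholder table and function names; you would call this near the top of every page load for the affected user:

<?php
// Sketch: execute any jobs for this user that have come due since their last page load.
function run_pending_jobs(PDO $db, $userId)
{
    $stmt = $db->prepare(
        'SELECT id, payload FROM lazy_jobs
         WHERE user_id = :uid AND run_after <= :now AND done = 0
         ORDER BY run_after');
    $stmt->execute(array(':uid' => $userId, ':now' => time()));

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $job) {
        execute_job($job['payload']);   // execute_job() is your instruction dispatcher
        $db->prepare('UPDATE lazy_jobs SET done = 1 WHERE id = :id')
           ->execute(array(':id' => $job['id']));
    }
}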

Full Decent
It has to be precise to the second...
Alix Axel
+1  A: 

You could use Node.JS which is an event-driven, JavaScript-based web server. Run it on a secret, internal port with a script that receives notification from the PHP script and then schedules the action to be run xx seconds later. The action in Node.JS could be as simple as running a PHP script on the main web server.

Colin O'Dell