views: 331 | answers: 5

Q:

I want to write a long-running process (a Linux daemon) that serves two purposes:

  • responds to REST web requests
  • executes jobs which can be scheduled

I originally had this working as a simple program that would run through the jobs and do the updates, which I then ran via cron. But now I have the added REST requirement, and I would also like to change the frequency of some jobs but not others (let's say all jobs have different frequencies).

I have zero experience writing long-running processes, especially ones that do things on their own rather than responding to requests.

My basic plan is to run the REST part in a separate thread/process, and I figured I'd run the jobs part separately.

I'm wondering if there exist any patterns, specifically for Python (I've looked and haven't really found any examples of what I want to do), or if anyone has any suggestions on where to begin with transitioning my project to meet these new requirements. I've seen a few projects that touch on scheduling, but I'm really looking for real-world user experience/suggestions here. What works / doesn't work for you?

A: 

I usually use cron for scheduling. As for REST, you can use one of the many, many web frameworks out there, but even just running SimpleHTTPServer should be enough.

You can schedule the REST service startup with cron's @reboot directive:

@reboot (cd /path/to/my/app && nohup python myserver.py&)
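
For illustration, a minimal stdlib-only sketch in the same spirit (http.server is the modern home of the SimpleHTTPServer machinery); the port and the /status route are my assumptions, not anything from the question:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical endpoint: report that the daemon is alive.
            if self.path == "/status":
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"ok\n")
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("", 8000), StatusHandler).serve_forever()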
lazy1
TBH the REST implementation is not something I'm too worried about; as I've said, I think I've got the long-running process responding to requests down. It's the "do X every Y hours/minutes/seconds for 10,000 jobs" part that's got me stumped.
lostincode
A: 

One option is to simply choose a lightweight WSGI server from this list:

and let it do the work of a long-running process that serves requests. (I would recommend Spawning.) Your code can then concentrate on the REST API, handling requests through the well-defined WSGI interface, and scheduling jobs.

There are at least a couple of scheduling libraries you could use, but I don't know much about them:

ars
scheduler-py looks great; going to dig into its guts in the morning.
lostincode
A: 

The usual design pattern for a scheduler would be:

  • Maintain a list of scheduled jobs, sorted by next-run-time (as Date-Time value);
  • When woken up, compare the first job in the list with the current time. If it's due or overdue, remove it from the list and run it. Continue working your way through the list this way until the first job is not due yet, then go to sleep for (next_job_due_date - current_time);
  • When a job finishes running, re-schedule it if appropriate;
  • After adding a job to the schedule, wake up the scheduler process.

Tweak as appropriate for your situation (e.g. sometimes you might want to re-schedule jobs based on when they start running rather than when they finish); a sketch of this loop follows below.
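
As a rough sketch of that pattern (the heap, the Event-based wakeup, and fixed per-job intervals are my assumptions, not part of the answer):

    import heapq
    import itertools
    import threading
    import time

    class Scheduler:
        def __init__(self):
            self._jobs = []                   # heap of (next_run, seq, interval, func)
            self._seq = itertools.count()     # tie-breaker so funcs are never compared
            self._wakeup = threading.Event()  # set whenever a new job is added

        def add(self, interval, func):
            heapq.heappush(self._jobs,
                           (time.time() + interval, next(self._seq), interval, func))
            self._wakeup.set()                # wake the loop so it re-checks the head

        def run(self):
            while True:
                now = time.time()
                # Run every job that is due or overdue.
                while self._jobs and self._jobs[0][0] <= now:
                    _, _, interval, func = heapq.heappop(self._jobs)
                    func()
                    # Re-schedule relative to the start time (the tweak above).
                    heapq.heappush(self._jobs,
                                   (now + interval, next(self._seq), interval, func))
                # Sleep until the next job is due, or until add() wakes us.
                timeout = self._jobs[0][0] - time.time() if self._jobs else None
                self._wakeup.wait(timeout)
                self._wakeup.clear()

Usage would be along the lines of s = Scheduler(); s.add(60, some_job); threading.Thread(target=s.run).start(), with each job carrying its own interval.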

caf
This pretty much confirmed what I was thinking. I'd mark it correct, but really I'm waiting for people to upvote some of these before picking a 'correct' answer. ars's suggestion of scheduler-py looks great; I'm going to give it a try tomorrow.
lostincode
+2  A: 
  • If the REST server and the scheduled jobs have nothing in common, write two separate implementations (the REST server and the jobs part) and run them as separate processes.

  • As mentioned previously, look into existing schedulers for the jobs stuff. I don't know if Twisted would be an alternative, but you might want to check this platform.

  • If, OTOH, the REST interface invokes the same functionality as the scheduled jobs do, you should try to look at them as two interfaces to the same functionality, e.g. like this:

    • Write the actual jobs as programs the REST server can fork and run.
    • Have a separate scheduler that handles the timing of the jobs.
    • If a job is due to run, have the scheduler issue a corresponding REST request to the local server. This way the scheduler only handles job descriptions and has no knowledge of its own about how the jobs are implemented (see the sketch after this list).
  • It's a common trait for long-running, high-availability processes to have an additional "supervisor" process that just checks that the necessary daemons are up and running, and restarts them as necessary.
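
A hedged sketch of the scheduler-triggers-the-server idea; the port and the /jobs/<name> endpoint are invented for illustration:

    import urllib.request

    def trigger_job(name):
        # The scheduler knows only the job's name; the server maps it to code.
        req = urllib.request.Request(
            "http://localhost:8000/jobs/%s" % name,  # hypothetical endpoint
            data=b"",                                # empty POST body
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status                       # e.g. 202 = job accepted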

ThomasH
Torn as to what to mark correct here; some really good suggestions.
lostincode
+1  A: 

Here's what we did.

  1. Wrote a simple, pure-WSGI web application to respond to REST requests:

    • Start jobs

    • Report status of jobs

  2. Extended the built-in wsgiref server to use the select module to check for incoming requests.

    • Activity on the socket is an ordinary REST request; we let wsgiref handle this. It will -- eventually -- call our WSGI application to respond to status and job-submission requests.

    • Timeout means that we have to do two things (sketched after this list):

      • Check all children that are running to see if they're done. Update their status, etc.

      • Check a crontab-like schedule to see if there's any scheduled work to do. This is a SQLite database that this server maintains.
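
A rough sketch of that select-with-timeout loop (the one-second timeout, the port, and the two placeholder helpers are my assumptions, not details from this answer):

    import select
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # Placeholder WSGI app standing in for the real REST application.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok\n"]

    def check_running_children():
        pass  # placeholder: poll running child processes, update their status

    def run_due_scheduled_jobs():
        pass  # placeholder: consult the SQLite schedule and start due jobs

    server = make_server("", 8000, app)
    while True:
        # Wait up to one second for an incoming REST request.
        readable, _, _ = select.select([server], [], [], 1.0)
        if readable:
            server.handle_request()  # ordinary request: let wsgiref handle it
        else:
            check_running_children()
            run_due_scheduled_jobs()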

S.Lott