I'm trying to find the best solution for running periodic tasks in parallel. Requirements:

  1. Java (Spring w/o Hibernate).
  2. Tasks are managed by a front-end application and stored in a MySQL DB (fields: id, frequency (in seconds), <other attributes/settings about task scenario>). It's something like crontab, only with a single frequency (seconds) field instead of minutes/hours/days/months/days of week.

I'm thinking about:

  1. A TaskImporter thread polling Tasks from the DB (via TasksDAO.findToProcess()) and submitting them to a queue.
  2. A java.util.concurrent.ThreadPoolExecutor running the tasks (from the queue) in parallel.
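
The two pieces above could be wired together roughly as follows. This is only a minimal sketch: apart from TasksDAO.findToProcess(), every name here (Task, TaskHandler, TaskImporterSketch, the poll interval) is a hypothetical placeholder, and the DAO is assumed to return only tasks that are already due.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: only TasksDAO.findToProcess() is named in the question.
class TaskImporterSketch {

    /** Minimal stand-in for a row of the tasks table. */
    static final class Task {
        final long id;
        final int frequencySeconds;

        Task(long id, int frequencySeconds) {
            this.id = id;
            this.frequencySeconds = frequencySeconds;
        }
    }

    /** Assumed to return only the tasks that are due right now. */
    interface TasksDAO {
        List<Task> findToProcess();
    }

    /** Whatever "running a task scenario" means for the application. */
    interface TaskHandler {
        void handle(Task task);
    }

    private final TasksDAO dao;
    private final TaskHandler handler;
    private final ExecutorService workers;
    private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor();

    TaskImporterSketch(TasksDAO dao, TaskHandler handler, int workerThreads) {
        this.dao = dao;
        this.handler = handler;
        // A fixed pool is a ThreadPoolExecutor backed by an unbounded queue.
        this.workers = Executors.newFixedThreadPool(workerThreads);
    }

    /** One polling pass: fetch due tasks and queue each for a worker thread. */
    int pollOnce() {
        List<Task> due = dao.findToProcess();
        for (Task task : due) {
            workers.submit(() -> handler.handle(task));
        }
        return due.size();
    }

    /** Polls repeatedly; the exception is caught so one failed poll does not stop the schedule. */
    void start(long pollIntervalSeconds) {
        poller.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                System.err.println("Polling failed: " + e);
            }
        }, 0, pollIntervalSeconds, TimeUnit.SECONDS);
    }

    /** Stops polling and waits for queued tasks; returns true if all finished in time. */
    boolean shutdown(long timeoutSeconds) {
        poller.shutdownNow();
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Typical use would be `new TaskImporterSketch(dao, handler, 8).start(1)`. Note that nothing in this sketch prevents the same task from being picked up twice if findToProcess() returns it again before next_run is updated; that is exactly the tricky part discussed next.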

The trickiest part of this architecture is TasksDAO.findToProcess():

  1. How do I know which tasks are due to run right now?
    • I'm thinking about a next_run field on Task, which would be populated (UPDATE tasks SET next_run = TIMESTAMPADD(SECOND, frequency, NOW()) WHERE id = ?) straight after selection (SELECT * FROM tasks WHERE next_run IS NULL OR next_run <= NOW() FOR UPDATE). The problem: I have to run lots of UPDATEs for lots of SELECTed tasks (an UPDATE for each Task, or a bulk UPDATE), plus there are concurrency problems (see below).
  2. Ability to run several concurrent processing applications (cloud), using/polling the same DB.
    • Across all of the concurrent processing applications, any given task must run only once. All SELECTs from the other apps must be blocked until app A finishes updating next_run for all the tasks it selected. The problem: locking the production table (used by the front-end app) would slow things down. Use a table mirror?
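
For reference, the select-then-update idea described above could look like this in plain JDBC. It is a sketch under assumptions, not a recommendation: the class and method names are hypothetical, the table is assumed to be InnoDB (FOR UPDATE needs transactional row locks), and the table and column names are taken from the queries in the question. MySQL's TIMESTAMPADD takes its arguments in the order (unit, interval, datetime), so the frequency comes before NOW().

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the claim step; assumes an InnoDB `tasks` table
// with `id`, `frequency` (seconds) and `next_run` columns.
class TaskClaimSketch {

    // Locks the due rows until the surrounding transaction ends.
    static final String SELECT_DUE =
            "SELECT id FROM tasks WHERE next_run IS NULL OR next_run <= NOW() FOR UPDATE";

    // MySQL argument order: TIMESTAMPADD(unit, interval, datetime).
    static final String BUMP_NEXT_RUN =
            "UPDATE tasks SET next_run = TIMESTAMPADD(SECOND, frequency, NOW()) WHERE id = ?";

    /**
     * Selects the due tasks and pushes their next_run forward in one transaction,
     * so another application running the same code blocks on the locked rows
     * and no longer sees them as due once this transaction commits.
     */
    static List<Long> claimDueTasks(Connection conn) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            List<Long> ids = new ArrayList<>();
            try (PreparedStatement select = conn.prepareStatement(SELECT_DUE);
                 ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
            if (!ids.isEmpty()) {
                // One batched statement instead of a separate round trip per task.
                try (PreparedStatement update = conn.prepareStatement(BUMP_NEXT_RUN)) {
                    for (long id : ids) {
                        update.setLong(1, id);
                        update.addBatch();
                    }
                    update.executeBatch();
                }
            }
            conn.commit();
            return ids;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The locks are held only for the duration of this short transaction, not while the tasks themselves run. How many rows actually get locked depends on how InnoDB scans for them, so an index on next_run is probably worth having if the front-end app writes to the same table.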

I love simple and clean solutions and believe there's a better way to implement this processing application. Do you see any? :)

Thanks in advance.


EDIT: Using Quartz as a scheduler/executor is not an option because of syncing latency. The front-end app is not written in Java and so cannot interact with Quartz, except through a web-service-oriented solution, which is not an option either, because the front-end app has more data associated with the previously mentioned Tasks and needs direct access to all data in the DB (read+write).

+1  A: 

I would suggest using a scheduling API like Quartz rather than a home-grown implementation. It provides a lot of API for implementing the logic, plus convenience. You will also have better control over jobs. http://www.quartz-scheduler.org/ http://www.quartz-scheduler.org/docs/tutorial/index.html

YoK
The only risk I see here is syncing latency between the DB and the Quartz scheduler, which may cause some jobs to run after they were canceled or after their running frequency has been changed. What can be done to avoid these scenarios?
ljank
When using Quartz you can add/schedule tasks through its API instead of the database you currently have. Quartz provides ways to save this task scheduling to a DB. So your front end will change a bit and use the Quartz API exclusively to add/schedule tasks, without using your current DB.
YoK
Great, I'm going to use JDBCJobStore to store my jobs in DB (they have to be visible in non-Java front-end and have to be associated with other data). Thanks!
ljank
It's strongly advised not to modify any data stored in the Quartz DB tables, so this method will not work as expected. I've unaccepted the answer in the hope of more options.
ljank
Yes, it's true that data should not be modified in the Quartz DB. But what Quartz-related data do you think you need to update? Is it required? For example, when using Quartz, the next run time can be fetched from the Quartz API. You will not need to store it separately in the DB or modify it, as it is managed by Quartz.
YoK
What I need is the ability to read and write data (lots of related data, i.e. `Task`, `TaskResult`, `TaskFailRecipient` and other related data models) from the front-end app (w/ full ORM) and "process" it in the back-end app (Java). "Processing" means reading the list of periodic tasks to run (`Task`) and saving task results (`TaskResult`) to the same DB (+ sending reports about failed Tasks (`TaskFailRecipient`)). That's why I thought about syncing data between Quartz and the local DB.
ljank