Hi,

In an application that harvests about a thousand RSS feeds, I want to dynamically schedule the feed downloaders based on the following criteria:

  • The rate at which content is generated: sources that produce content at a higher rate need to be visited more often. After downloading a feed, its content is analyzed, and the next run time for the feed is determined from its current rate of publication. Note that this rate changes over time: sometimes a feed is very active and sometimes it is slow.
  • Every RSS feed should be visited at least once an hour.

Most schedulers handle the second requirement, but the first one is more problematic. What Java-based open-source scheduler would you recommend for the task? (And why? ;) )

Thanks! Boaz

+9  A: 

Quartz is the open-source scheduler I've heard about most. It has good Spring support, integrates with JMX, and works in either Java SE or Java EE. I don't think there's anything particularly dynamic about its internal scheduling mechanisms, but you can schedule and unschedule jobs at runtime, so that should let you do what you want.
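Quartz's own `Scheduler` interface offers `scheduleJob`, `rescheduleJob` and `unscheduleJob` for this. To keep the illustration dependency-free, here is a minimal sketch of the same idea using the JDK's `ScheduledExecutorService` instead of Quartz. `FeedScheduler` and its methods are hypothetical names: each feed keeps exactly one pending poll, and scheduling it again replaces (cancels) the earlier one.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical registry: at most one pending poll per feed, replaceable at runtime. */
final class FeedScheduler {
    private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(4);
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();

    /** Schedules (or reschedules) the poll for feedUrl, cancelling any earlier pending poll. */
    public void schedule(String feedUrl, Duration delay, Runnable poll) {
        ScheduledFuture<?> next = executor.schedule(poll, delay.toMillis(), TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = pending.put(feedUrl, next);
        if (previous != null) {
            previous.cancel(false); // let a poll that is already running finish
        }
    }

    /** Removes the feed from the schedule entirely. */
    public void unschedule(String feedUrl) {
        ScheduledFuture<?> previous = pending.remove(feedUrl);
        if (previous != null) {
            previous.cancel(false);
        }
    }

    /** Number of feeds that currently have a poll registered. */
    public int pendingCount() {
        return pending.size();
    }

    public void shutdown() {
        executor.shutdownNow();
    }
}
```

After each download you would call `schedule(url, newDelay, poll)` with whatever delay the feed's current activity suggests.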

Kaleb Brasee
+2  A: 

Here is a list. I agree with Kaleb, Quartz is popular and used by many.

Romain Hippeau
+2  A: 

Actually, based on your specifications, I'm not sure you need an external piece of software at all. Any scheduler will do, even the simple java.util.Timer, since all you need is for a task to repeat at specified intervals, and exact scheduling is not that important to you (and if it is, you can always fork a new thread from the timer task instead of doing the fetch in it). I don't see a need for Quartz or anything more advanced: you don't want to schedule checks every second Friday at 4 PM, you only need simple scheduling such as "once every hour".
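A minimal sketch of the java.util.Timer approach, assuming a hypothetical `fetchAndRate` callback that downloads the feed and returns the delay (in milliseconds) it wants until the next visit. Because each run schedules the next one, a feed's interval can change on every fetch; the download itself is handed to a thread pool, so the single timer thread is never blocked by a slow feed. `TimerFeedPoller` and the pool size of 8 are illustrative choices, not anything from a library.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToLongFunction;

/** Hypothetical poller: the Timer decides when, a worker pool does the slow fetch. */
final class TimerFeedPoller {
    private static final long MAX_DELAY_MS = 60 * 60 * 1000L; // "at least once an hour"

    private final Timer timer = new Timer("feed-timer", true); // daemon thread
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    /** fetchAndRate (hypothetical) downloads feedUrl and returns the wanted next delay in ms. */
    public void start(String feedUrl, ToLongFunction<String> fetchAndRate) {
        scheduleNext(feedUrl, 0L, fetchAndRate);
    }

    private void scheduleNext(String feedUrl, long delayMs, ToLongFunction<String> fetchAndRate) {
        // A TimerTask can only be scheduled once, so a fresh one is created per run.
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Fork the download off the timer thread, as suggested above.
                workers.submit(() -> {
                    long wanted;
                    try {
                        wanted = fetchAndRate.applyAsLong(feedUrl);
                    } catch (RuntimeException e) {
                        wanted = MAX_DELAY_MS; // on failure, fall back to the hourly visit
                    }
                    long next = Math.min(Math.max(wanted, 0L), MAX_DELAY_MS); // never exceed an hour
                    scheduleNext(feedUrl, next, fetchAndRate);
                });
            }
        }, delayMs);
    }

    public void stop() {
        timer.cancel();
        workers.shutdownNow();
    }
}
```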

As for rating the feeds, I doubt you'll find a scheduler that supports it out of the box; you'll probably have to program that part yourself. But it shouldn't be too hard.

  1. By fetching every feed, you can easily calculate how often it's updated; say, take into account average time between updates, or shortest time between updates.

  2. Devise a formula which gives every feed a score between 0 and 1 based on that data, where 1 means "every hour" and 0 means, say, every minute (since you should have a minimum time between checks).

  3. Use the score to schedule tasks.

  4. PROFIT! :)
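Steps 1 to 3 could look roughly like the following. This is a minimal sketch assuming the feed's item timestamps are available as ascending epoch milliseconds; `FeedScore` and its methods are hypothetical, the score is a simple linear clamp of the average gap between updates (one of many possible formulas), and the one-minute floor and one-hour ceiling come from this answer and the question respectively.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical helper: maps a feed's observed update rate to a polling delay. */
final class FeedScore {
    static final Duration MIN_DELAY = Duration.ofMinutes(1); // assumed minimum time between checks
    static final Duration MAX_DELAY = Duration.ofHours(1);   // "at least once an hour"

    /** Step 1: average gap between consecutive item timestamps (ascending epoch millis). */
    static Duration averageGap(List<Long> publishedMillis) {
        if (publishedMillis.size() < 2) {
            return MAX_DELAY; // not enough data yet: fall back to hourly polling
        }
        long first = publishedMillis.get(0);
        long last = publishedMillis.get(publishedMillis.size() - 1);
        // The mean of consecutive gaps telescopes to (last - first) / (n - 1).
        return Duration.ofMillis((last - first) / (publishedMillis.size() - 1));
    }

    /** Step 2: score in [0, 1], where 0 means "every minute" and 1 means "every hour". */
    static double score(Duration avgGap) {
        double min = MIN_DELAY.toMillis();
        double max = MAX_DELAY.toMillis();
        double clamped = Math.max(min, Math.min(max, avgGap.toMillis()));
        return (clamped - min) / (max - min);
    }

    /** Step 3: turn the score back into a delay to hand to the scheduler. */
    static Duration delayFor(double score) {
        long min = MIN_DELAY.toMillis();
        long max = MAX_DELAY.toMillis();
        return Duration.ofMillis(min + Math.round(score * (max - min)));
    }
}
```

For example, a feed that publishes every 10 minutes scores about 0.15 and is checked again after 10 minutes, while anything slower than hourly is capped at one hour. Keeping the score as a separate step makes it easy to swap in a different formula (say, the shortest gap instead of the average) without touching the scheduling code.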

Domchi
A: 

Informa is a good third-party library that can be used as an RSS scheduler: its Poller interface can poll a particular site, and its observer interface can observe a particular item. I have used it in my project, and its performance in terms of processing time and accuracy is really good. You can get the library from SourceForge.net.

Shashank T
A: 

Quartz is the best. As alternatives, you could look into cron4j, jcrontab, jobscheduler, taskforrest, or Quartz.NET.

solidstone