We have an architecture of a couple hundred servers with about 200 processes (all developed in-house) spread over them, some controlled by crontab and some running as daemons. Some servers are in 'groups' where all servers are configured identically, and other servers have custom configurations. I've been tasked with centralizing the control of these processes.
One use case is that a user ssh'ed into any box must be able to control any process on the network in something close to real time, either by starting and stopping the relevant daemon or by rebuilding and reinstalling the crontab. Various pieces for driving all of this from a database already exist, but the overall architecture hasn't been thought through.
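To make the crontab half of that concrete, here is a rough Python sketch of rebuilding one host's crontab from the database. It assumes a hypothetical `cron_jobs` table (`host`, `schedule`, `command`, `enabled`), uses an in-memory SQLite database as a stand-in for the real one, and the `/opt/app/bin/...` commands are made up. Installing relies on `crontab -`, which reads the new table from stdin and replaces the invoking user's crontab; the demo only prints the result.

```python
import sqlite3
import subprocess


def render_crontab(conn: sqlite3.Connection, host: str) -> str:
    """Build the complete crontab text for one host from the hypothetical cron_jobs table."""
    rows = conn.execute(
        "SELECT schedule, command FROM cron_jobs "
        "WHERE host = ? AND enabled = 1 ORDER BY id",
        (host,),
    ).fetchall()
    lines = ["# Generated from the database; manual edits will be overwritten."]
    lines += [f"{schedule} {command}" for schedule, command in rows]
    # Many cron implementations ignore or reject a final line without a newline.
    return "\n".join(lines) + "\n"


def install_crontab(text: str) -> None:
    """Replace the invoking user's crontab; 'crontab -' reads the new table from stdin."""
    subprocess.run(["crontab", "-"], input=text, text=True, check=True)


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the real job database (the schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cron_jobs (id INTEGER PRIMARY KEY, host TEXT, "
        "schedule TEXT, command TEXT, enabled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cron_jobs (host, schedule, command, enabled) VALUES (?, ?, ?, ?)",
        [
            ("web01", "*/5 * * * *", "/opt/app/bin/rotate_logs", 1),
            ("web01", "0 2 * * *", "/opt/app/bin/nightly_report", 0),
            ("db01", "0 * * * *", "/opt/app/bin/vacuum", 1),
        ],
    )
    return conn


if __name__ == "__main__":
    # Print rather than install, so the sketch is safe to run anywhere.
    print(render_crontab(demo_db(), "web01"), end="")
```

Regenerating the whole file from the database, rather than patching the existing crontab, keeps the rebuild idempotent: running it twice produces the same table.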
I expect to write a daemon that runs on each server and mediates between the peer-to-peer network, the database, and the local daemons and crontabs.
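A minimal Python sketch of the dispatch core of such a per-server agent, leaving the transport out entirely (that is the part Spread, ZooKeeper, JMS, etc. would supply). Everything here is hypothetical: the JSON message shape (`id`, `target`, `action`, `process`), the `Agent` class, and the stub handlers that stand in for actually starting or stopping a daemon. A message can target either a hostname or a group name, and repeated message ids are dropped, on the assumption that a reliable transport may redeliver after a failure.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Agent:
    """Hypothetical per-server agent that turns control messages into local actions."""
    hostname: str
    groups: Set[str]                           # groups of identically configured servers this box is in
    handlers: Dict[str, Callable[[str], str]]  # action name -> function(process) -> status
    seen_ids: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def handle(self, raw: str) -> str:
        """Apply one JSON control message (the message format is an assumption)."""
        msg = json.loads(raw)
        target = msg["target"]
        if target != self.hostname and target not in self.groups:
            return "ignored"
        # A reliable transport may redeliver after a failure, so drop repeats by message id.
        if msg["id"] in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(msg["id"])
        handler = self.handlers.get(msg["action"])
        if handler is None:
            return "unknown action"
        status = handler(msg["process"])
        self.log.append(f"{msg['id']} {msg['action']} {msg['process']} -> {status}")
        return status


def stub_handler(action: str) -> Callable[[str], str]:
    """Placeholder; a real agent would start/stop the daemon or rebuild the crontab here."""
    def run(process: str) -> str:
        return f"{action} {process}: ok"
    return run


if __name__ == "__main__":
    agent = Agent(
        hostname="web01",
        groups={"web"},
        handlers={a: stub_handler(a) for a in ("start", "stop", "rebuild_crontab")},
    )
    msg = json.dumps({"id": "42", "target": "web", "action": "stop", "process": "feed_loader"})
    print(agent.handle(msg))  # stop feed_loader: ok
    print(agent.handle(msg))  # duplicate
```

In a long-running agent, `seen_ids` would need to be bounded or persisted rather than growing forever.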
I'm surveying what technologies might aid and abet this project.
I think I'm looking for reliable peer-to-peer communication technologies, though I'm not 100 percent sure that's what I need. Things on my radar include Spread, JXTA, ZooKeeper and JMS.
What are people's experiences with these technologies, and what other technologies should I check out? Data rates will be very low (a few thousand bytes per hour at most), but reliability and a mature API are important.