
We have an architecture with a couple of hundred servers and about 200 processes (all developed in-house) spread over them, some controlled by crontab and some that run as daemons. Some servers are in 'groups' where all servers are configured identically, and other servers have custom configurations. I've been tasked with centralizing the control of these tasks.

One use case is that a user ssh'ed into any box must be able to control any process on the network in something close to real time, by starting and stopping the relevant daemon or rebuilding and reinstalling the crontab. There are already various bits and pieces created to drive all of this from a database, but the overall architecture hasn't been thought through.
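For the crontab half of this, "rebuilding from the database" could be as simple as regenerating the whole table for a host and handing it to `crontab -`, which reads the new table from stdin. This is a minimal sketch: the `cron_jobs` table, its columns and the host names are hypothetical stand-ins for whatever the existing database holds, and SQLite is used only so the example runs on its own.

```python
import sqlite3
import subprocess

# Hypothetical schema: one row per scheduled job, tagged with the host it belongs to.
SCHEMA = """
CREATE TABLE cron_jobs (
    host     TEXT NOT NULL,
    schedule TEXT NOT NULL,   -- the five cron time fields, e.g. '*/5 * * * *'
    command  TEXT NOT NULL,
    enabled  INTEGER NOT NULL DEFAULT 1
)
"""


def render_crontab(conn, host):
    """Build the complete crontab text for `host` from its enabled rows."""
    rows = conn.execute(
        "SELECT schedule, command FROM cron_jobs "
        "WHERE host = ? AND enabled = 1 ORDER BY rowid",
        (host,),
    ).fetchall()
    lines = ["# Generated from the job database; local edits will be overwritten."]
    lines += [f"{schedule} {command}" for schedule, command in rows]
    return "\n".join(lines) + "\n"


def install_crontab(text):
    """Replace the current user's crontab; `crontab -` reads the new table from stdin."""
    subprocess.run(["crontab", "-"], input=text, text=True, check=True)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO cron_jobs (host, schedule, command, enabled) VALUES (?, ?, ?, ?)",
        [
            ("web01", "*/5 * * * *", "/opt/jobs/rotate_logs", 1),
            ("web01", "0 2 * * *", "/opt/jobs/nightly_report", 0),
            ("db01", "0 * * * *", "/opt/jobs/vacuum", 1),
        ],
    )
    print(render_crontab(conn, "web01"), end="")
```

Regenerating the whole table, rather than patching individual lines, keeps each install idempotent: running it twice for the same database state yields the same crontab.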

I'm expecting I'll write a daemon that will run on each server and mediate between peer-to-peer networking, the database and the daemons and crontabs.
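One way to keep such a per-server daemon simple is to make it reconcile rather than execute commands: it compares the desired state recorded in the database with what is actually running and derives the start/stop actions, so an incoming network message only has to mean "something changed, reconcile now". The sketch below covers only that comparison step; the state names, the `Action` type and the process names are hypothetical, and the networking and the actual start/stop calls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str      # "start" or "stop"
    process: str   # name of the managed daemon


def plan_actions(desired, running):
    """Work out what must change to make this host match the database.

    desired: mapping of process name -> "running" or "stopped" (from the database)
    running: set of process names currently running on this host
    Processes that are running but absent from `desired` are not managed here
    and are left alone. Actions come back in sorted order by process name.
    """
    actions = []
    for name in sorted(desired):
        want = desired[name]
        if want not in ("running", "stopped"):
            raise ValueError(f"unknown desired state {want!r} for {name}")
        if want == "running" and name not in running:
            actions.append(Action("start", name))
        elif want == "stopped" and name in running:
            actions.append(Action("stop", name))
    return actions


if __name__ == "__main__":
    desired = {"feed_loader": "running", "report_daemon": "stopped", "indexer": "running"}
    running = {"report_daemon", "indexer", "sshd"}
    print(plan_actions(desired, running))
    # [Action(verb='start', process='feed_loader'), Action(verb='stop', process='report_daemon')]
```

Because the plan is recomputed from scratch each time, a lost or duplicated notification costs at most one extra pass; a periodic timer-driven reconcile would cover messages that never arrive.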

I'm surveying what technologies might aid and abet this project.

I think I'm looking for reliable peer-to-peer communication technologies, though I'm not 100 percent sure that's what I need. Things on my radar include Spread, JXTA, ZooKeeper and JMS.

What are people's experiences with these technologies, and what other technologies should I check out? As I see it, I'm going to have very low data rates (a few thousand bytes per hour at most), but reliability and a mature API are important.

A: 

Sorry, this is not really a direct answer to any of your questions, but what you describe sounds a lot like you will end up duplicating a lot of existing work, especially if you are just thinking about the "user ssh's into one box" part. (Pun unintentional, but now that I noticed it, yay! =)

Have you taken a look at projects like http://www.cfengine.org or http://www.theether.org/pssh/ ?

rasjani
We already use cfengine for provisioning our servers. This effort is at a slightly higher level: manipulating processes while the servers stay up. pssh is interesting, so thanks for the pointer, but it is more of a user-level tool. I need an API, plus built-in reliability.
Leonard
You can use cfengine for ongoing configuration updates without requiring reboots, although it's a lot of work. I thought puppet looked good: it's like cfengine, but lets you do more. IIRC, it can look at the process table to check whether something is running.
Peter Cordes
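For in-house daemons that write a pidfile, the "is it running?" check mentioned above can be approximated without a full configuration-management tool. This is a minimal POSIX-only sketch, not how puppet or cfengine implement it: it assumes each daemon writes its PID to a pidfile, and it uses signal 0, which delivers nothing and only checks whether the process exists.

```python
import errno
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX; signal 0 only probes)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # process exists but belongs to another user
            return True
        raise
    return True


def daemon_is_running(pidfile):
    """Read a daemon's pidfile and check that the recorded PID is still alive."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no pidfile, or unparseable contents: treat as not running
    return pid_is_alive(pid)
```

One caveat: a stale pidfile whose PID has since been reused by an unrelated process will be reported as running, so a production check would usually also confirm the process name or command line.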