This is a semi-experimental thing for me.

I have a cluster of over 100 nodes (the count varies), and I want to write a monitoring application that polls all the web nodes every n seconds (e.g. 1 or 2) and records their response times.
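To make the idea concrete, here is a rough sketch of one polling pass in Python. It assumes each node exposes a plain HTTP URL and that a thread pool is enough for about 100 nodes. The names `http_probe`, `timed_probe`, and `poll_once` are made up for illustration and do not come from any existing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def http_probe(url, timeout=2.0):
    """Fetch the URL once; raises on network errors, timeouts, or HTTP errors."""
    with urlopen(url, timeout=timeout) as resp:
        resp.read()


def timed_probe(url, probe=http_probe):
    """Return (url, elapsed_seconds, error) for a single probe."""
    start = time.monotonic()
    try:
        probe(url)
        error = None
    except Exception as exc:  # record the failure instead of crashing the poller
        error = repr(exc)
    return url, time.monotonic() - start, error


def poll_once(urls, probe=http_probe, max_workers=32):
    """Probe every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: timed_probe(u, probe), urls))
```

Calling `poll_once(list_of_urls)` in a loop with a sleep of n seconds would give one response-time sample per node per pass. The `probe` argument is there so the network call can be swapped out, for example in tests.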

If a web node is already struggling, I don't want to bring it down by sending it more requests, so it would be better if the poller had some intelligence built in.
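One simple form of that intelligence is a per-node polling interval that backs off when the node is slow or failing and recovers gradually once it looks healthy again. The sketch below is only an illustration: the function name `next_interval` and all the thresholds (2 second base, 1 second "slow" cutoff, doubling factor, 60 second ceiling) are assumptions, not values from any particular monitoring tool.

```python
def next_interval(current, response_time, failed,
                  base=2.0, slow_threshold=1.0, factor=2.0, ceiling=60.0):
    """Return the delay in seconds before the next poll of one node.

    current        -- interval used for the poll that just finished
    response_time  -- how long that poll took, in seconds
    failed         -- True if the poll errored or timed out
    """
    if failed or response_time > slow_threshold:
        # Node looks unhealthy: poll it less often, up to the ceiling.
        return min(current * factor, ceiling)
    # Node looks healthy: shrink the interval back toward the base rate.
    return max(base, current / factor)
```

With these defaults, a node that answers in 3 seconds goes from a 2 second interval to 4, then 8, and so on up to 60. Once it answers quickly again, the interval halves on each poll until it is back at 2 seconds.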

What language would you choose for such a project? Any open source projects that already do this that I can poke around? Any technical challenges that you can think of?

I am starting to look at Hyperic HQ code, but man, that thing is huge.

+1  A: 

Have a look at OpenNMS. It's quite good at that sort of monitoring, and it's open source, so you can have a poke around in its innards.

Hyperic HQ is also very good, but as you say, it's a monster.

skaffman
+1  A: 

Any open source projects that already do this that I can poke around?

If I had to write something like this, I'd use RRDtool (bindings are available for several languages).

But before writing anything, I'd check SmokePing or one of the numerous (more elaborate) monitoring solutions that can do application-level monitoring.

Pascal Thivent