Hi,
I'm building a system for monitoring several large web sites (resources), using distributed web services controlled by a central controller.
I'm coming to a specific part of the design - the actual reporting of resources that are thought to have fallen over.
My problem is that there is always the chance that the actual monitor it's self is at fault, or has lost it's network connection to a resource, and the resource is actually fine. I don't want to report issues if they are not really there.
My plan at the moment is to have the monitor to request that all other monitors check the resource if it encounters a problem, and then make a decision as to whether the resource has really fallen over based on collective results.
I'm sure there's someone out there with more experience of this type of programming than my self.
Is there a common solution to this type of problem? Is my solution a decent way of looking at this?