My team has a situation where an SNMP SET will fail once every two weeks or so. Since this set happens automatically, we don't necessarily notice it immediately when it fails, and this can result in an inconsistent configuration and associated wailing and gnashing of teeth. The plan is to fix this by having our software automatically retry the SET when it fails.
The problem is, we aren't sure why the failure is happening. My (extremely limited) knowledge of SNMP isn't particularly helpful in diagnosing this problem, so I thought I'd ask StackOverflow for some advice. We think that every so often a spike in network traffic will cause the SET to fail. Since SNMP uses UDP for communication, I would think it would be relatively easy for a command to be drowned out if traffic was high for a short period of time. However, I have no idea how common this is. We have a small network with a single cisco router and there are less than a dozen SNMP controlled devices on that network. In addition to the SNMP traffic, there are some status web pages being loaded from the various devices. In case it makes a difference, I believe we are using the AdventNet SNMP API version 4.0.4 for Java.
Does it sound reasonable that there will be some SET commands dropped occasionally, or should we be looking for other causes?