We are working on an ongoing project that relies heavily on 3rd party web services. By heavily, I mean that our web app can't function without this connectivity.
Once we got to production we started seeing piles of intermittent web exceptions coming through our error logging routines. After many, many lost hours trying to track down the cause, it turned out not to be a programming problem at all: DNS and network issues on the 3rd party's end. Packets were being dropped, so SOAP headers would arrive mangled or requests would time out.
This was apparently the first the 3rd party had heard of the problem (probably BS), and they were surprised we hadn't implemented retry logic against their web services. So, I guess we helped them find a bunch of network issues. Lucky us.
Is it pretty much standard practice to implement retry logic against 3rd party web services like this? The reason I ask is that we now have to go back and do a bunch of recoding to compensate for the 3rd party's network issues, so we have to sort out who foots the bill. Out of interest, another 3rd party that we work with in the same manner has had almost no issues of this sort.
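For what it's worth, the kind of retry logic we'd presumably be adding is roughly the sketch below (Python just for illustration; `fetch_quote` and the exception types are placeholders, not the actual 3rd party API): exponential backoff with a bit of jitter around the flaky call.

```python
import random
import time

def call_with_retries(func, max_attempts=4, base_delay=1.0,
                      retriable=(TimeoutError, ConnectionError)):
    """Call func(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retriable:
            if attempt == max_attempts:
                raise  # out of attempts, let the error surface to our logging
            # Back off 1s, 2s, 4s, ... plus up to 0.5s of jitter before retrying
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            time.sleep(delay)

# Hypothetical usage, where fetch_quote stands in for the 3rd party SOAP call:
# result = call_with_retries(lambda: fetch_quote(client, request))
```

Wrapping each outbound call like this (and only retrying errors that look transient) seems to be what they expected of us, but it's extra code in a lot of places.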
Thanks!