We are working on an ongoing project that relies heavily on 3rd party web services. By heavily, I mean that our web app can't function without this connectivity.

Once we got to production we started seeing piles of intermittent web exceptions coming through our error logging routines. After many, many lost hours trying to figure out what was causing the problems, it turned out to be an issue that wasn't at all programming related: DNS and network issues at the 3rd party. Turns out a bunch of packets were being lost, so SOAP headers would get mangled, or timeouts would occur.

This was the first our 3rd party had heard of this (probably BS), and they were surprised that we hadn't implemented retry logic against their web services. So, I guess we helped them find a bunch of network issues. Lucky us.

Is retry logic pretty much standard practice, something we should have implemented from the start? The reason I ask is that we have to go back and do a bunch of recoding to work around the 3rd party's network issues, so we have to sort out who foots the bill. Out of interest, another 3rd party that we work with in the same manner has had almost no issues of this sort.

Thanks!

A: 

Of course.

Any service that communicates via a network needs to have the ability to retry. You can argue all day about where these retries should be implemented (network controller? interface controllers? let the user do it manually?), but someone has got to do it.

kubi
A: 

We use retry a lot with network activity or certain inter-process coordination things. Here's a utility class we use to make it even easier.

public static void Retry(int times, TimeSpan tryWait, string description, Action action)
{
    bool warned = false;

    for (int count = 0; count < times; count++)
    {
        try
        {
            action();

            if (warned)
            {
                LogSuccess(count, description);
            }
            return;
        }
        catch (Exception ex)
        {
            if (count == times - 1)
            {
                throw;
            }
            LogWarning(count, times, description, tryWait, ex);
            Thread.Sleep((int) (tryWait.TotalMilliseconds * (count + 1)));
            warned = true;
        }
    }
}

public static TResult Retry<TResult>(int times, TimeSpan tryWait, string description, Func<TResult> function)
{
    bool warned = false;

    for (int count = 0; count < times; count++)
    {
        try
        {
            TResult result = function();
            if (warned)
            {
                LogSuccess(count, description);
            }
            return result;
        }
        catch (Exception ex)
        {
            if (count == times - 1)
            {
                throw;
            }
            LogWarning(count, times, description, tryWait, ex);
            Thread.Sleep((int)(tryWait.TotalMilliseconds * (count + 1)));
            warned = true;
        }
    }

    throw new InvalidOperationException(
        String.Format(
            "An internal error occurred retrying {0}. We didn't throw an error on last run but didn't successfully run either.",
            description));
}


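private static void LogWarning(int count, int times, string description, TimeSpan tryWait, Exception ex)
{
    // The sleep grows linearly with the attempt number, so log the actual
    // wait in milliseconds rather than the base TimeSpan.
    Log.Warn(typeof (RetryUtil),
             String.Format("Executing {0} failed. Already tried {1} out of {2}. Retrying after {3} ms wait.",
                           description, 
                           count + 1, 
                           times, 
                           (int) (tryWait.TotalMilliseconds * (count + 1))),
             ex);
}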

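private static void LogSuccess(int count, string description)
{
    // count is the zero-based index of the attempt that succeeded,
    // so the number of attempts made is count + 1.
    Log.Debug(typeof(RetryUtil),
             "Successfully executed {0} after {1} attempts.",
             description,
             count + 1);
}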
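
The helpers above retry on any exception. For web service calls like the ones in the question, it may be worth retrying only failures that look transient (timeouts, connection errors) and letting everything else surface immediately. Below is a minimal, self-contained sketch of that idea. The names `TransientRetry`, `Execute` and `isTransient` are hypothetical and not part of the utility above, and which exceptions count as transient is an assumption that depends on the client library in use.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of the utility above.
public static class TransientRetry
{
    // Runs operation up to maxAttempts times. An exception is retried only
    // if isTransient returns true for it and attempts remain; otherwise it
    // propagates to the caller unchanged.
    public static T Execute<T>(int maxAttempts, TimeSpan baseDelay,
                               Func<Exception, bool> isTransient, Func<T> operation)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Linear back-off, same scheme as the helpers above:
                // wait baseDelay * attempt before trying again.
                Thread.Sleep(TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * attempt));
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0;

        // Simulated web service call that times out twice, then succeeds.
        string result = TransientRetry.Execute(
            3,
            TimeSpan.FromMilliseconds(10),
            ex => ex is WebException || ex is TimeoutException, // assumed transient
            () =>
            {
                calls++;
                if (calls < 3)
                {
                    throw new TimeoutException("simulated timeout");
                }
                return "ok";
            });

        Console.WriteLine(result + " after " + calls + " calls"); // prints "ok after 3 calls"
    }
}
```

The exception filter (`when`, C# 6 or later) means a non-transient failure, or the failure on the final attempt, is never caught at all, so the caller sees it with its original stack trace.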
Sam