If at first you don’t succeed

Retries are a fact of life in testing. That file you are trying to read might be temporarily locked. The network connection could momentarily break. You may need to keep pinging a server for status until it returns the result you expect.

Easy, right? Just write a while loop! You might start out with some code like this:

private static async Task PingWithRetriesAsync()
{
    do
    {
        StatusCode code = await PingServerAsync();
        if (code == StatusCode.Alive)
        {
            break;
        }
    }
    while (true);
}

Of course, you don’t want to retry forever, nor do you want to flood the system with too many requests. So let’s add a maximum timeout and a back-off delay:

private static async Task PingWithRetriesAsync(TimeSpan timeout)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    do
    {
        StatusCode code = await PingServerAsync();
        if (code == StatusCode.Alive)
        {
            break;
        }

        await Task.Delay(TimeSpan.FromSeconds(5.0d));
    }
    while (stopwatch.Elapsed < timeout);
}

Better, but there are some subtle bugs. The code above backs off unconditionally, even when the timeout has already expired, so you can wait up to five seconds longer than intended. It also doesn’t handle exceptions at all, even though some of them (e.g. TimeoutException) probably should be retried. Finally, no error is thrown when the timeout is reached, so the caller cannot tell success from failure. Even if you fix all that, PingWithRetriesAsync will only work for a single situation.
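For comparison, here is a minimal sketch of what a hand-fixed version of this one loop might look like. It checks the deadline before backing off, retries on TimeoutException, and throws when time runs out. Several things here are assumptions made for illustration: the ping operation and back-off are passed in as parameters so the sketch is self-contained, StatusCode is reduced to a two-value enum, and TimeoutException is treated as the only retryable exception.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Stand-in for the article's StatusCode type (assumption: only Alive matters here).
public enum StatusCode { Unknown, Alive }

public static class PingRetry
{
    // Hypothetical signature: the ping delegate and back-off are parameters
    // so the sketch compiles and runs on its own.
    public static async Task PingWithRetriesAsync(
        Func<Task<StatusCode>> pingServerAsync, TimeSpan timeout, TimeSpan backOff)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                if (await pingServerAsync() == StatusCode.Alive)
                {
                    return;
                }
            }
            catch (TimeoutException)
            {
                // Assumed to be transient; any other exception propagates to the caller.
            }

            // Check the deadline BEFORE backing off, so we never sleep
            // when no further attempt is going to happen.
            if (stopwatch.Elapsed >= timeout)
            {
                throw new TimeoutException("Ping failed after timeout.");
            }

            await Task.Delay(backOff);
        }
    }
}
```

Even in this form the loop is tied to one operation, one success test, and one exception policy, which is the limitation the rest of the post addresses.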

To solve this problem generically, I have created the RetrySample project. In it you will find utility classes like RetryLoop and RetryContext to help create simple and correct retry loops.

Let’s rewrite the above example using RetryLoop:

private static async Task PingWithRetriesAsync(TimeSpan timeout)
{
    RetryLoop loop = new RetryLoop(r => r.AddAsync("code", PingServerAsync()));
    loop.Succeeded = r => r.Get<StatusCode>("code") == StatusCode.Alive;
    loop.BeforeRetry = r => Task.Delay(TimeSpan.FromSeconds(5.0d));
    loop.ShouldRetry = r => r.ElapsedTime < timeout;
    RetryContext context = await loop.ExecuteAsync();
    if (!context.Succeeded)
    {
        throw new InvalidOperationException("Ping failed after timeout.");
    }
}

Now we have a simple, more declarative model that promotes reuse of retry logic throughout our program. We have also fixed the previously mentioned bugs: RetryLoop catches and handles all exceptions and only invokes the “before retry” logic when the next iteration is about to run.

For a more involved sample, look at the integration app. Here I have implemented a simple polling loop that attempts to open and read a file with a specific name (“test.txt”) and keeps trying until it successfully reads the ASCII string “TEST” or one minute elapses, whichever comes first. Running the app produces output like the following (in the middle of the run I created and updated the file to break out of the loop):

[000.011/T=1] ReadFileAsync error: FileNotFoundException: Could not find file '...\test.txt'.
[000.013/T=1] Backing off for 0.010 seconds.
[000.023/T=4] ReadFileAsync error: FileNotFoundException: Could not find file '...\test.txt'.
[000.023/T=4] Backing off for 0.020 seconds.
[ . . . ]
[016.396/T=4] ReadFileAsync error: FileNotFoundException: Could not find file '...\test.txt'.
[016.397/T=4] Backing off for 0.570 seconds.
[016.968/T=7] ReadFileAsync read only 0 bytes.
[016.969/T=7] Backing off for 0.580 seconds.
[ . . . ]
[022.609/T=8] Backing off for 0.670 seconds.
[023.294/T=7] ReadFileAsync read 'WHAT'
[023.294/T=7] Backing off for 0.680 seconds.
[ . . . ]
[026.821/T=8] ReadFileAsync read 'WHAT'
[026.821/T=8] Backing off for 0.730 seconds.
[027.564/T=9] ReadFileAsync read 'TEST'
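The back-off delays in this output appear to grow by 10 ms per attempt (0.010, 0.020, ... 0.570, 0.580, ...). A polling loop with that shape could be sketched roughly as below. This is not the integration app's actual code: FilePoller, GetBackOff, and WaitForContentAsync are hypothetical names, the 10 ms step is inferred from the log, and the sketch assumes a runtime where File.ReadAllBytesAsync is available (.NET Core 2.0 or later).

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FilePoller
{
    // Linear back-off (assumed from the log): attempt 1 waits 10 ms, attempt 2 waits 20 ms, ...
    public static TimeSpan GetBackOff(int attempt) =>
        TimeSpan.FromMilliseconds(10.0 * attempt);

    // Returns true once the file contains exactly the expected ASCII text,
    // or false if the timeout elapses first.
    public static async Task<bool> WaitForContentAsync(
        string path, string expected, TimeSpan timeout)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int attempt = 1; ; ++attempt)
        {
            try
            {
                byte[] bytes = await File.ReadAllBytesAsync(path);
                if (Encoding.ASCII.GetString(bytes) == expected)
                {
                    return true;
                }
            }
            catch (IOException)
            {
                // FileNotFoundException derives from IOException, and a
                // sharing violation on a locked file also surfaces as IOException.
            }

            // Only back off if another attempt will follow.
            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }

            await Task.Delay(GetBackOff(attempt));
        }
    }
}
```

A call such as `await FilePoller.WaitForContentAsync("test.txt", "TEST", TimeSpan.FromMinutes(1))` would mirror the scenario described above.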

Try it out! See also the commit history for RetrySample to watch how the design emerged through successive unit tests.
