A loop is a responsibility

Spread the love

A lot has been written about the single responsibility principle over the years. Rather than rehash what many others have said better, I want to give a common example of how a seemingly innocuous control structure can subtly but surely increase coupling and confusion.

Imagine a system that depends on a collection of “microservices.” These services all have a simple “ping” API that can be accessed over HTTP. The response code indicates whether the service is functioning properly (e.g. HTTP 200 means everything healthy, HTTP 503 means the service is degraded, and so on). Now suppose that you need to write a watchdog which can periodically ping each service and raise a warning or error based on the result.

Here are the supporting interfaces:

public interface IPingClient
{
    string Id { get; }
    Task<HttpStatusCode> PingAsync();
}

public interface IAlert
{
    Task RaiseWarningAsync(string id, int code);
    Task RaiseErrorAsync(string id, int code);
}

And now a naïve first cut at the health check function:

public static async Task CheckHealthAsync(IAlert alert, IPingClient[] clients)
{
    foreach (IPingClient client in clients)
    {
        HttpStatusCode status = await client.PingAsync();
        switch (status)
        {
            case HttpStatusCode.OK:
                // All is good; do nothing.
                break;
            case HttpStatusCode.ServiceUnavailable:
                // Degraded but still alive.
                await alert.RaiseWarningAsync(client.Id, (int)status);
                break;
            default:
                // Treat anything else as fatal.
                await alert.RaiseErrorAsync(client.Id, (int)status);
                break;
        }
    }
}

It looks simple enough. Let’s move on to discuss what the tests for this function would look like. There is the degenerate case of an empty client array. We don’t expect anything to happen (no pings or alerts) and luckily the code “just works” in that situation:

// Stub to ensure that an alert is never raised
public class FailOnAnyAlert : IAlert
{
    public Task RaiseWarningAsync(string id, int code)
    {
        throw new NotSupportedException("This should not be called.");
    }

    public Task RaiseErrorAsync(string id, int code)
    {
        throw new NotSupportedException("This should not be called.");
    }
}

[TestMethod]
public void CheckZeroItemsDoesNothing()
{
    Task task = Watchdog.CheckHealthAsync(new FailOnAnyAlert(), new IPingClient[0]);

    Assert.IsTrue(task.IsCompleted);
    Assert.IsNull(task.Exception);
}

Now we need to verify an array of size one, but there are three simple sub-cases: the status was OK (no alert), the status was ServiceUnavailable (warning), and the status was anything else (error):

// Stub to always return given ID/status code
public class PingClientStub : IPingClient
{
    public PingClientStub(string id, HttpStatusCode code)
    // ...
}

// Stub to keep track of warnings/errors raised
public class AlertStub : IAlert
{
    // ...
    public IList<Tuple<string, int>> Warnings { get; private set; }
    public IList<Tuple<string, int>> Errors { get; private set; }
    // ...
}

[TestMethod]
public void CheckOneItemOnOKDoesNothing()
{
    Task task = Watchdog.CheckHealthAsync(new FailOnAnyAlert(), new IPingClient[] { new PingClientStub("One", HttpStatusCode.OK) });

    Assert.IsTrue(task.IsCompleted);
    Assert.IsNull(task.Exception);
}

[TestMethod]
public void CheckOneItemOnUnavailableRaisesWarning()
{
    AlertStub alert = new AlertStub();

    Task task = Watchdog.CheckHealthAsync(alert, new IPingClient[] { new PingClientStub("One", HttpStatusCode.ServiceUnavailable) });

    Assert.IsTrue(task.IsCompleted);
    Assert.IsNull(task.Exception);
    Assert.AreEqual(1, alert.Warnings.Count);
    Assert.AreEqual(Tuple.Create("One", 503), alert.Warnings[0]);
}

[TestMethod]
public void CheckOneItemOnAnyOtherStatusRaisesError()
{
    AlertStub alert = new AlertStub();

    Task task = Watchdog.CheckHealthAsync(alert, new IPingClient[] { new PingClientStub("One", HttpStatusCode.BadRequest) });

    Assert.IsTrue(task.IsCompleted);
    Assert.IsNull(task.Exception);
    Assert.AreEqual(1, alert.Errors.Count);
    Assert.AreEqual(Tuple.Create("One", 400), alert.Errors[0]);
}

For completeness we would also want to verify an array with more than one item, so let’s try with two clients; if we want to be exhaustive we can cover again the three sub-cases from the single item case but maybe it’s okay to just test a different outcome for each item in a single test:

[TestMethod]
public void CheckTwoItemsRaisesErrorAndWarningAccordingToEachStatus()
{
    AlertStub alert = new AlertStub();
    IPingClient[] clients = new IPingClient[]
    {
        new PingClientStub("One", HttpStatusCode.BadRequest),
        new PingClientStub("Two", HttpStatusCode.ServiceUnavailable)
    };

    Task task = Watchdog.CheckHealthAsync(alert, clients);

    Assert.IsTrue(task.IsCompleted);
    Assert.IsNull(task.Exception);
    Assert.AreEqual(1, alert.Errors.Count);
    Assert.AreEqual(Tuple.Create("One", 400), alert.Errors[0]);
    Assert.AreEqual(1, alert.Warnings.Count);
    Assert.AreEqual(Tuple.Create("Two", 503), alert.Warnings[0]);
}

Whew. We’re done, right? Unfortunately, as you do a bit more experimentation, you realize that PingClient can actually throw an exception in some cases, e.g. when the network address cannot be resolved. The code as is will fail on first error without raising any sort of alert which is undesirable. That means we need another set of tests to account for what should happen instead to guide the right set of fixes:

// Full code elided for brevity...
[TestMethod]
public void CheckOneItemIfThrowsWebExceptionRaisesError()
{
    // ... Set up one client to throw WebException
    // ... Verify that one error was raised with code -1
}

// This test would ensure that we don't fail early on exceptions which was the bad side effect
// of the naive implementation and only needed because we have a loop...
[TestMethod]
public void CheckTwoItemsFirstThrowWebExceptionRaisesErrorSecondReturnUnvailableRaiseWarning()
{
    // ... Set up two clients, first throw WebException, second return warning status
    // ... Verify that one error was raised with code -1 and one warning was raised with 503
}

Okay, that’s the last of it… but wait, you learn that the alerting infrastructure can also throw (e.g. when a backlog of alerts has built up)! Now you need more tests to ensure that throwing later on won’t cause early failure (again, a side effect of using a loop) and that a fallback behavior is invoked instead:

[TestMethod]
public void CheckOneItemIfAlertThrowsInvalidOperationRaisesError()
{
    // ...
}

// Another test would ensure that we don't fail early on exceptions, only needed because
// of the loop...
[TestMethod]
public void CheckTwoItemsFirstIfAlertThrowsDoesNotSkipSecond()
{
    // ...
}

That’s surely the end, right? Well, the service architect is now complaining that the ping checks are serialized and would actually work better if they were parallelized to some degree. By their estimation, it should be possible to do the pings in batches of up to four at a time. At this point, it should be clear that the loop is a liability and is causing all sorts of required but not-very-relevant-to-a-watchdog tests to emerge.

What went wrong? Well, the biggest issue is that this for loop is essentially a hard-coded workflow sequence and error handling strategy. If we lift the loop outside the health check code, we can better focus on the matter at hand, which is the process for converting ping responses to alerts. Then we are free to define the sequencing and error handling orthogonally as a separate responsibility. Imagine instead that we did this:

// Generic async workflow sequence and exception aggregation
public static async Task ForEachAsync<TInput>(
    Func<TInput, Task> doAsync,
    IEnumerable<TInput> inputs)
{
    List<Exception> exceptions = new List<Exception>();
    foreach (TInput input in inputs)
    {
        try
        {
            await doAsync(input);
        }
        catch (Exception e)
        {
            exceptions.Add(e);
        }
    }

    if (exceptions.Count > 0)
    {
        throw new AggregateException(exceptions);
    }
}

// SINGLE ping check
public static async Task CheckHealthAsync(IAlert alert, IPingClient client)
{
    HttpStatusCode status;
    try
    {
        status = await client.PingAsync();
    }
    catch (WebException)
    {
        status = (HttpStatusCode)(-1);
    }

    // ... more code
}

Now we are free to write a separate set of simple unit tests for the sequencing/error handling via ForEachAsync, totally decoupled from the logic in CheckHealthAsync (which now would only have tests relating to a single ping client). And we are free to combine this logic as we see fit in an upper layer, for example:

public static Task DoAllPingsAsync(IAlert alert, IEnumerable<IPingClient> clients)
{
    return ForEachAsync(c => CheckHealthAsync(alert, c), clients);
}

This integration code in DoAllPingsAsync is straightforward enough that we shouldn’t even feel compelled to test it — all the actual logic is already fully tested. And now to fulfill the architect’s request, we could do a simple switch like so:

// Generic *batched* async workflow sequence and exception aggregation
public static async Task ForEachBatchAsync<TInput>(
    int batchSize,
    Func<TInput, Task> doAsync,
    IEnumerable<TInput> inputs)
{
    // ... code here
}

public static Task DoAllPings(IAlert alert, IEnumerable<IPingClient> clients)
{
    return ForEachBatchAsync(4, c => CheckHealthAsync(alert, c), clients);
}

This would cause zero impact to the ping client tests and would require only a new top-level function and tests for the batched workflow — no ripple effect, and one fewer reason to change!

Try lifting loops outside of your business logic. You might be surprised at the simplifications that result.

WriteAsync .NET

Testing, coding, in that order.

One thought on “A loop is a responsibility”

Leave a Reply Cancel reply