Async holes: ZipArchive

From time to time, I encounter unexpected gaps in asynchronous object models in .NET. I’ve taken to calling these “async holes” since they usually present an unpleasant obstacle in the clear path I try to follow while executing the TDD + async workflow. Today I will share an async hole I fell into recently in ZipArchive.

System.IO.Compression.ZipArchive is test-friendly, for the most part. Given that it only requires a Stream to access the underlying archive, you are free to pass, say, a MemoryStream if you are trying to avoid file system operations. However, be aware that even if you construct a ZipArchive from a MemoryStream, all of the async operations will use the thread pool. To see for yourself, compile and run the ZipUnzipAsync method in this code snippet:

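// Namespaces used by the methods below.
using System;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

private static async Task ZipUnzipAsync()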
{
    Encoding encoding = Encoding.UTF8;
    string text = "hello!";

    Log("Zipping '{0}'...", text);
    byte[] output = await ZipAsync(text, encoding);

    Log("Unzipping output ({0} bytes)...", output.Length);
    string result = await UnzipAsync(output, encoding);

    Log("Unzipped result: '{0}'", result);
}

private static void Log(string format, params object[] args)
{
    string text = string.Format(CultureInfo.InvariantCulture, format, args);
    Console.WriteLine("[{0}] {1}", Thread.CurrentThread.ManagedThreadId, text);
}

private static async Task<byte[]> ZipAsync(string text, Encoding encoding)
{
    MemoryStream outputZip = new MemoryStream();
    using (ZipArchive zip = new ZipArchive(outputZip, ZipArchiveMode.Create))
    {
        ZipArchiveEntry entry = zip.CreateEntry("one.txt");
        using (Stream entryStream = entry.Open())
        {
            byte[] bytes = encoding.GetBytes(text);
            await entryStream.WriteAsync(bytes, 0, bytes.Length);
        }
    }

    return outputZip.ToArray();
}
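
The snippet calls UnzipAsync, which is not listed here. Below is a minimal sketch of such a method, assuming it reads back the "one.txt" entry written by ZipAsync and awaits only ReadAsync on the entry stream. The class name UnzipSketch and the buffer size are placeholders for illustration.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

public static class UnzipSketch
{
    // Hypothetical counterpart to ZipAsync: reads the "one.txt" entry back into a string.
    public static async Task<string> UnzipAsync(byte[] zipBytes, Encoding encoding)
    {
        using (MemoryStream inputZip = new MemoryStream(zipBytes))
        using (ZipArchive zip = new ZipArchive(inputZip, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("one.txt");
            using (Stream entryStream = entry.Open())
            using (MemoryStream output = new MemoryStream())
            {
                byte[] buffer = new byte[1024]; // arbitrary buffer size
                int read;

                // The only await in the method: ReadAsync on the entry stream.
                while ((read = await entryStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                {
                    output.Write(buffer, 0, read);
                }

                return encoding.GetString(output.ToArray());
            }
        }
    }
}
```

The loop is there because a single ReadAsync call is not guaranteed to return the whole entry, even for a payload this small.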

On my system, the output is:

[1] Zipping 'hello!'...
[3] Unzipping output (120 bytes)...
[4] Unzipped result: 'hello!'

You can see a thread switch after each call to the inner zip and unzip methods. Since each of those methods contains only one await, the switch must come from the WriteAsync and ReadAsync stream methods.

So, why does this happen? We can get a hint by looking at the .NET CoreFX implementation of ZipArchiveEntry. There we see that the incoming stream is wrapped in another stream, or more accurately in multiple levels of streams, with WrappedStream at the top for zip and DeflateStream at the top for unzip. As of .NET Framework 4.7.1, neither of these streams implements the async methods, so the “failsafe” implementation on the core Stream class kicks in and pushes the blocking calls to Read or Write to the thread pool.
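
To see the fallback in isolation, here is a rough sketch of a stream that overrides only the synchronous Read, much like the wrappers described above. SyncOnlyStream and SyncOnlyStreamDemo are hypothetical names made up for this illustration. The sketch assumes the base Stream.ReadAsync behaves as described and runs Read on a thread pool thread.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stream with no async overrides; ReadAsync falls back to the base Stream class.
public sealed class SyncOnlyStream : Stream
{
    // Records whether the most recent Read call ran on a thread pool thread.
    public bool LastReadOnThreadPool { get; private set; }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }

    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        LastReadOnThreadPool = Thread.CurrentThread.IsThreadPoolThread;
        return 0; // report end of stream; no data is needed for this demo
    }

    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class SyncOnlyStreamDemo
{
    // Calls the inherited ReadAsync and reports where the blocking Read actually ran.
    public static async Task<bool> ReadAsyncUsesThreadPoolAsync()
    {
        using (SyncOnlyStream stream = new SyncOnlyStream())
        {
            await stream.ReadAsync(new byte[16], 0, 16);
            return stream.LastReadOnThreadPool;
        }
    }
}
```

Calling Read directly from the main thread of a console app should leave LastReadOnThreadPool false. Going through ReadAsync should set it to true, which is the same hop that shows up in the log output earlier.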

As a side note, .NET Core 2.0 has improved this situation a bit, given that it does have async overrides on DeflateStream. Running the same app code shown above will thus eliminate the last thread switch on the unzip call (the zip portion still runs on the thread pool, alas). In any case, exercise caution in your unit tests, and make sure to handle the thread switch with an appropriate call to Wait() or .Result.
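
In a synchronous test, one way to deal with the switch is to block on the task with a timeout rather than assume it has already finished. The helper below is only a sketch: BlockingTestHelper and WaitForResult are hypothetical names, not part of any test framework.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingTestHelper
{
    // Blocks until the task finishes or the timeout elapses, then returns its result.
    // A faulted task surfaces as an AggregateException from Wait.
    public static T WaitForResult<T>(Task<T> task, TimeSpan timeout)
    {
        if (!task.Wait(timeout))
        {
            throw new TimeoutException("The task did not complete within " + timeout + ".");
        }

        return task.Result;
    }
}
```

A test could then call something like BlockingTestHelper.WaitForResult(ZipAsync(text, encoding), TimeSpan.FromSeconds(5)) and assert on the returned bytes. The five second timeout is an arbitrary choice.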
