SSD and … sync I/O?

Walter Bright (via Andrei Alexandrescu) says, “Measuring gives you a leg up on experts who are too good to measure.” Today I’ll present some measurements that might be a bit surprising.

In the old days of mechanical spinning disks, the mantra was “keep the queue length low.” In other words, don’t hammer your hard drive with a lot of concurrent I/O requests. The landscape has completely changed with the advent of solid state technology. Fast disks don’t spin. Further, they reach their full bandwidth potential only when the queue depth is high.

So let’s take a look at a simple sequential write workload simulation in .NET.

namespace DiskPerf
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Threading;
    using System.Threading.Tasks;

    internal sealed class Program
    {
        private static void Main(string[] args)
        {
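            // The comparison discussed after this listing used 32 concurrent tasks;
            // raise taskCount to reproduce it.
            int taskCount = 1;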
            bool useAsync = true;
            string rootPath = "C:\\Temp\\DiskPerf";

            List<DiskWriteTask> tasks = new List<DiskWriteTask>();

            for (int i = 0; i < taskCount; ++i)
            {
                string folderPath = rootPath + "\\" + Guid.NewGuid().ToString("N");
                Directory.CreateDirectory(folderPath);
                DiskWriteTask task = new DiskWriteTask((int)Stopwatch.GetTimestamp(), folderPath);
                tasks.Add(task);
            }

            Console.WriteLine("Press ENTER to start.");
            Console.ReadLine();

            foreach (DiskWriteTask task in tasks)
            {
                task.Start(useAsync);
            }

            Console.WriteLine("Press ENTER to stop.");
            Console.ReadLine();

            foreach (DiskWriteTask task in tasks)
            {
                task.Stop();
            }

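            // Report how many files each task wrote during the run.
            foreach (DiskWriteTask task in tasks)
            {
                Console.WriteLine("Files written: {0}", task.CurrentFileCount);
            }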
        }

        private sealed class DiskWriteTask
        {
            private readonly List<byte[]> cachedBuffers;
            private readonly string path;
            private readonly int seed;

            private CancellationTokenSource cts;
            private Task task;
            private FileStream stream;
            private string fileName;
            private int currentBuffer;

            public DiskWriteTask(int seed, string path)
            {
                this.path = path;
                this.cachedBuffers = new List<byte[]>();
                this.seed = seed;
                this.BufferSize = 65536;
                this.FileSize = 1048576;
                this.CachedBufferCount = 32;
                this.MaxFileCount = 64;
            }

            public int BufferSize { get; set; }

            public int CachedBufferCount { get; set; }
            
            public int FileSize { get; set; }

            public int MaxFileCount { get; set; }

            public int CurrentFileCount { get; set; }

            public void Start(bool useAsync)
            {
                this.CacheBuffers();
                this.cts = new CancellationTokenSource();
                if (useAsync)
                {
                    this.task = this.RunAsync(this.cts.Token);
                }
                else
                {
                    this.task = Task.Factory.StartNew(() => this.RunSync(this.cts.Token), TaskCreationOptions.LongRunning);
                }
            }

            public void Stop()
            {
                using (this.cts)
                {
                    this.cts.Cancel();
                    this.task.Wait();
                }
            }

            private void CacheBuffers()
            {
                Random random = new Random(this.seed);
                this.cachedBuffers.Clear();
                for (int i = 0; i < this.CachedBufferCount; ++i)
                {
                    byte[] cachedBuffer = new byte[this.BufferSize];
                    random.NextBytes(cachedBuffer);
                    this.cachedBuffers.Add(cachedBuffer);
                }
            }

            private byte[] NextBuffer()
            {
                byte[] nextBuffer = this.cachedBuffers[this.currentBuffer];
                this.currentBuffer = (this.currentBuffer + 1) % this.CachedBufferCount;
                return nextBuffer;
            }

            private async Task RunAsync(CancellationToken token)
            {
                while (!token.IsCancellationRequested)
                {
                    this.fileName = this.path + "\\file" + (this.CurrentFileCount % this.MaxFileCount) + ".bin";
                    using (this.stream = new FileStream(this.fileName, FileMode.Create, FileAccess.Write, FileShare.None, this.BufferSize, true))
                    {
                        for (int i = this.FileSize / this.BufferSize; i > 0; i--)
                        {
                            await this.stream.WriteAsync(this.NextBuffer(), 0, this.BufferSize);
                        }
                    }

                    this.CurrentFileCount++;
                }
            }

            private void RunSync(CancellationToken token)
            {
                while (!token.IsCancellationRequested)
                {
                    this.fileName = this.path + "\\file" + (this.CurrentFileCount % this.MaxFileCount) + ".bin";
                    using (this.stream = new FileStream(this.fileName, FileMode.Create, FileAccess.Write, FileShare.None, this.BufferSize, false))
                    {
                        for (int i = this.FileSize / this.BufferSize; i > 0; i--)
                        {
                            this.stream.Write(this.NextBuffer(), 0, this.BufferSize);
                        }
                    }

                    ++this.CurrentFileCount;
                }
            }
        }
    }
}

Wait… what’s that RunSync method doing there? Isn’t async I/O the way to go? Well, let’s measure and find out.

The sync vs. async comparison uses the same parameter values for both runs: 32 concurrent tasks, a buffer size of 64 KB, a file size of 1 MB, a cached buffer count of 32, and a max file count of 64 (after that, existing files are continually overwritten). The only difference is whether we create the FileStream with useAsync true or false and use Write or WriteAsync. I would be remiss if I neglected to include basic hardware specs: Intel Core i7-3960X (6 cores + hyper-threading), 24 GB RAM, and a Samsung SSD 830 Series (120 GB) connected via SATA 2 (3 Gb/s) with AHCI mode enabled.
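To make those parameters concrete, here is a small sketch of the arithmetic they imply. The class and method names (`WorkloadMath`, `WritesPerFile`, `MaxBytesOnDisk`) are made up for illustration and are not part of the test app; only the four constants come from the post.

```csharp
using System;

internal static class WorkloadMath
{
    // Parameter values quoted in the post.
    public const int TaskCount = 32;
    public const int BufferSize = 65536;    // 64 KB
    public const int FileSize = 1048576;    // 1 MB
    public const int MaxFileCount = 64;

    // Number of Write/WriteAsync calls needed to fill one file.
    public static int WritesPerFile()
    {
        return FileSize / BufferSize;
    }

    // Upper bound on bytes on disk once every task has cycled through all of its files.
    public static long MaxBytesOnDisk()
    {
        return (long)TaskCount * MaxFileCount * FileSize;
    }

    private static void Main()
    {
        Console.WriteLine("Writes per file: {0}", WritesPerFile());
        Console.WriteLine("Max on-disk footprint: {0} MB", MaxBytesOnDisk() / (1024 * 1024));
    }
}
```

So each file takes 16 writes, and the whole run never occupies more than 2048 MB of the 120 GB drive.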

And now, the results taken from PerfMon while running this app for a couple of minutes:

Async tasks

[Figure: PerfMon graph, “Async SSD perf”]

Sync tasks

[Figure: PerfMon graph, “Sync32T65536B1048576F”]

Scandalous! In terms of I/O throughput, the synchronous app easily outperformed the asynchronous one. What’s going on here?

For one thing, the write operations against the SSD complete very quickly (~10 ms) but not synchronously. In fact, they always incur a thread switch, since the completion task gets dispatched to the thread pool. Examining the thread count (not shown in the graphs above) shows as many as 80 threads active in the process. All this context switching adds a lot of overhead that just isn’t incurred in the sync app with its (more or less) fixed thread count of 32.
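One way to observe that thread switch for yourself is to compare the managed thread ID before and after each awaited write. This is a minimal sketch rather than code from the test app: `ThreadHopProbe`, `CountThreadHopsAsync`, and the use of a temp file are assumptions made for illustration, and the count you see will vary with the runtime version, OS, and machine load.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

internal static class ThreadHopProbe
{
    // Hypothetical helper: performs writeCount async writes and counts how many
    // times execution resumed on a different thread after the await.
    public static async Task<int> CountThreadHopsAsync(string filePath, int writeCount, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int hops = 0;

        // useAsync: true, as in the post's RunAsync method.
        using (FileStream stream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize, true))
        {
            for (int i = 0; i < writeCount; ++i)
            {
                int before = Environment.CurrentManagedThreadId;
                await stream.WriteAsync(buffer, 0, buffer.Length);
                if (Environment.CurrentManagedThreadId != before)
                {
                    ++hops;
                }
            }
        }

        return hops;
    }

    private static void Main()
    {
        string filePath = Path.GetTempFileName();
        try
        {
            int hops = CountThreadHopsAsync(filePath, 16, 65536).GetAwaiter().GetResult();
            Console.WriteLine("{0} of 16 writes resumed on a different thread.", hops);
        }
        finally
        {
            File.Delete(filePath);
        }
    }
}
```

A count of zero would mean every write completed synchronously or happened to resume on the same thread; any higher count is a lower bound on the number of thread-pool dispatches.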

Does this mean that we should ditch async disk I/O in the SSD era? Certainly not! In the likely more common case where disk I/O is intermingled with several other tasks, it makes sense to stick with async. (Lest we forget, dedicated threads do not scale in the general case.) If, on the other hand, you are writing an extremely disk-intensive system application, you will probably already be thinking very carefully about data structures, allocation strategies, and low-level control flow, all of which raise design considerations more significant than merely “Tasks vs. threads.”

Now, this was by no means a clean-room, ultra-scientific exercise. The background workload, for example, was minimal but not tightly controlled; C:\ (my SSD volume) happens to be my OS partition; I didn’t attempt to control for write caching; and so forth (though I did disable malware scanning in the target write folder to remove some more significant extraneous I/O). Even so, the results should be enough to convince the average developer to insist on a representative performance baseline with a known hardware profile before deciding on a sync or async workload.
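If PerfMon isn’t handy, a rough baseline can also be taken in code. The sketch below is only a starting point under stated assumptions: `WriteBaseline` and `MeasureSyncWriteMBps` are hypothetical names, the file goes to the temp folder rather than the folder used in my runs, and a single short pass like this is far noisier than a multi-minute PerfMon capture.

```csharp
using System;
using System.Diagnostics;
using System.IO;

internal static class WriteBaseline
{
    // Hypothetical helper: writes totalBytes to filePath in bufferSize chunks
    // using synchronous I/O and returns the observed throughput in MB/s.
    // Assumes totalBytes is a multiple of bufferSize.
    public static double MeasureSyncWriteMBps(string filePath, int totalBytes, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        new Random(12345).NextBytes(buffer);

        Stopwatch stopwatch = Stopwatch.StartNew();
        using (FileStream stream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize, false))
        {
            for (int written = 0; written < totalBytes; written += bufferSize)
            {
                stream.Write(buffer, 0, bufferSize);
            }

            // Ask the OS to flush its file buffers so the timing includes the device write.
            stream.Flush(true);
        }

        stopwatch.Stop();
        return (totalBytes / (1024.0 * 1024.0)) / stopwatch.Elapsed.TotalSeconds;
    }

    private static void Main()
    {
        string filePath = Path.GetTempFileName();
        try
        {
            double mbps = MeasureSyncWriteMBps(filePath, 64 * 1048576, 65536);
            Console.WriteLine("Sync write baseline: {0:F1} MB/s", mbps);
        }
        finally
        {
            File.Delete(filePath);
        }
    }
}
```

Run it several times on the target hardware and record the spread, not just one number, before comparing it against an async variant.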
