Microbenchmarks with BenchmarkDotNet


They say that the fastest prime number algorithm for small inputs (up to 2^16) is a lookup table. But this got me thinking — would it be faster to use HashSet.Contains or a List.BinarySearch? The answer to this type of performance comparison question has been debated often, usually with no clear conclusion. What is a programmer to do?

This calls for a microbenchmark! Designing and running a benchmark properly and measuring it accurately is difficult, but we are in luck — BenchmarkDotNet has done all the hard work for us. BenchmarkDotNet is a simple benchmarking library that happens to be a favorite of the CoreFX team, where it is used to justify major performance improvements to the framework itself.

Let’s use BenchmarkDotNet to answer the specific question above. First, we need a list of primes up to 65535. WolframAlpha has us covered here; using the “Plaintext” link, we can even get the list in a form that is convenient to copy and paste into code, braces and commas included!
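
As a side note, if you would rather not paste thousands of literals into the source, generating the list at startup works just as well. Here is a minimal sketch using a Sieve of Eratosthenes (the PrimeGenerator class is illustrative and not part of the sample above):

namespace BenchmarkSample
{
    using System.Collections.Generic;

    // Sketch only: generates the primes below a limit with a Sieve of
    // Eratosthenes, as an alternative to pasting the WolframAlpha list.
    internal static class PrimeGenerator
    {
        // The limit should be at most 65536 so every prime fits in a ushort.
        public static List<ushort> PrimesBelow(int limit)
        {
            var composite = new bool[limit];
            var primes = new List<ushort>();
            for (int i = 2; i < limit; ++i)
            {
                if (!composite[i])
                {
                    primes.Add((ushort)i);
                    for (long j = (long)i * i; j < limit; j += i)
                    {
                        composite[(int)j] = true;
                    }
                }
            }

            return primes;
        }
    }
}

Calling PrimesBelow(65536) produces the same 6,542 values that the pasted list contains.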

Now we need to create a class with the code we want to benchmark, using a few attributes:

namespace BenchmarkSample
{
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;

    [CoreJob]
    public class PrimeLookup
    {
        // All 6,542 primes below 2^16, pasted from the WolframAlpha output.
        private static readonly List<ushort> PrimeList = new List<ushort> { 2, 3, 5, /* ... */, 65519, 65521 };

        private static readonly HashSet<ushort> PrimeSet = new HashSet<ushort>(PrimeList);

        [Benchmark]
        public ushort GetCountBinarySearch()
        {
            ushort count = 0;
            for (int i = 0; i < ushort.MaxValue; ++i)
            {
                if (GetOneBinarySearch((ushort)i))
                {
                    ++count;
                }
            }

            return count;
        }

        [Benchmark]
        public ushort GetCountHashSet()
        {
            ushort count = 0;
            for (int i = 0; i < ushort.MaxValue; ++i)
            {
                if (GetOneHashSet((ushort)i))
                {
                    ++count;
                }
            }

            return count;
        }

        private static bool GetOneBinarySearch(ushort n) => PrimeList.BinarySearch(n) >= 0;

        private static bool GetOneHashSet(ushort n) => PrimeSet.Contains(n);
    }
}

You can see that we have two basic algorithms being measured here. The first, GetCountBinarySearch, counts the primes below ushort.MaxValue using the lookup table via List.BinarySearch. The second, GetCountHashSet, does the same but with a HashSet.Contains lookup instead. The [CoreJob] attribute indicates that we are only interested in measuring performance on .NET Core (rather than the full .NET Framework, which one can specify with [ClrJob]). You might wonder why the code doesn’t use more static members; this is because the benchmark class cannot be sealed or static, must be default-constructible, and must expose the benchmark methods as instance methods. (BenchmarkDotNet internally generates code for the benchmark which uses inheritance and instances of your class.)
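
Nothing requires you to pick just one runtime, either; stacking both job attributes compares them in a single run. Here is a minimal, self-contained sketch (the class and its benchmark are illustrative, not part of the sample above):

namespace BenchmarkSample
{
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;

    // Sketch only: stacking both job attributes runs each [Benchmark] method
    // on the full .NET Framework and on .NET Core, producing one summary row
    // per method/runtime combination.
    [ClrJob]
    [CoreJob]
    public class RuntimeComparison
    {
        private static readonly HashSet<ushort> SmallPrimes = new HashSet<ushort> { 2, 3, 5, 7, 11, 13 };

        [Benchmark]
        public bool Contains() => SmallPrimes.Contains(11);
    }
}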

Before we actually measure, we need to write some tests. It doesn’t make much sense to benchmark an incorrect algorithm. This should do it:

namespace BenchmarkSample.Test
{
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public sealed class PrimeLookupTest
    {
        [TestMethod]
        public void CountHashSet()
        {
            Assert.AreEqual(6542, new PrimeLookup().GetCountHashSet());
        }

        [TestMethod]
        public void CountBinarySearch()
        {
            Assert.AreEqual(6542, new PrimeLookup().GetCountBinarySearch());
        }
    }
}

The tests pass, so let’s write the entry point to launch the benchmark:

namespace BenchmarkSample
{
    using BenchmarkDotNet.Running;

    internal static class Program
    {
        private static void Main()
        {
            BenchmarkRunner.Run<PrimeLookup>();
        }
    }
}

That’s it! The attributes have already specified what we want to do and there are smart defaults for the common case.
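
If the project later grows to contain more than one benchmark class, BenchmarkDotNet also ships a switcher that can replace the single Run call. Here is a sketch of that alternative entry point (the switcher API is BenchmarkDotNet’s; the rest mirrors the Program class above):

namespace BenchmarkSample
{
    using BenchmarkDotNet.Running;

    // Sketch only: BenchmarkSwitcher scans the assembly for benchmark classes
    // and lets you choose which one to run via command-line arguments or an
    // interactive prompt.
    internal static class Program
    {
        private static void Main(string[] args)
        {
            BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
        }
    }
}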

Now we just need to set the build to Release mode (very important!) and run this experiment. You’ll see it generate a project, compile it, and then launch a benchmark session to gather statistics (automatically doing warmup iterations, etc.). The last chunk of output gives us the summary:

// * Summary *

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-3960X CPU 3.30GHz (Ivy Bridge), 1 CPU, 12 logical and 6 physical cores
Frequency=3224539 Hz, Resolution=310.1218 ns, Timer=TSC
.NET Core SDK=2.1.201
  [Host] : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
  Core   : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT

Job=Core  Runtime=Core

               Method |     Mean |     Error |    StdDev |
--------------------- |---------:|----------:|----------:|
 GetCountBinarySearch | 3.405 ms | 0.0528 ms | 0.0494 ms |
      GetCountHashSet | 1.394 ms | 0.0146 ms | 0.0137 ms |

HashSet has emerged as the clear winner, at least in this experiment with its roughly 6,500-item lookup table. Remember that this is a microbenchmark and we cannot extrapolate the results too far.
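
The lookups here shouldn’t allocate, but when allocations are part of the comparison, BenchmarkDotNet can measure those too: adding [MemoryDiagnoser] to a benchmark class appends GC and allocated-bytes columns to the same summary table. A small illustrative sketch (the class is a toy example, not part of the prime sample):

namespace BenchmarkSample
{
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;

    // Sketch only: [MemoryDiagnoser] adds GC collection counts and an
    // Allocated column to the summary; this toy benchmark allocates a small
    // list on every call, so the extra columns have something to show.
    [CoreJob]
    [MemoryDiagnoser]
    public class AllocationSample
    {
        [Benchmark]
        public List<int> BuildList() => new List<int> { 1, 2, 3 };
    }
}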

Why not give BenchmarkDotNet a try? It could definitely help make your .NET performance discussions less about gut feeling and more data-driven.
