TKey performance indicators: custom types

Spread the love

Last time, we covered the effect of various primitive types (and string) for use as dictionary keys. Today we will see how custom types fare.

Let’s start with a custom value type which simply wraps a string:

    public struct StringV
    {
        private readonly string s;

        public StringV(string s)
        {
            this.s = s;
        }
    }

We’ll use a very similar benchmark as before, but this time we will constrain the number of keys to 100 for simplicity. Here is the code:

[CoreJob]
[MemoryDiagnoser]
public class DictKeys
{
    private List<StringV> strV8Keys;
    private List<string> str8Keys;

    [Params(100)]
    public int N { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        this.strV8Keys = Setup(this.N, i => new StringV(TwoChars(i, 4)));
        this.str8Keys = Setup(this.N, i => TwoChars(i, 4));
    }

    [Benchmark]
    public int StrV8K() => Run(this.strV8Keys).Count;

    [Benchmark]
    public int Str8K() => Run(this.str8Keys).Count;

    private static List<T> Setup<T>(int n, Func<int, T> make)
    {
        List<T> k = new List<T>();
        for (int i = 0; i < n; ++i)
        {
            k.Add(make(i));
        }

        return k;
    }

    private static string TwoChars(int val, int n)
    {
        StringBuilder sb = new StringBuilder();
        char one = (char)('0' + (val / 32));
        char two = (char)('0' + (val % 32));
        for (int i = 0; i < n; ++i)
        {
            sb.Append(one);
            sb.Append(two);
        }

        return sb.ToString();
    }

    private static Dictionary<T, int> Run<T>(List<T> k)
    {
        Dictionary<T, int> dict = new Dictionary<T, int>();
        int n = k.Count;
        for (int i = 0; i < n; ++i)
        {
            dict[k[i]] = i;
        }

        return dict;
    }
}

The results:

| Method |   N |      Mean |     Error |    StdDev |    Median |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------- |---- |----------:|----------:|----------:|----------:|-------:|-------:|------:|----------:|
| StrV8K | 100 | 16.160 us | 0.2160 us | 0.2020 us | 16.051 us | 1.9836 | 0.0305 |     - |   12.3 KB |
|  Str8K | 100 |  5.678 us | 0.1118 us | 0.2258 us |  5.586 us | 1.6174 | 0.0381 |     - |   9.95 KB |

What?! StringV is 3X slower on identical inputs. I have actually talked about this phenomenon before (in the context of the optimized Letter Boxed trie). To summarize, a value type used as a dictionary key should always override GetHashCode and implement IEquatable. Not only will this improve performance in this particular case, but it will avoid correctness issues where the built-in equality and hash code functions might not do the right thing (e.g. if you want to exclude some fields from equality comparisons).

Let’s reimplement StringV as recommended:

public struct StringV : IEquatable<StringV>
{
    private readonly string s;

    public StringV(string s)
    {
        this.s = s;
    }

    public bool Equals(StringV other)
    {
        return string.Equals(this.s, other.s, StringComparison.Ordinal);
    }

    public override int GetHashCode()
    {
        if (this.s == null)
        {
            return 0;
        }

        return this.s.GetHashCode();
    }
}

We don’t need or want anyone to use the object.Equals method (it will require an expensive boxing operation) so we do not override it here. Now let’s check the revised performance results:

| Method |   N |     Mean |     Error |    StdDev |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------- |---- |---------:|----------:|----------:|-------:|-------:|------:|----------:|
| StrV8K | 100 | 5.502 us | 0.0370 us | 0.0309 us | 1.6174 | 0.0381 |     - |   9.95 KB |
|  Str8K | 100 | 5.690 us | 0.1136 us | 0.2243 us | 1.6174 | 0.0381 |     - |   9.95 KB |

Within a margin of error, the performance is now identical in both memory and CPU. Success!

Now, let’s move on to the scenario I alluded to in the previous post, where subtle performance issues can lurk in unexpected places. Suppose that the StringV type is intended to hold a raw representation of the string in UTF-8 encoding. This might be useful to ensure a fast binary serialization strategy without the need to reallocate/re-decode the string from .NET’s internal UTF-16 representation. The new struct might look like this:

public struct StringV : IEquatable<StringV>
{
    private readonly byte[] raw;
    private readonly string s;

    public StringV(byte[] raw)
    {
        this.raw = raw;
        this.s = Encoding.UTF8.GetString(this.raw);
    }

    public byte[] ToByteArray() => this.raw;

    public bool Equals(StringV other)
    {
        return string.Equals(this.s, other.s, StringComparison.Ordinal);
    }

    public override int GetHashCode()
    {
        if (this.s == null)
        {
            return 0;
        }

        return this.s.GetHashCode();
    }
}

Instead of accepting a string directly, the raw byte buffer is passed and then decoded once to create the internal string. A new ToByteArray method is added to provide access the buffer. Generally, though, the behavior with respect to the dictionary key logic is identical — no changes to Equals or GetHashCode.

One minor change is needed to make the benchmark work, since the StringV constructor has changed:

            this.strV8Keys = Setup(this.N, i => new StringV(Encoding.UTF8.GetBytes(TwoChars(i, 4))));

Nothing should have changed performance-wise, right?

| Method |   N |     Mean |     Error |    StdDev |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------- |---- |---------:|----------:|----------:|-------:|-------:|------:|----------:|
| StrV8K | 100 | 7.120 us | 0.0689 us | 0.0576 us | 2.0599 | 0.0610 |     - |  12.69 KB |
|  Str8K | 100 | 5.466 us | 0.0371 us | 0.0290 us | 1.6174 | 0.0381 |     - |   9.95 KB |

Wrong! We have just made the code 30% slower, with 27% more memory allocated. What gives?

If you were following along in the previous discussion about memory and alignment, one thing that should be apparent is that StringV is now twice as big as it used to be — two reference type fields (8 x 2 = 16 bytes) instead of one. This means the dictionary will use (and copy around) a lot more memory just for the keys in its entries array. Could there be other factors at work as well? Given the bigger memory sizes, you might suspect CPU cache performance will suffer.

Rather than guess, let’s check this theory by enabling HardwareCounters.CacheMisses in BenchmarkDotNet:

    [CoreJob]
    [MemoryDiagnoser]
    [HardwareCounters(HardwareCounter.CacheMisses)]
    public class DictKeys
    {
// . . .

Note that we will have to run this benchmark elevated now. The results:

| Method |   N |     Mean |     Error |    StdDev |  Gen 0 |  Gen 1 | Gen 2 | Allocated | CacheMisses/Op |
|------- |---- |---------:|----------:|----------:|-------:|-------:|------:|----------:|---------------:|
| StrV8K | 100 | 7.307 us | 0.1459 us | 0.2314 us | 2.0599 | 0.0610 |     - |  12.69 KB |              3 |
|  Str8K | 100 | 5.402 us | 0.0308 us | 0.0273 us | 1.6174 | 0.0381 |     - |   9.95 KB |              2 |

As suspected, there are definitely more cache misses in the StringV case. (The number says 50% more, 3 vs. 2, but we would probably have to run many more iterations to get a more precise figure.)

The lesson here is that types used as dictionary keys should not have extraneous data. Where your design instincts might push you in this direction (e.g. avoiding primitive obsession, implementing whole value pattern, etc.), your performance instincts may need to provide a counterbalance.

One thought on “TKey performance indicators: custom types

  1. Jie Mei

    These two blogs illustrate a very good example of how to approach a performance issue, which is very helpful.

    It is also interesting to learn the fact that overrides GetHashCode and implements IEquatable can improve performance.

Leave a Reply

Your email address will not be published. Required fields are marked *