{"id":5655,"date":"2019-07-29T13:00:56","date_gmt":"2019-07-29T13:00:56","guid":{"rendered":"http:\/\/writeasync.net\/?p=5655"},"modified":"2019-07-29T02:49:26","modified_gmt":"2019-07-29T02:49:26","slug":"tkey-performance-indicators-primitives","status":"publish","type":"post","link":"https:\/\/writeasync.net\/?p=5655","title":{"rendered":"TKey performance indicators: primitives"},"content":{"rendered":"<p>Recently I encountered a situation where a seemingly innocuous change in the structure of a <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.collections.generic.dictionary-2?view=netframework-4.8\">dictionary key<\/a> had a noticeable negative performance impact (more on this later). This got me thinking, how exactly do different types of keys measure up in a head-to-head comparison? Let&#8217;s find out!<\/p>\n<p>First, we need to design an appropriate benchmark. (Note that we&#8217;ll be using <a href=\"http:\/\/writeasync.net\/?p=5514\">BenchmarkDotNet<\/a>, of course.) The basic flow will be as follows:<\/p>\n<ol>\n<li>Pick a dictionary key type.<\/li>\n<li>Generate some number of unique keys as a one-time setup step.<\/li>\n<li>Create a dictionary of <code>int<\/code>s and insert a value for each previously generated key.<\/li>\n<\/ol>\n<p>For the first set of measurements, I have chosen the following key types: <code>byte<\/code>, <code>short<\/code>, <code>int<\/code>, <code>long<\/code>, and finally three variations of <code>string<\/code> (using lengths of 2, 8, and 32 <code>char<\/code>s respectively). The benchmark code looks like this:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n&#x5B;CoreJob]\r\n&#x5B;MemoryDiagnoser]\r\npublic class DictKeys\r\n{\r\n    private List&lt;byte&gt; byteKeys;\r\n    private List&lt;short&gt; shortKeys;\r\n    private List&lt;int&gt; intKeys;\r\n    private List&lt;long&gt; longKeys;\r\n    private List&lt;string&gt; str2Keys;\r\n    private List&lt;string&gt; str8Keys;\r\n    private List&lt;string&gt; str32Keys;\r\n\r\n    &#x5B;Params(10, 100, 1000)]\r\n    public int N { get; set; }\r\n\r\n    &#x5B;GlobalSetup]\r\n    public void Setup()\r\n    {\r\n        this.byteKeys = Setup(this.N, i =&gt; (byte)i);\r\n        this.shortKeys = Setup(this.N, i =&gt; (short)i);\r\n        this.intKeys = Setup(this.N, i =&gt; i);\r\n        this.longKeys = Setup(this.N, i =&gt; (long)i);\r\n        this.str2Keys = Setup(this.N, i =&gt; TwoChars(i, 1));\r\n        this.str8Keys = Setup(this.N, i =&gt; TwoChars(i, 4));\r\n        this.str32Keys = Setup(this.N, i =&gt; TwoChars(i, 16));\r\n    }\r\n\r\n    &#x5B;Benchmark]\r\n    public int ByteK() =&gt; Run(this.byteKeys).Count;\r\n\r\n    &#x5B;Benchmark]\r\n    public int ShortK() =&gt; Run(this.shortKeys).Count;\r\n\r\n    &#x5B;Benchmark]\r\n    public int IntK() =&gt; Run(this.intKeys).Count;\r\n\r\n    &#x5B;Benchmark]\r\n    public int LongK() =&gt; Run(this.longKeys).Count;\r\n\r\n    &#x5B;Benchmark]\r\n    public int Str2K() =&gt; Run(this.str2Keys).Count;\r\n\r\n    &#x5B;Benchmark]\r\n    public int Str8K() =&gt; Run(this.str8Keys).Count;\r\n\r\n    &#x5B;Benchmark]\r\n    public int Str32K() =&gt; Run(this.str32Keys).Count;\r\n\r\n    private static List&lt;T&gt; Setup&lt;T&gt;(int n, Func&lt;int, T&gt; make)\r\n    {\r\n        List&lt;T&gt; k = new List&lt;T&gt;();\r\n        for (int i = 0; i &lt; n; ++i)\r\n        {\r\n            k.Add(make(i));\r\n        }\r\n\r\n        return k;\r\n    }\r\n\r\n    private static string TwoChars(int val, int n)\r\n    {\r\n        StringBuilder sb = new StringBuilder();\r\n        char one = (char)('0' + (val \/ 32));\r\n        char two = (char)('0' + (val % 32));\r\n        for (int i = 0; i &lt; n; ++i)\r\n        {\r\n            sb.Append(one);\r\n            sb.Append(two);\r\n        }\r\n\r\n        return sb.ToString();\r\n    }\r\n\r\n    private static Dictionary&lt;T, int&gt; Run&lt;T&gt;(List&lt;T&gt; k)\r\n    {\r\n        Dictionary&lt;T, int&gt; dict = new Dictionary&lt;T, int&gt;();\r\n        int n = k.Count;\r\n        for (int i = 0; i &lt; n; ++i)\r\n        {\r\n            dict&#x5B;k&#x5B;i]] = i;\r\n        }\r\n\r\n        return dict;\r\n    }\r\n}\r\n<\/pre>\n<p>For the setup step, the integer value is used as-is for the key in all cases except for <code>string<\/code>. The string keys are a &#8220;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Base32\">Base32<\/a>&#8220;-ish encoding of the N parameter value repeated to create the desired length; for example, the first few keys of length 8 are 00000000, 01010101, 02020202, etc. Really, the values don&#8217;t matter much, as long as they are unique.<\/p>\n<p>Let&#8217;s see the results:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n| Method |    N |        Mean |        Error |       StdDev |      Median |   Gen 0 |  Gen 1 | Gen 2 | Allocated |\r\n|------- |----- |------------:|-------------:|-------------:|------------:|--------:|-------:|------:|----------:|\r\n|  ByteK |   10 |    332.8 ns |     3.626 ns |     3.214 ns |    331.5 ns |  0.1230 |      - |     - |     776 B |\r\n| ShortK |   10 |    335.7 ns |     1.480 ns |     1.236 ns |    335.4 ns |  0.1230 |      - |     - |     776 B |\r\n|   IntK |   10 |    332.3 ns |     3.183 ns |     2.485 ns |    331.7 ns |  0.1230 |      - |     - |     776 B |\r\n|  LongK |   10 |    350.5 ns |     1.542 ns |     1.367 ns |    350.3 ns |  0.1574 |      - |     - |     992 B |\r\n|  Str2K |   10 |    588.8 ns |    11.707 ns |    23.381 ns |    575.6 ns |  0.1574 |      - |     - |     992 B |\r\n|  Str8K |   10 |    619.0 ns |     1.886 ns |     1.575 ns |    618.8 ns |  0.1574 |      - |     - |     992 B |\r\n| Str32K |   10 |    799.9 ns |     6.069 ns |     4.739 ns |    799.8 ns |  0.1574 |      - |     - |     992 B |\r\n|  ByteK |  100 |  2,675.2 ns |    16.033 ns |    13.388 ns |  2,675.0 ns |  1.1711 | 0.0191 |     - |    7392 B |\r\n| ShortK |  100 |  2,906.2 ns |    20.842 ns |    16.272 ns |  2,904.8 ns |  1.1711 | 0.0191 |     - |    7392 B |\r\n|   IntK |  100 |  2,728.3 ns |     3.704 ns |     2.892 ns |  2,728.2 ns |  1.1711 | 0.0191 |     - |    7392 B |\r\n|  LongK |  100 |  3,074.8 ns |    61.137 ns |   111.792 ns |  3,039.5 ns |  1.6174 | 0.0381 |     - |   10192 B |\r\n|  Str2K |  100 |  4,831.0 ns |    14.509 ns |    12.116 ns |  4,831.0 ns |  1.6174 | 0.0381 |     - |   10192 B |\r\n|  Str8K |  100 |  5,666.8 ns |   112.301 ns |   196.686 ns |  5,707.1 ns |  1.6174 | 0.0381 |     - |   10192 B |\r\n| Str32K |  100 |  7,494.3 ns |    46.232 ns |    43.245 ns |  7,477.0 ns |  1.6174 | 0.0381 |     - |   10192 B |\r\n|  ByteK | 1000 | 18,044.9 ns |   347.597 ns |   325.142 ns | 18,233.2 ns |  2.5330 | 0.0610 |     - |   16064 B |\r\n| ShortK | 1000 | 29,562.3 ns |   186.980 ns |   156.137 ns | 29,617.7 ns | 11.5967 | 1.6479 |     - |   73168 B |\r\n|   IntK | 1000 | 26,968.4 ns |   531.823 ns |   958.986 ns | 26,501.7 ns | 11.5967 | 1.6479 |     - |   73168 B |\r\n|  LongK | 1000 | 30,206.0 ns |   600.060 ns | 1,170.372 ns | 29,567.4 ns | 16.1743 | 2.6855 |     - |  102216 B |\r\n|  Str2K | 1000 | 47,720.6 ns |   215.247 ns |   201.342 ns | 47,639.7 ns | 16.1743 | 2.6855 |     - |  102216 B |\r\n|  Str8K | 1000 | 59,098.2 ns | 1,325.516 ns | 1,901.015 ns | 58,183.4 ns | 16.1743 | 2.6855 |     - |  102216 B |\r\n| Str32K | 1000 | 76,835.0 ns |   383.037 ns |   339.552 ns | 76,724.6 ns | 16.1133 | 2.5635 |     - |  102216 B |\r\n<\/pre>\n<p>Some of the results do not make sense at first. For instance, why is <code>short<\/code> always slower than <code>int<\/code>? I can&#8217;t think of a clear reason, but the performance effect is definitely consistent across all the cases.<\/p>\n<p>Other non-obvious results seem strange but can be explained more readily. Take for example the fact that the allocation sizes are essentially equivalent for all the integral types excluding <code>long<\/code>. This I am pretty sure is due to <a href=\"https:\/\/developer.ibm.com\/articles\/pa-dalign\/\">memory alignment<\/a> requirements. If you look at <a href=\"https:\/\/github.com\/dotnet\/corefx\/blob\/master\/src\/Common\/src\/CoreLib\/System\/Collections\/Generic\/Dictionary.cs\">the internals of Dictionary<\/a> you can see an entry struct that holds each key\/value pair, defined roughly like this:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n        struct Entry\r\n        {\r\n            public int next;\r\n            public uint hashCode;\r\n            public TKey key;\r\n            public TValue value;\r\n        }\r\n<\/pre>\n<p>In the benchmark, <code>TValue<\/code> is always <code>int<\/code>, and <code>TKey<\/code> varies in size (K = 1, 2, 4, or 8 bytes). Na\u00efvely, we would then expect the struct to be (4 + 4 + K + 4) = 12 + K bytes long. In reality, <a href=\"https:\/\/stackoverflow.com\/questions\/9741395\/alignment-of-arrays-in-net\/9741597#9741597\">the CLR has alignment requirements<\/a> for these built-in types, specifically that 4-byte values have 4-byte alignment (i.e. stored at a memory address divisible by 4). For all the <code>int<\/code>s to be aligned, the layout would need to be:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n         +---------------------------------+\r\n         | 0 1 2 3 4 5 6 7 8 9 A B C D E F |\r\n         +-^-------^-------^-------^-------+\r\n---------+ |       |       |       |\r\nint next  _|       |       |       |\r\nuint hashCode  ____|       |       |\r\nTKey key  _________________|       |\r\nint value  ________________________|\r\n---------+\r\n<\/pre>\n<p>Even if <code>TKey<\/code> is <code>byte<\/code>, the following <code>int value<\/code> field must land on the next multiple of 4 address, giving us a <strong>minimum size requirement of 16<\/strong> for the struct. That being said, the N=1000 case for <code>Dictionary&lt;byte, int&gt;<\/code> uses far less memory than the others, which I can&#8217;t explain yet (perhaps something to do with the resizing strategy and how the entries are laid out?).<\/p>\n<p>The rest of the results are fairly mundane. We can see that <code>long<\/code> and <code>string<\/code> both have the same size characteristics. This is because they are both <a href=\"https:\/\/stackoverflow.com\/questions\/3800882\/how-big-is-an-object-reference-in-net\">8 bytes on 64-bit architectures<\/a>. In fact, a field of any reference type, <code>string<\/code> included, will be 8-bytes, no matter how big the actual &#8220;pointed to&#8221; memory is. (Remember that we pre-allocated the keys in our benchmark, so we only pay for the reference, and not the string data itself.)<\/p>\n<p>Furthermore, longer <code>string<\/code> keys cost more in CPU time. This is because the <code>GetHashCode<\/code> implementation for <code>string<\/code> considers all characters of the string when hashing, making it <a href=\"https:\/\/stackoverflow.com\/questions\/25939407\/what-is-the-time-complexity-of-string-gethashcode\">roughly <code>O(N)<\/code> for input size <code>N<\/code><\/a>.<\/p>\n<p>This is enough performance goodness to chew on for now. Next, we&#8217;ll explore the implications of using custom data types for keys.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I encountered a situation where a seemingly innocuous change in the structure of a dictionary key had a noticeable negative performance impact (more on this later). This got me thinking, how exactly do different types of keys measure up&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[104],"tags":[],"class_list":["post-5655","post","type-post","status-publish","format-standard","hentry","category-performance"],"_links":{"self":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5655"}],"version-history":[{"count":3,"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5655\/revisions"}],"predecessor-version":[{"id":5658,"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5655\/revisions\/5658"}],"wp:attachment":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}