{"id":5650,"date":"2019-06-09T13:00:01","date_gmt":"2019-06-09T13:00:01","guid":{"rendered":"http:\/\/writeasync.net\/?p=5650"},"modified":"2019-06-08T15:49:08","modified_gmt":"2019-06-08T15:49:08","slug":"an-even-faster-hash-table","status":"publish","type":"post","link":"http:\/\/writeasync.net\/?p=5650","title":{"rendered":"An even faster hash table"},"content":{"rendered":"<p>Last week, in my continuing exploration of a <a href=\"http:\/\/writeasync.net\/?p=5638\">C++ Letter Boxed solver<\/a>, I had to build a <a href=\"http:\/\/writeasync.net\/?p=5640\">faster hash table<\/a> to improve performance. Today we&#8217;ll make it even faster, hoping to catch up to the performance of my <a href=\"http:\/\/writeasync.net\/?p=5625\">.NET Core solver app<\/a>.<\/p>\n<p>Before making the code changes, I first needed to <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/a2913f1755e136d8fd95fedae6720784acf8a434\">fix the benchmark code<\/a>. The find operations were not really working the way they should have been. (Luckily the numbers didn&#8217;t change all that much.)<\/p>\n<p>Now, let&#8217;s start optimizing! First, note that <code>Hashtable<\/code> has a single vector for its buckets. Each entry looks like this:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nstruct Entry\r\n{\r\n    TKey key_;\r\n    TValue value_;\r\n    bool occupied_;\r\n};\r\n<\/pre>\n<p>This means that every time we resize the table, we have to reallocate and copy these keys and values again. Perhaps we can get a performance edge by using two vectors &#8212; one for buckets, which would just be a list of <code>int<\/code>s, and another for the entries which would never need to change other than appending totally new key-value pairs. The number in each bucket would refer to the index of the entry stored there.<\/p>\n<p>Since a <a href=\"https:\/\/stackoverflow.com\/questions\/13110130\/initialize-a-vector-to-zeros-c-c11\">vector zero-fills by default<\/a>, we should take advantage of that fact by treating a bucket value of zero as empty. Then we just need to be careful to <a href=\"http:\/\/wiki.c2.com\/?ZeroAndOneBasedIndexes\">one-index<\/a> the entry list by setting its initial size to one.<\/p>\n<p>Unfortunately, this approach forces us into a near total rewrite of the <code>Hashtable<\/code> class. But no matter &#8212; we have precious CPU cycles at stake! After a lot more logic errors than I care to admit, I was able to come up with these changes: <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/efa79c657bae6dfea28e5d13db5379b487617b41?diff=split\">Hashtable.h diff<\/a>. Overall, I think this actually looks simpler, even if some of the loop patterns are duplicated. Where possible, I extracted helper functions to avoid literal copy\/paste code. Also note that we do not need to keep track of any additional bookkeeping information about occupied buckets, so now the Entry is just a <code>typedef<\/code>:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\ntypedef std::pair&lt;TKey, TValue&gt; Entry;\r\n<\/pre>\n<p>Given the structure of this new code, we can even explore a new collision scheme. Recall that we are using linear probing, where we try successive buckets B, B + 1, B + 2, B + 3, etc. In practice, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Quadratic_probing\">quadratic probing<\/a> can be much faster than linear probing (with a few caveats about load factor being 0.5 or less). An example quadratic probing scheme would use buckets B, B + 1, B + 4, B + 9, etc. Using the fact that each perfect square is the sum of consecutive odd numbers (1<sup>1<\/sup> = 1, 2<sup>2<\/sup> = 1 + 3, 3<sup>2<\/sup> = 1 + 3 + 5, and so on), we can implement it as follows:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nvoid next_index(int&amp; index, int c) const\r\n{\r\n    index += 1 + (2 * c);\r\n    int n = static_cast&lt;int&gt;(buckets_.size());\r\n    if (index &gt;= n)\r\n    {\r\n        index = index % n;\r\n    }\r\n}\r\n<\/pre>\n<p>In this case, <code>c<\/code> represents the (zero-based) collision count: <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/07be9ce3664372828e543df2ae69b3280860e647\">commit<\/a>.<\/p>\n<p>So now we have implementation A (original), implementation B (use separate buckets and entries vectors), and implementation C (B with quadratic probing). Did any of this make a difference? Let&#8217;s run the benchmark again:<\/p>\n<table>\n<tr>\n<th>Operation<\/th>\n<th>n<\/th>\n<th>f<\/th>\n<th>Time A<\/th>\n<th>Time B<\/th>\n<th>Time C<\/th>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10<\/td>\n<td>0.25<\/td>\n<td>0.137<\/td>\n<td>0.134<\/td>\n<td>0.143<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10<\/td>\n<td>0.25<\/td>\n<td>0.132<\/td>\n<td>0.13<\/td>\n<td>0.156<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10<\/td>\n<td>0.25<\/td>\n<td>0.869<\/td>\n<td>1.29<\/td>\n<td>1.315<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100<\/td>\n<td>0.25<\/td>\n<td>1.305<\/td>\n<td>1.315<\/td>\n<td>1.313<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100<\/td>\n<td>0.25<\/td>\n<td>1.29<\/td>\n<td>1.267<\/td>\n<td>1.248<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100<\/td>\n<td>0.25<\/td>\n<td>16.809<\/td>\n<td>9.726<\/td>\n<td>9.677<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>1000<\/td>\n<td>0.25<\/td>\n<td>13.147<\/td>\n<td>12.983<\/td>\n<td>13.071<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>1000<\/td>\n<td>0.25<\/td>\n<td>12.858<\/td>\n<td>12.699<\/td>\n<td>12.39<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>1000<\/td>\n<td>0.25<\/td>\n<td>155.848<\/td>\n<td>126.977<\/td>\n<td>127.08<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10000<\/td>\n<td>0.25<\/td>\n<td>202.026<\/td>\n<td>129.621<\/td>\n<td>131.877<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10000<\/td>\n<td>0.25<\/td>\n<td>128.109<\/td>\n<td>128.009<\/td>\n<td>124.676<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10000<\/td>\n<td>0.25<\/td>\n<td>4560.988<\/td>\n<td>1433.636<\/td>\n<td>1519.233<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100000<\/td>\n<td>0.25<\/td>\n<td>1326.763<\/td>\n<td>1299.435<\/td>\n<td>1314.786<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100000<\/td>\n<td>0.25<\/td>\n<td>1300.205<\/td>\n<td>1273.827<\/td>\n<td>1248.835<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100000<\/td>\n<td>0.25<\/td>\n<td>30199.97<\/td>\n<td>15307.192<\/td>\n<td>15920.22<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10<\/td>\n<td>0.5<\/td>\n<td>0.137<\/td>\n<td>0.134<\/td>\n<td>0.137<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10<\/td>\n<td>0.5<\/td>\n<td>0.132<\/td>\n<td>0.131<\/td>\n<td>0.13<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10<\/td>\n<td>0.5<\/td>\n<td>1.211<\/td>\n<td>1.469<\/td>\n<td>1.469<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100<\/td>\n<td>0.5<\/td>\n<td>1.302<\/td>\n<td>1.301<\/td>\n<td>1.306<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100<\/td>\n<td>0.5<\/td>\n<td>1.286<\/td>\n<td>1.267<\/td>\n<td>1.252<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100<\/td>\n<td>0.5<\/td>\n<td>10.765<\/td>\n<td>10.147<\/td>\n<td>10.127<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>1000<\/td>\n<td>0.5<\/td>\n<td>13.053<\/td>\n<td>13.091<\/td>\n<td>13.171<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>1000<\/td>\n<td>0.5<\/td>\n<td>13.114<\/td>\n<td>12.752<\/td>\n<td>12.976<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>1000<\/td>\n<td>0.5<\/td>\n<td>94.139<\/td>\n<td>86.497<\/td>\n<td>87.787<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10000<\/td>\n<td>0.5<\/td>\n<td>131.463<\/td>\n<td>131.698<\/td>\n<td>131.291<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10000<\/td>\n<td>0.5<\/td>\n<td>127.911<\/td>\n<td>127.799<\/td>\n<td>124.366<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10000<\/td>\n<td>0.5<\/td>\n<td>2145.042<\/td>\n<td>1308.991<\/td>\n<td>1290.921<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100000<\/td>\n<td>0.5<\/td>\n<td>1295.371<\/td>\n<td>1297.845<\/td>\n<td>1301.38<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100000<\/td>\n<td>0.5<\/td>\n<td>1259.002<\/td>\n<td>1268.924<\/td>\n<td>1249.057<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100000<\/td>\n<td>0.5<\/td>\n<td>26632.592<\/td>\n<td>17775.16<\/td>\n<td>17773.18<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10<\/td>\n<td>0.75<\/td>\n<td>0.137<\/td>\n<td>0.143<\/td>\n<td>0.137<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10<\/td>\n<td>0.75<\/td>\n<td>0.132<\/td>\n<td>0.13<\/td>\n<td>0.129<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10<\/td>\n<td>0.75<\/td>\n<td>1.116<\/td>\n<td>1.554<\/td>\n<td>1.572<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100<\/td>\n<td>0.75<\/td>\n<td>1.292<\/td>\n<td>1.306<\/td>\n<td>1.316<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100<\/td>\n<td>0.75<\/td>\n<td>1.285<\/td>\n<td>1.271<\/td>\n<td>1.256<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100<\/td>\n<td>0.75<\/td>\n<td>12.537<\/td>\n<td>12.031<\/td>\n<td>11.966<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>1000<\/td>\n<td>0.75<\/td>\n<td>13.19<\/td>\n<td>12.976<\/td>\n<td>13.125<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>1000<\/td>\n<td>0.75<\/td>\n<td>14.007<\/td>\n<td>12.686<\/td>\n<td>12.453<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>1000<\/td>\n<td>0.75<\/td>\n<td>138.326<\/td>\n<td>118.606<\/td>\n<td>115.429<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10000<\/td>\n<td>0.75<\/td>\n<td>130.416<\/td>\n<td>129.792<\/td>\n<td>131.337<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10000<\/td>\n<td>0.75<\/td>\n<td>127.876<\/td>\n<td>127.651<\/td>\n<td>125.036<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10000<\/td>\n<td>0.75<\/td>\n<td>1388.83<\/td>\n<td>1365.991<\/td>\n<td>1241.811<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100000<\/td>\n<td>0.75<\/td>\n<td>1294.254<\/td>\n<td>1297.491<\/td>\n<td>1311.522<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100000<\/td>\n<td>0.75<\/td>\n<td>1271.941<\/td>\n<td>1417.87<\/td>\n<td>1324.824<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100000<\/td>\n<td>0.75<\/td>\n<td>28530.218<\/td>\n<td>21140.233<\/td>\n<td>20065.233<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10<\/td>\n<td>1<\/td>\n<td>0.137<\/td>\n<td>0.134<\/td>\n<td>0.137<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10<\/td>\n<td>1<\/td>\n<td>0.131<\/td>\n<td>0.131<\/td>\n<td>0.129<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10<\/td>\n<td>1<\/td>\n<td>0.988<\/td>\n<td>1.618<\/td>\n<td>1.642<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100<\/td>\n<td>1<\/td>\n<td>1.298<\/td>\n<td>1.301<\/td>\n<td>1.315<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100<\/td>\n<td>1<\/td>\n<td>1.289<\/td>\n<td>1.26<\/td>\n<td>1.253<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100<\/td>\n<td>1<\/td>\n<td>10.695<\/td>\n<td>12.925<\/td>\n<td>12.879<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>1000<\/td>\n<td>1<\/td>\n<td>13.024<\/td>\n<td>13.055<\/td>\n<td>15.003<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>1000<\/td>\n<td>1<\/td>\n<td>12.719<\/td>\n<td>18.594<\/td>\n<td>16.313<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>1000<\/td>\n<td>1<\/td>\n<td>267.827<\/td>\n<td>252.815<\/td>\n<td>162.483<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>10000<\/td>\n<td>1<\/td>\n<td>129.751<\/td>\n<td>129.187<\/td>\n<td>131.334<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>10000<\/td>\n<td>1<\/td>\n<td>127.38<\/td>\n<td>127.444<\/td>\n<td>124.795<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>10000<\/td>\n<td>1<\/td>\n<td>5708.22<\/td>\n<td>5482.168<\/td>\n<td>2368.82<\/td>\n<\/tr>\n<tr>\n<td>FindMissing<\/td>\n<td>100000<\/td>\n<td>1<\/td>\n<td>1299.615<\/td>\n<td>1293.964<\/td>\n<td>1314.281<\/td>\n<\/tr>\n<tr>\n<td>FindPresent<\/td>\n<td>100000<\/td>\n<td>1<\/td>\n<td>1265.741<\/td>\n<td>1259.396<\/td>\n<td>1252.715<\/td>\n<\/tr>\n<tr>\n<td>Insert<\/td>\n<td>100000<\/td>\n<td>1<\/td>\n<td>94450.147<\/td>\n<td>109098.851<\/td>\n<td>31993.897<\/td>\n<\/tr>\n<\/table>\n<p>Paying attention the 0.5 load factor numbers (since this is what we chose for the trie&#8217;s use case), we are clearly doing better with B. C is roughly the same, though drastically faster in some situations as the load factor increases. For now we&#8217;ll declare C the winner and use it as our solver implementation.<\/p>\n<p>The final contest will be a head-to-head tourney of the managed and native solver implementations, to see if our theoretical improvements translate to actual wall clock time savings:<\/p>\n<table>\n<tr>\n<th>Puzzle<\/th>\n<th>.NET Core Solver<\/th>\n<th>C++ Solver<\/th>\n<\/tr>\n<tr>\n<th>RME<br \/>WCL<\/br>TGK<br \/>API<\/th>\n<td>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;000.000] Loading trie...\r\n&#x5B;000.059] Loaded 162309 words.\r\n&#x5B;000.060] Finding valid words...\r\n&#x5B;000.072] Found 754 valid words.\r\n&#x5B;000.072] Finding solutions...\r\nMARKETPLACE-EARWIG\r\nWIGWAM-MARKETPLACE\r\nPRAGMATIC-CAKEWALK\r\n&#x5B;000.080] Done.\r\n<\/pre>\n<\/td>\n<td>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;000.000] Loading trie...\r\n&#x5B;000.049] Loaded 162309 words.\r\n&#x5B;000.049] Finding valid words...\r\n&#x5B;000.052] Found 754 valid words.\r\n&#x5B;000.052] Finding solutions...\r\nMARKETPLACE-EARWIG\r\nWIGWAM-MARKETPLACE\r\nPRAGMATIC-CAKEWALK\r\n&#x5B;000.054] Done.\r\n<\/pre>\n<\/td>\n<\/tr>\n<tr>\n<th>ERG<br \/>BID<br \/>NCF<br \/>TAO<\/th>\n<td>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;000.000] Loading trie...\r\n&#x5B;000.056] Loaded 162309 words.\r\n&#x5B;000.057] Finding valid words...\r\n&#x5B;000.071] Found 1436 valid words.\r\n&#x5B;000.072] Finding solutions...\r\nROBAND-DEFECTING\r\n -- snipped 72 other solutions -- \r\nOBTECTED-DRAFTING\r\n&#x5B;000.120] Done.\r\n<\/pre>\n<\/td>\n<td>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;000.000] Loading trie...\r\n&#x5B;000.049] Loaded 162309 words.\r\n&#x5B;000.050] Finding valid words...\r\n&#x5B;000.053] Found 1436 valid words.\r\n&#x5B;000.053] Finding solutions...\r\nBAITED-DEFORCING\r\n -- snipped 72 others solutions --\r\nOBTECTED-DRAFTING\r\n&#x5B;000.092] Done.\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/table>\n<p>C++ wins! It &#8220;only&#8221; took a custom data structure and several iterations of optimizations, but we got there in the end. Was it worth it? Sure, if all we care about is personal enrichment. Otherwise, you should look at your situation and heed the advice of <a href=\"http:\/\/carlos.bueno.org\/optimization\/\">Carlos Bueno<\/a>:<\/p>\n<blockquote><p>Unless your optimizations are going to stick around long enough to pay for the time you spend making them plus the opportunity cost of not doing something else, it&#8217;s a net loss.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Last week, in my continuing exploration of a C++ Letter Boxed solver, I had to build a faster hash table to improve performance. Today we&#8217;ll make it even faster, hoping to catch up to the performance of my .NET Core&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[105,101,104],"tags":[],"class_list":["post-5650","post","type-post","status-publish","format-standard","hentry","category-games","category-native","category-performance"],"_links":{"self":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5650","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5650"}],"version-history":[{"count":4,"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5650\/revisions"}],"predecessor-version":[{"id":5654,"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5650\/revisions\/5654"}],"wp:attachment":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5650"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5650"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5650"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}