<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Renzibei</title>
  
  <subtitle>I head north because I choose to</subtitle>
  <link href="https://renzibei.com/en/atom.xml" rel="self"/>
  
  <link href="https://renzibei.com/en/"/>
  <updated>2026-06-12T19:03:27.219Z</updated>
  <id>https://renzibei.com/en/</id>
  
  <author>
    <name>Renzibei</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Hash Table Benchmark</title>
    <link href="https://renzibei.com/en/hashtable-bench/"/>
    <id>https://renzibei.com/en/hashtable-bench/</id>
    <published>2026-06-13T15:27:44.000Z</published>
    <updated>2026-06-12T19:03:27.219Z</updated>
    
    <content type="html"><![CDATA[<p>This is yet another benchmark to compare different hash tables (hash maps) with different hash functions in C++, attempting to evaluate the performance of the lookup, insertion, deletion, iteration, etc. on different datasets as comprehensively as possible.</p><p>We show the performance of hash tables with hash functions for different operations on different types of datasets of different sizes. The reader can refer to these results and choose the hash table and hash function that best match the target application.</p><p>The benchmarks were collected in 2022–2023 (machine configurations are listed below); this writeup was compiled and published in 2026.</p><span id="more"></span><h2 id="before-viewing-the-benchmark-results">Before viewing the benchmark results</h2><p>Anyone familiar enough with hash tables knows that even a well-known and widely used hash table can have a data distribution it is not very good at. In other words, no hash table is the fastest on all datasets for all operations.</p><p>The best practice for selecting a hash table is to consider the data characteristics, operation mix, requirements, and hash function together.</p><p>This benchmark tries to use a concise and effective method to test the performance of different operations of the hash tables on some of the most common data distributions. But there will always be data distributions that are very different from the data we use for testing, and different users have different requirements for different indicators. Therefore, the best test is still one in the real application.</p><h2 id="methodology">Methodology</h2><h3 id="test-items">Test Items</h3><p>We measure combinations of different hash tables with different hash functions. For each combination, we measure its insert, delete, lookup (including successful and failed lookups), and iteration performance under different data. 
Below is a more detailed table of test items. Please note that in the following we use "hash table" and "hash map" interchangeably to refer to the same concept.</p><figure class="markdown-table-div"><table><colgroup><col style="width: 3%"><col style="width: 37%"><col style="width: 58%"></colgroup><thead><tr><th>Index</th><th>Test items</th><th>Notes</th></tr></thead><tbody><tr><td>1</td><td>Insert with reserve</td><td>Call map.reserve(n) before inserting n elements</td></tr><tr><td>2</td><td>Insert without reserve</td><td>Insert n elements without prior reserve</td></tr><tr><td>3</td><td>Erase and insert</td><td>Repeatedly do one erase after one insert, keeping the map size constant</td></tr><tr><td>4</td><td>Look up keys in the map (hit)</td><td>Repeatedly look up the elements that are in the map</td></tr><tr><td>5</td><td>Look up keys that are not in the map (miss)</td><td>Repeatedly look up the elements that are not in the map</td></tr><tr><td>6</td><td>Look up keys with 50% probability in the map</td><td>Repeatedly look up the elements that have a 50% probability of being in the map</td></tr><tr><td>7</td><td>Look up keys in the map with large max_load_factor (hit)</td><td>Same as Test Item 4 except that the map's max_load_factor is set to 0.9 and the map is rehashed before the lookup operations</td></tr><tr><td>8</td><td>Look up keys that are not in the map with large max_load_factor (miss)</td><td>Same as Test Item 5 except that the map's max_load_factor is set to 0.9 and the map is rehashed before the lookup operations</td></tr><tr><td>9</td><td>Look up keys with 50% probability in the map with large max_load_factor</td><td>Same as Test Item 6 except that the map's max_load_factor is set to 0.9 and the map is rehashed before the lookup operations</td></tr><tr><td>10</td><td>Iterate the table</td><td>Iterate the whole table several times</td></tr><tr><td>11</td><td>Heap memory size and load factor with default and large max_load_factor</td><td>Record the heap memory size and load factor when constructing the map 
in Test Items 4 and 7</td></tr></tbody></table></figure><p>As may be noticed, several test items measure lookup speed with a larger upper limit on the load factor. The load factor measures how full the hash table is, and <code>max_load_factor</code> is the STL API for controlling its upper bound. We include these items because each hash table may have a different expansion strategy and default <code>max_load_factor</code>, so even with the same number of elements, different tables may end up with different load factors and occupy different amounts of memory. The load factor and memory footprint greatly affect lookup performance, so a hash table with a smaller <code>max_load_factor</code> may show worse (or better) lookup performance: its larger footprint can cause more cache misses, while a higher load factor raises the probability of collisions, which in turn reduces lookup performance.</p><p>In addition, extreme lookup performance usually requires making the space used by the hash table as small as possible to reduce the cache miss rate. When available memory is very limited, a larger load factor may also be preferred. One way to do this is to set a higher <code>max_load_factor</code>, and then rehash (or set a large <code>max_load_factor</code> before the main construction process of the table).</p><p>For each of the tests above, we tested the throughput and latency (when the platform under test meets the conditions for the latency test). The throughput results are more representative, because modern software runs on CPUs with pipelined architectures, and almost all operations have other instructions before and after them that can make full use of the pipeline. However, for some specific uses, latency data is important. The latency measurement results here are for reference only for special needs and have relatively large limitations.</p><h3 id="dataset">Dataset</h3><p>All the data used in the benchmark are randomly generated; the user can choose different seeds for the test data. 
We tested the performance of each hash table at different sizes from 32 to 10^7.</p><p>The tested keys consist of 64-bit integers of different distributions and strings of different lengths. The detailed test data is shown in the table below.</p><div class="markdown-table-div"><table><colgroup><col style="width: 17%"><col style="width: 28%"><col style="width: 33%"><col style="width: 20%"></colgroup><thead><tr><th>Index</th><th>Key Type</th><th>Value Type</th><th>Notes</th></tr></thead><tbody><tr><td>1</td><td>uint64_t with several split bits masked</td><td>uint64_t</td><td>The keys have the following characteristic: only bits in some positions may be 1, and all other bits are 0. For test data of size n, at most ceil[log2(n)] fixed bits may be 1. For example, if the key type is uint8_t (it is uint64_t in reality) and the test size is 7, the keys will be generated with the method <code>rng() &amp; 0b10010001</code>. Such a bit distribution can relatively comprehensively examine whether hash tables and hash functions can handle keys that only have effective information in specific bit positions.</td></tr><tr><td>2</td><td>uint64_t, uniformly distributed in [0, UINT64_MAX]</td><td>uint64_t</td><td>The keys follow a uniform distribution in the range [0, UINT64_MAX].</td></tr><tr><td>3</td><td>uint64_t, bits in high position are masked out</td><td>uint64_t</td><td>The bits in the high position are set to 0. For test data of size n, at most ceil[log2(n)] fixed bits may be 1. For example, if the key type is uint8_t (uint64_t in reality) and the test size is 7, the keys will be generated with the method <code>rng() &amp; 0b00000111</code></td></tr><tr><td>4</td><td>uint64_t, bits in low position are masked out</td><td>uint64_t</td><td>The bits in the low position are set to 0. For test data of size n, at most ceil[log2(n)] fixed bits may be 1. 
For example, if the key type is uint8_t (uint64_t in reality) and the test size is 7, the keys will be generated with the method <code>rng() &amp; 0b11100000</code></td></tr><tr><td>5</td><td>uint64_t with several bits masked</td><td>56-byte struct</td><td>The keys follow the same distribution as dataset 1. The payload is a 56-byte struct, which makes <code>sizeof(std::pair&lt;key, value&gt;)==64</code></td></tr><tr><td>6</td><td>Small string with a max length of 12</td><td>uint64_t</td><td>The key type is a string with a maximum length of 12. Both length and characters are randomly generated. The compiler/library may use Small String Optimization (SSO).</td></tr><tr><td>7</td><td>Small string with a fixed length of 12</td><td>uint64_t</td><td>The key type is a string with a fixed length of 12. The characters are randomly generated. The compiler/library may use Small String Optimization (SSO).</td></tr><tr><td>8</td><td>Mid string with a max length of 24</td><td>uint64_t</td><td>The key type is a string with a maximum length of 24. Both length and characters are randomly generated.</td></tr><tr><td>9</td><td>Mid string with a fixed length of 24</td><td>uint64_t</td><td>The key type is a string with a fixed length of 24. The characters are randomly generated.</td></tr><tr><td>10</td><td>Large string with a max length of 64</td><td>uint64_t</td><td>The key type is a string with a maximum length of 64. Both length and characters are randomly generated.</td></tr><tr><td>11</td><td>Large string with a fixed length of 64</td><td>uint64_t</td><td>The key type is a string with a fixed length of 64. The characters are randomly generated.</td></tr></tbody></table></div><p>Different distributions within the range representable by uint64_t are chosen as keys. 
Uniformly distributed integers in the range of uint64_t are the easiest to generate with pseudo-random numbers, but they are rare in real situations.</p><p>If users are concerned with performance using integer keys, we strongly recommend focusing on the results on the first dataset rather than the second dataset (i.e. the dataset with a uniform random distribution). The data from the first dataset can better examine the ability of hash tables and hash functions to deal with more diverse patterns, while the test on uniform random distributions barely verifies the ability of hash tables to handle other distributions. Moreover, in real data distributions, few keys happen to be uniformly randomly distributed over the [0, 2^64 - 1] range.</p><p>With this in mind, our analysis for integer keys focuses mainly on the first dataset. To keep the articles shorter and easier to read, the other integer datasets, including the second (uniform random) one and the high-/low-bit-masked ones, are mostly shown only in the appendix of each test, without detailed discussion.</p><p>For the string datasets, different character sets are used. For fixed-length strings, the pattern is like the first dataset, where several split bits are masked. In other words, only the bits in some positions may differ among these datasets. This pattern is intended to test the quality of the hash function.</p><p>For the strings with variable length, a subset of the printable characters can appear in the string.</p><p>Real data distributions are often biased. If a combination of hash function and hash table can only handle one distribution but cannot handle other distributions, this combination is not robust to unknown distributions. 
If the distribution of the data is known in advance, the user can pick the fastest and most stable hash table for that data.</p><h3 id="tested-hash-functions-and-hash-maps">Tested Hash functions and Hash maps</h3><p>Below is the list of hash functions we tested.</p><div class="markdown-table-div"><table><colgroup><col style="width: 11%"><col style="width: 3%"><col style="width: 62%"><col style="width: 22%"></colgroup><thead><tr><th>Name</th><th>Type</th><th>Notes</th><th>Link</th></tr></thead><tbody><tr><td>std::hash</td><td>Normal</td><td>Implemented by the standard library; identity hash is used for integer types in libc++ and libstdc++</td><td></td></tr><tr><td>absl::Hash</td><td>Normal</td><td>Implemented by Google; Uses the 128-bit product of a multiplication and an xor-shift.</td><td>https://github.com/abseil/abseil-cpp</td></tr><tr><td>robin_hood::hash</td><td>Normal</td><td>For integer keys, it uses xor-shift, multiplication, xor-shift; For string keys it is similar to absl::Hash</td><td>https://github.com/martinus/robin-hood-hashing</td></tr><tr><td>xxHash_xxh3</td><td>Bytes</td><td>Designed for strings; We use identity hash for integer types to pass compilation; It won't show in the results of integer keys</td><td>https://github.com/Cyan4973/xxHash</td></tr></tbody></table></div><p>Originally we had some seed hash functions in the tests, which are hash functions that take both a key and a seed as arguments. We removed these hash functions to keep the test subjects simple, and we use the no-seed version of all the hash tables.</p><p>We will not show the results of <code>xxHash_xxh3</code> in tests on integer keys. For the early versions of <code>absl::Hash</code>, the behavior on the arm64 platform was different from that on the x86-64 platform, and it was poor for some datasets. So we once had a <code>uint128_mul::hash</code> to compare with it, which is similar to the <code>absl::Hash</code> on the x86-64 platform. 
Since the newest version of <code>absl::Hash</code> has fixed this problem, we deleted the <code>uint128_mul::hash</code>.</p><p>The following table lists the hash tables we tested. Some of these hash tables rely on a "good" hash function to work properly, which can generate hash values that are as uniformly distributed as possible for unbalanced keys. If a hash function that does not have such a property (e.g. identity hash) is used, then the performance of these hash tables may drop drastically. These hash tables may assume that the hash values of the keys from the dataset are uniformly distributed in the output range. This requires hash functions to have properties like uniformity or diffusion.</p><p>The implication here is that a "good" hash function tends to be more complex than the simplest hash function (the identity hash), requiring more instructions to complete the computation. Some hash tables do not rely on a good hash function, perhaps because they do some extra work to improve the uniformity of the hash values. For such a hash table, the simpler the hash function, the better, preferably the identity hash. 
So we should always compare the combinations of hash tables and hash functions, rather than fixing the hash function to compare the hash tables, or vice versa.</p><p>Here are the hash maps we tested.</p><div class="markdown-table-div"><table><colgroup><col style="width: 13%"><col style="width: 13%"><col style="width: 52%"><col style="width: 19%"></colgroup><thead><tr><th>Name</th><th>Requires a good hash function</th><th>Notes</th><th>Link</th></tr></thead><tbody><tr><td>std::unordered_map</td><td>No*</td><td>Implemented by the standard library; May differ between libc++ and libstdc++.</td><td></td></tr><tr><td>ska::flat_hash_map</td><td>No</td><td>Very fast and simple; Uses robin hood hash; Memory overhead: alignof(value_type) per element; Requires small load factor</td><td>https://github.com/skarupke/flat_hash_map</td></tr><tr><td>ska::bytell_hash_map</td><td>No</td><td>A little slower than ska::flat_hash_map but with only one byte per element of memory overhead</td><td>https://github.com/skarupke/flat_hash_map</td></tr><tr><td>absl::flat_hash_map</td><td>Yes</td><td>Uses SIMD and metadata; Fast when looking up keys that are not in the map; One byte per element memory overhead</td><td>https://abseil.io/about/design/swisstables</td></tr><tr><td>absl::node_hash_map</td><td>Yes</td><td>Slower than absl::flat_hash_map but does not invalidate pointers to elements after rehash</td><td>https://github.com/abseil/abseil-cpp</td></tr><tr><td>tsl::robin_map</td><td>Yes</td><td>A fast hash table using robin hood hash; Memory overhead is no less than ska::flat_hash_map</td><td>https://github.com/Tessil/robin-map</td></tr><tr><td>emhash::HashMap7</td><td>Yes</td><td>Fast in lookup hit operations.</td><td>https://github.com/ktprime/emhash</td></tr><tr><td>fph::DynamicFphMap</td><td>No</td><td>A dynamic perfect hash table; Ultra-fast in lookup but slow in insert; 2~8 bits per element memory overhead</td><td>https://github.com/renzibei/fph-table</td></tr><tr><td>fph::MetaFphMap</td><td>No</td><td>A dynamic perfect hash 
table using metadata; Better than fph::DynamicFphMap in the miss lookup case.</td><td>https://github.com/renzibei/fph-table</td></tr><tr><td>robin_hood::unordered_flat_map</td><td>Yes</td><td>A table using robin hood hash</td><td>https://github.com/martinus/robin-hood-hashing</td></tr><tr><td>ankerl::unordered_dense_map</td><td>Yes</td><td>Stores entries in a dense array; fastest to iterate; compact footprint</td><td>https://github.com/martinus/unordered_dense</td></tr></tbody></table></div><p>* Note: For the tested libc++ and libstdc++ versions, the libc++ implementation requires a good hash function but libstdc++ has no such requirement. If using <code>std::hash</code>, the performance can be poor when the size is a power of 2 for libc++.</p><p>At a quick glance, it is easy to see that many of the hash tables listed use the <a href="https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf">robin hood hashing</a> technique in the pursuit of speed.</p><h2 id="experiments-and-results">Experiments and Results</h2><p>The code of this benchmark is available at <a href="https://github.com/renzibei/hashtable-bench">https://github.com/renzibei/hashtable-bench</a>.</p><h3 id="testing-platform">Testing Platform</h3><p>Platform 1: Intel Xeon E-2388G CPU @ 3.20 GHz, boost to 5.1 GHz; x86-64; Rocket Lake.</p><p>Platform 2: M1 Max MacBook Pro 16 inch, 2021; arm64; Firestorm.</p><p>Due to the lack of a high-precision time stamp counter on the arm64 (M1 Max) platform, we only measured the latency on the x86-64 platform (even the AMD CPU has some problems when measuring the latency using the TSC, so we only test latency on the Intel CPU). 
In addition, for the x86-64 platform, we have also taken some measures to ensure the stability of the measurement results, including the following:</p><ol type="1"><li>Use the <code>taskset</code> command to set CPU core affinity</li><li>Turn off hyperthreading</li><li>Isolate the cores by adding <code>isolcpus=</code> and <code>rcu_nocbs=</code> in <code>GRUB_CMDLINE_LINUX</code> in <code>/etc/default/grub</code></li><li>Turn off some power-saving options, including disabling the <code>ondemand</code> systemd service, and setting <code>idle=poll</code> and <code>intel_idle.max_cstate=0</code> in the grub command line.</li><li>Turn off timer tick interrupts: recompile the kernel with <code>CONFIG_NO_HZ_FULL=y</code> and set <code>nohz_full=</code> in the grub command line.</li><li>Other adjustments that de-jitter the system latency. You can refer to <a href="https://rigtorp.se/low-latency-guide/">https://rigtorp.se/low-latency-guide/</a>.</li></ol><p>These measures cannot be done on macOS. But as we do not measure the latency of operations on macOS, it doesn't matter that much.</p><h3 id="results">Results</h3><p>For throughput data, performance will be represented by the average time per operation. We will plot the average time per operation for different scales of data. The shorter the time, the better the performance.</p><p>For the latency data, due to the limitation of the article length, we only show the latency of the 99th percentile in most test cases, which helps to reveal the worst-case behavior and long-tail latency of the hash table. Even that is not enough to reflect the worst-case latency: for a distribution with long-tailed features, the 0.99, 0.999, and 0.9999 quantiles can all have very different values. 
If the application has strict requirements on real-time performance and tail latency (such as gaming and high-frequency trading), then this metric is worth paying attention to.</p><p>If too much time is spent in a test, we count it as a timeout and set the time to zero, and that data point won't be plotted.</p><p>You can click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.</p><p><a id="posts" name="posts"></a> We divide the results into different groups according to data type and operation type.</p><ul><li>Integer Key<ul><li><a href="/en/int-insert-construct/" title="Integer Insert and Construct">Integer Insert and Construct</a></li><li><a href="/en/int-erase-insert/" title="Integer Erase and Insert">Integer Erase and Insert</a></li><li><a href="/en/int-lookup-throughput/" title="Integer Lookup Throughput">Integer Lookup Throughput</a></li><li><a href="/en/int-lookup-latency/" title="Integer Lookup Latency">Integer Lookup Latency</a></li><li><a href="/en/int-iterate/" title="Integer Iterate">Integer Iterate</a></li></ul></li><li>String Key<ul><li><a href="/en/string-insert-construct/" title="String Insert and Construct">String Insert and Construct</a></li><li><a href="/en/string-erase-insert/" title="String Erase and Insert">String Erase and Insert</a></li><li><a href="/en/64-byte-string-lookup/" title="64 byte String Lookup">64 byte String Lookup</a></li><li><a href="/en/24-byte-string-lookup/" title="24 byte String Lookup">24 byte String Lookup</a></li><li><a href="/en/12-byte-string-lookup/" title="12 byte String Lookup">12 byte String Lookup</a></li><li><a href="/en/string-iterate/" title="String Iterate">String Iterate</a></li></ul></li><li><a href="/en/memory-usage-and-load-factor/" title="Memory Usage and Load Factor">Memory Usage and Load Factor</a></li><li><a href="/en/analysis-and-conclusion/" title="Analysis & Conclusion">Analysis &amp; Conclusion</a></li></ul><h2 
id="conclusion">Conclusion</h2><p>In short, no single hash table is best for every workload: the right pick depends on which operations dominate, what the keys look like, and how much memory is available. If lookups dominate and the table is built once, the perfect-hash <code>fph::DynamicFphMap</code> (and <code>fph::MetaFphMap</code> when misses are common) is hard to beat, at the cost of slow construction. For a general-purpose map with a mix of inserts and lookups, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> is a fast, compact and robust default; <code>ska::flat_hash_map</code> is the quickest while the data stays in cache, and <code>ankerl::unordered_dense_map</code> is the one to reach for when iteration is frequent. <code>std::unordered_map</code> is the slowest in most tests and is worth keeping mainly for its pointer-stability guarantees.</p><p>For the full reasoning behind these picks, with a per-test and per-workload comparison, see the <a href="/en/analysis-and-conclusion/" title="Analysis & Conclusion">Analysis &amp; Conclusion</a>.</p><h2 id="restrictions">Restrictions</h2><h3 id="exclusive-access-to-resources">Exclusive access to resources</h3><p>In our tests, almost all computer resources can be monopolized by the test program, especially cache resources. This is relatively rare in practical applications, where other processes and tasks may occupy part of the cache. In practical applications users should expect a lower available cache size.</p><h3 id="cold-memory-and-warm-cache">Cold memory and warm cache</h3><p>We neither did a warmup, nor did we specifically test the cold start scenario. In our tests, we repeatedly test an operation with a range of data many times. Therefore, when the number of operations is much greater than the amount of data, it can be considered that most operations are accessing the warm cache. When the number of operations is no greater than the amount of data, most operations are accessing cold memory. 
In our test, limited by the test time, when the amount of data is small, the number of operations will be much greater than the amount of data; and when the amount of data is large, the number of operations will be equal to the amount of data.</p><h3 id="huge-test-space">Huge test space</h3><p>The size of the test space is at least <figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">|hash table set| x |hash function set| x |data sets| x |operation set| x |hardware platform set| x |compiler set|</span><br></pre></td></tr></table></figure> As can be seen, the testable space is quite huge. Any addition to the set of hash tables or the set of hash functions will greatly increase the testing effort. Due to time and resource constraints, we have only explored part of the combinations, and there are still many combinations and spaces that we have not tested.</p><p>Therefore, in order to choose the most suitable hash table and hash function for the user's purpose, real tests should be carried out in the application scenario.</p><h2 id="postscript">Postscript</h2><p>This hash table benchmark series took at least four years from start to finish. The first version of the data was already available in 2022, when the M1 Max was still a fairly new CPU; now even the M5 is out. It took this long because organizing so many charts and analyses was genuinely tedious. I finished the main body of the posts roughly between 2022 and 2023, but left a lot of cleanup work undone out of laziness. Because this kept hanging over me, the blog also fell into a kind of head-of-line blocking: I kept thinking, "I still haven't finished this series," and ended up not publishing other posts either.</p><p>Over these years, many things have changed. CPUs have gone through several generations, new hash functions and hash tables have appeared, and LLM technology has also advanced rapidly. 
Many things discussed in these posts may already be somewhat out of date. All I can say is: time passes like a river, never stopping day or night.</p><p>Whatever the quality of this benchmark series may be, I have decided to stop iterating on it and publish it as it is.</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;This is yet another benchmark to compare different hash tables (hash
maps) with different hash functions in C++, attempting to evaluate the
performance of the lookup, insertion, deletion, iteration, etc. on
different datasets as comprehensively as possible.&lt;/p&gt;
&lt;p&gt;We show the performance of hash tables with hash functions for
different operations on different types of datasets of different sizes.
The reader can refer to these results and choose the hash table and hash
function that best match the target application.&lt;/p&gt;
&lt;p&gt;The benchmarks were collected in 2022–2023 (machine configurations
are listed below); this writeup was compiled and published in 2026.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - 24 Byte String Lookup</title>
    <link href="https://renzibei.com/en/24-byte-string-lookup/"/>
    <id>https://renzibei.com/en/24-byte-string-lookup/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.218Z</updated>
    
    <content type="html"><![CDATA[<p>The 24 byte string lookup test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/24-byte-string-lookup.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the lookup performance of hash tables in three kinds of situations:</p><ol type="1"><li>Look up the keys in the hash table (hit or successful find).</li><li>Look up the keys not in the hash table (miss or unsuccessful find).</li><li>Look up keys with a 50% probability of being in the hash table.</li></ol><p>There are two kinds of keys in this test: strings with a fixed length of 24 bytes, and strings with a max length of 24 bytes.</p><p>At 24 bytes the keys have crossed the Small String Optimization boundary: a <code>std::string</code> of this length no longer fits inline (libstdc++ stores up to 15 bytes inline, libc++ up to 22), so the fixed-24 keys are all heap-allocated and the lookup must dereference a pointer to reach the characters before it can hash or compare them. This adds an extra memory access per key (and, once the data no longer fits in cache, a near-guaranteed cache miss) to the hash/compare cost discussed in the 12-byte post, and it makes the fixed-length and max-length variants behave quite differently: in the max-24 case many keys are short enough to stay inline, avoiding that extra indirection. 
The four hashes tested are again <code>std::hash</code>, <code>absl::Hash</code>, <code>robin_hood::hash</code>, and <code>xxHash_xxh3</code>; with more bytes to digest per key, the bytes-optimized xxh3 now wins for almost every table. Each chart shows the best hash per table; Xeon E-2388G and M1 Max throughput are paired and latency is Xeon-only.</p><h2 id="throughput">Throughput</h2><h3 id="lookup-keys-in-the-table-hit">Lookup keys in the table (hit)</h3><h4 id="use-default-max_load_factor">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t">&lt;K,V&gt;: &lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><p>Two things stand out compared to the 12-byte test. First, <code>xxHash_xxh3</code> is now the per-table winner for essentially every table on the Xeon, because the larger key makes hashing a bigger fraction of the work and xxh3's byte throughput dominates. Second, the whole field is slower and more tightly bunched: with every fixed-24 key on the heap, a hit pays both the hashing of 24 bytes and a pointer chase to the stored characters for the comparison. On the Xeon the robin-hood-style <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> (xxh3) lead at scale, about 115 ns at 10^7, with the rest of the flat tables within 15-20 ns of them; <code>std::unordered_map</code> is the main outlier at 192 ns. 
In cache (1,024 elements) <code>fph::DynamicFphMap</code> and <code>ankerl::unordered_dense_map</code> are quickest at roughly 11.5-12 ns, the perfect-hash table again benefiting from its single-probe guarantee before memory traffic dominates.</p><p>The M1 Max separates the field more clearly: the perfect-hash tables <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> (xxh3) lead through the mid-sizes (15.5 and 16.9 ns at 32,768) and stay near the front to 10^7 (125.7 and 139.0 ns), helped by the M1's large caches keeping their sparser arrays resident longer. <code>std::unordered_map</code> is again last (208 ns at 10^7).</p><p>The max-length variant below is noticeably faster, often taking roughly half the time or less at small sizes (for instance <code>tsl::robin_map</code> at about 6.8 ns vs 18.6 ns at 1,024 on the Xeon), precisely because most max-24 keys are short enough to stay inline and skip the heap dereference. The large-<code>max_load_factor</code> charts keep the same ranking with slightly denser packing.</p><h5 id="kv-string-with-a-max-length-of-24-uint64_t">&lt;K,V&gt;: &lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-1">&lt;K,V&gt;: &lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h3 id="lookup-keys-not-in-the-table-miss">Lookup keys not in the table (miss)</h3><h4 id="use-default-max_load_factor-1">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><p>As at 12 bytes, misses are cheaper and <code>fph::MetaFphMap</code> leads because its metadata rejects an absent key without dereferencing the heap pointer or comparing any bytes. On the Xeon it answers a fixed-24 miss in about 6 ns at 1,024 and 10.6 ns at 32,768, ahead of the SwissTable tables (<code>absl::flat_hash_map</code> 11.6 ns, <code>absl::node_hash_map</code> 12.0 ns), and it stays fastest to 10^7 (69.8 ns). The robin-hood tables again lag in the mid-range (<code>tsl::robin_map</code> 24.6 ns, <code>ska::flat_hash_map</code> 25.1 ns at 32,768) because of their longer probe runs. 
The metadata advantage is even larger on the M1 Max, where <code>fph::MetaFphMap</code> resolves a miss in 4.7-9.5 ns up through 200,000 elements and only 40.5 ns at 10^7, well clear of the field.</p><p>The max-length variant follows the same ordering but with smaller absolute numbers since many keys stay inline; on the M1 Max <code>fph::MetaFphMap</code> dips to just 9-11 ns through 1.2M elements. The large-<code>max_load_factor</code> charts are again consistent with this pattern.</p><h5 id="kv-string-with-a-max-length-of-24-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor-1">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-3">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-24-uint64_t-3">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h3 
id="lookup-keys-with-a-50-probability-in-the-table">Lookup keys with a 50% probability in the table</h3><h4 id="use-default-max_load_factor-2">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-4">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><p>The mixed workload lands between the two: the SwissTable <code>absl::flat_hash_map</code>, the robin-hood <code>r_h::unordered_flat_map</code> (xxh3), and <code>fph::MetaFphMap</code> share the lead, at around 8.6-10.2 ns at 1,024 and roughly 105-117 ns at 10^7 on the Xeon, with <code>std::unordered_map</code> trailing at 187 ns. Because half the queries are misses that never touch the heap-stored bytes, the metadata-friendly tables do better here than in the pure-hit case. The M1 Max keeps the same set of flat tables in front with its usual flatter curves. 
The max-length variant and the large-<code>max_load_factor</code> appendix charts follow the same pattern.</p><h5 id="kv-string-with-a-max-length-of-24-uint64_t-4">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor-2">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-5">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-24-uint64_t-5">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>The P99 latency charts (Xeon only) report the latency that only the slowest 1% of lookups exceed. The 24-byte heap layout makes these tails heavier than at 12 bytes, because a slow lookup can miss the cache on the slot array, on the heap-stored key bytes, and on the page table all at once.</p><h3 
id="lookup-keys-in-the-table-hit-1">Lookup keys in the table (hit)</h3><h4 id="use-default-max_load_factor-3">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-6">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>For fixed-24 hits the tails climb steeply and converge: by 10^7 every flat table sits in a 750-830 ns band, with <code>r_h::unordered_flat_map</code> (xxh3) best at 750 ns and <code>std::unordered_map</code> worst at 930 ns. The jump from the in-cache regime is large, with the tail rising from about 50 ns at 1,024 to several hundred nanoseconds already at 32,768, since the worst-case lookup now reliably misses on the heap-stored key. The max-length variant has visibly lower tails (e.g. roughly 535-695 ns at 10^7) because the inline short keys spare those lookups the extra heap miss.</p><h5 id="kv-string-with-a-max-length-of-24-uint64_t-6">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-3">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-7">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-24-uint64_t-7">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h3 id="lookup-keys-not-in-the-table-miss-1">Lookup keys not in the table (miss)</h3><h4 id="use-default-max_load_factor-4">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-8">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><p>On the miss path the metadata tables keep their tails lowest in cache: <code>fph::MetaFphMap</code> and <code>ankerl::unordered_dense_map</code> hold around 22-51 ns up to 32,768, while the robin-hood tables already spike past 145 ns there. By 10^7 the tails again merge into the 600-700 ns range. The max-24 variant is the more telling one: there the tail stays low far longer (the leaders are near 130 ns even at 200,000 elements) before climbing, because most missing keys are rejected from inline data without a heap touch. 
This shows how the SSO boundary, not just the table algorithm, shapes the latency tail.</p><h5 id="kv-string-with-a-max-length-of-24-uint64_t-8">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-4">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-9">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-24-uint64_t-9">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h3 id="lookup-keys-with-a-50-probability-in-the-table-1">Lookup keys with a 50% probability in the table</h3><h4 id="use-default-max_load_factor-5">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-10">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>With half the queries hitting, the tail is set by the heavier hit path and the fixed-24 curves converge into the 740-980 ns band at 10^7, with <code>r_h::unordered_flat_map</code> best at 745 ns and <code>std::unordered_map</code> worst at 980 ns. 
The metadata advantage that <code>fph::MetaFphMap</code> enjoys on pure misses is diluted here because the hit half still pays the heap dereference and byte compare. As before, the max-length variant and the large-<code>max_load_factor</code> appendix charts follow the same pattern.</p><h5 id="kv-string-with-a-max-length-of-24-uint64_t-10">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-5">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-24-uint64_t-11">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-24-uint64_t-11">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => 
{M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/24-byte-string-lookup.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The 24 byte string lookup test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - 64 Byte String Lookup</title>
    <link href="https://renzibei.com/en/64-byte-string-lookup/"/>
    <id>https://renzibei.com/en/64-byte-string-lookup/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.219Z</updated>
    
    <content type="html"><![CDATA[<p>The 64-byte string lookup test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/64-byte-string-lookup.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the lookup performance of hash tables in three kinds of situations:</p><ol type="1"><li>Look up the keys in the hash table (hit or successful find).</li><li>Look up the keys not in the hash table (miss or unsuccessful find).</li><li>Look up keys with a 50% probability of being in the hash table.</li></ol><p>There are two kinds of keys in this test: strings with a fixed length of 64 bytes, and strings with a max length of 64 bytes.</p><p>At 64 bytes every fixed-length key is well past the Small String Optimization limit, so all of them live on the heap and a lookup must follow a pointer to reach the characters before hashing or comparing them. The key is now a full 64-byte cache line's worth of data, which makes the hash function cost the dominant term in the lookup: feeding 64 bytes through the hash takes far longer than the slot arithmetic. As a result <code>xxHash_xxh3</code>, which is tuned for byte throughput, is the per-table winner for essentially every table on both machines in this test, and the differences between table layouts shrink because they all pay the same large hashing and pointer-chasing cost. 
The fixed-64 and max-64 variants again differ in that the max-length keys include many short, SSO-eligible strings that skip the heap indirection. Charts pair the Xeon E-2388G with the M1 Max for throughput; latency is Xeon-only.</p><h2 id="throughput">Throughput</h2><h3 id="lookup-keys-in-the-table-hit">Lookup keys in the table (hit)</h3><h4 id="use-default-max_load_factor">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><p>With hashing 64 bytes dominating, the field is closer together than in the shorter-key tests, but the ordering is still informative. On the Xeon the perfect-hash <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> (xxh3) are fastest in cache (about 16.4 and 16.8 ns at 1,024, 54.6 and 56.5 ns at 32,768) thanks to their single-probe guarantee, which still matters even when the hash is expensive. At 10^7 they remain near the front (152.7 and 163.5 ns) but the short-probe robin-hood tables <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> catch up (151.7 and 152.2 ns). Every table uses xxh3 as its best hash here. The node-based <code>std::unordered_map</code> is clearly behind at 246.7 ns, paying a heap node dereference on top of the already-heavy key dereference.</p><p>The M1 Max shows the perfect-hash tables leading more decisively, with <code>fph::DynamicFphMap</code> fastest across the whole range (15.5 ns at 32,768, 112.7 ns at 10^7), helped by the large M1 caches keeping its sparser array resident. 
<code>std::unordered_map</code> is again last at 184.5 ns.</p><p>The max-length variant below is markedly faster: for instance, <code>tsl::robin_map</code> runs a hit in about 9.9 ns at 1,024 versus 18.4 ns for fixed-64, because the short SSO-eligible keys avoid both the heap dereference and the cost of hashing a full 64 bytes. The large-<code>max_load_factor</code> charts keep the same ranking with denser packing.</p><h5 id="kv-string-with-a-max-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h3 id="lookup-keys-not-in-the-table-miss">Lookup keys not in the 
table (miss)</h3><h4 id="use-default-max_load_factor-1">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><p>Misses are again the case where <code>fph::MetaFphMap</code> stands out: its metadata lets it reject an absent key without dereferencing the heap-stored characters or comparing any bytes, so on the Xeon it leads at 8.2 ns at 1,024 and 16.9 ns at 32,768, and stays fastest to 10^7 (95.5 ns). The SwissTable <code>absl::flat_hash_map</code> and <code>ankerl::unordered_dense_map</code>, which also keep a few hash bits next to each slot, follow closely, while the robin-hood tables <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> fall behind in the mid-range (41.6 and 43.1 ns at 32,768) due to longer probe runs. Avoiding the full 64-byte compare matters a lot here, so the schemes that short-circuit on a stored tag or fingerprint byte pull noticeably ahead. 
The M1 Max amplifies the metadata advantage: <code>fph::MetaFphMap</code> answers a miss in 5.9-18 ns up to 200,000 elements and just 54.2 ns at 10^7. <code>std::unordered_map</code> is the slowest at scale on both machines (231 ns Xeon, 125 ns M1 at 10^7).</p><p>The max-length variant follows the same ranking with smaller numbers, and the large-<code>max_load_factor</code> charts are consistent with this pattern.</p><h5 id="kv-string-with-a-max-length-of-64-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor-1">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-3">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-64-uint64_t-3">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h3 id="lookup-keys-with-a-50-probability-in-the-table">Lookup keys with a 
50% probability in the table</h3><h4 id="use-default-max_load_factor-2">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-4">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><p>The mixed workload sits between the two cases. On the Xeon <code>absl::flat_hash_map</code>, <code>r_h::unordered_flat_map</code> and <code>fph::MetaFphMap</code> (all xxh3) share the lead, at around 11.7-12 ns at 1,024 and 137-148 ns at 10^7, with <code>std::unordered_map</code> trailing at 235 ns. On the M1 Max <code>fph::MetaFphMap</code> is fastest across most sizes (22.8 ns at 32,768, 104.2 ns at 10^7), since the miss half of the workload rewards its metadata while the hit half still benefits from its single-probe layout. 
The max-length variant and the large-<code>max_load_factor</code> appendix charts follow the same pattern.</p><h5 id="kv-string-with-a-max-length-of-64-uint64_t-4">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor-2">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-5">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-64-uint64_t-5">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>The P99 latency charts (Xeon only) show the tail. 
With 64-byte heap-allocated keys, a worst-case lookup can miss on the slot array, on the key's heap buffer, and on the page table, so the tails are the heaviest of the three string tests.</p><h3 id="lookup-keys-in-the-table-hit-1">Lookup keys in the table (hit)</h3><h4 id="use-default-max_load_factor-3">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-6">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>For fixed-64 hits the tails converge sharply: from about 67-92 ns at 1,024 they jump to the 550-620 ns range already at 32,768 and then to roughly 830-955 ns at 10^7 for the plotted tables, where <code>r_h::unordered_flat_map</code> (xxh3) is at the front at 830 ns and <code>absl::node_hash_map</code> is the slowest of them at 955 ns. <code>std::unordered_map</code> is slower still, but its tail runs past the chart's 1,000 ns display limit (about 1,025 ns at 10^7), so it is not drawn here. The steep early rise reflects the guaranteed heap miss on the key bytes once the buffers no longer fit in cache. 
The max-length variant has lower tails (the leading tables run roughly 290-330 ns at 32,768 versus 550+ for fixed) because keys short enough to be stored inline spare many lookups that heap miss.</p><h5 id="kv-string-with-a-max-length-of-64-uint64_t-6">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-3">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-7">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-64-uint64_t-7">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h3 id="lookup-keys-not-in-the-table-miss-1">Lookup keys not in the table (miss)</h3><h4 id="use-default-max_load_factor-4">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-8">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><p>On the miss path <code>fph::MetaFphMap</code> keeps the lowest tail in cache (24.8 ns at 1,024, 105.6 ns at 32,768) because its metadata settles a miss without touching the heap-stored key bytes, and <code>ankerl::unordered_dense_map</code> is the next best. The robin-hood tables blow up earliest, past 400 ns at 32,768. 
The max-64 variant pushes the tail-blowup point out considerably, the leaders staying near 52-62 ns at 32,768, because most missing keys are short enough to be rejected from inline data. As at 24 bytes, this shows that the SSO boundary shapes the latency tail as much as the table algorithm does.</p><h5 id="kv-string-with-a-max-length-of-64-uint64_t-8">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-4">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-9">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-64-uint64_t-9">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h3 id="lookup-keys-with-a-50-probability-in-the-table-1">Lookup keys with a 50% probability in the table</h3><h4 id="use-default-max_load_factor-5">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-10">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>With half the queries hitting, the tail is governed by the heavier hit path and most fixed-64 curves bunch into the 805-890 ns band at 10^7 (many tables cluster near 805-845 ns, with 
<code>fph::MetaFphMap</code> the highest among them at about 890 ns); <code>std::unordered_map</code> again runs past the chart's 1,000 ns display limit (about 1,065 ns) and is not drawn. The metadata advantage that <code>fph::MetaFphMap</code> enjoys on pure misses is diluted because the hit half still requires the heap dereference and 64-byte compare. The max-length variant and the large-<code>max_load_factor</code> appendix charts follow the same pattern.</p><h5 id="kv-string-with-a-max-length-of-64-uint64_t-10">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-5">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-64-uint64_t-11">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-64-uint64_t-11">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {        create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        
create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/64-byte-string-lookup.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a 
href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The 64 byte string lookup test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - 12 Byte String Lookup</title>
    <link href="https://renzibei.com/en/12-byte-string-lookup/"/>
    <id>https://renzibei.com/en/12-byte-string-lookup/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.218Z</updated>
    
    <content type="html"><![CDATA[<p>The 12 byte string lookup test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/12-byte-string-lookup.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the lookup performance of hash tables in three kinds of situations:</p><ol type="1"><li>Look up the keys in the hash table (hit or successful find).</li><li>Look up the keys not in the hash table (miss or unsuccessful find).</li><li>Look up keys with a 50% probability of being in the hash table.</li></ol><p>There are two kinds of keys in this test: strings with a fixed length of 12 bytes, and strings with a max length of 12 bytes.</p><p>Unlike the integer tests, string-key lookup spends a large share of its time inside the hash function and the key comparison: the whole byte sequence has to be fed through the hash, and on a hit the candidate slot's bytes have to be compared against the query. This makes the choice of hash function much more visible than for integers. The four hashes tested here are <code>std::hash</code>, <code>absl::Hash</code>, <code>robin_hood::hash</code>, and <code>xxHash_xxh3</code>; xxh3 is purpose-built for hashing byte ranges, so it tends to win for the tables whose lookup loop is hash-bound. 
A second string-specific effect is Small String Optimization (SSO): a 12-byte <code>std::string</code> is stored inline in the string object, so there is no separate heap allocation and the characters are reached without a pointer dereference. The fixed-length variant makes every key exactly 12 bytes, while the max-length variant lets keys vary up to 12 bytes, which also exercises the length-dependent branches of the hash and comparison code. Below, each chart shows the fastest hash per table (click the legend to reveal the rest); the Xeon E-2388G and M1 Max throughput charts are paired, and latency is measured on the Xeon only.</p><h2 id="throughput">Throughput</h2><h3 id="lookup-keys-in-the-table-hit">Lookup keys in the table (hit)</h3><h4 id="use-default-max_load_factor">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><p>On a successful lookup the table has to compute the hash, find the slot, and then actually compare the 12 query bytes against the stored key, so every table pays the full hash-plus-compare cost. On the Xeon E-2388G the perfect-hash <code>fph::DynamicFphMap</code> is the fastest while the working set stays in cache: with <code>xxHash_xxh3</code> it does a hit in about 6.5 ns at 1,024 elements and 8.4 ns at 32,768, ahead of every conventional table. 
This is the regime where fph shines, because its perfect hash guarantees the key lands in its slot on the first probe, so the only memory touch is the single slot it reads. As the table grows past the L3 cache the picture inverts: once the lookup becomes memory-bound, the tables that keep their probe sequence short and local win instead. At 10^7 elements <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> (both with xxh3) lead at about 42 ns, while <code>fph::DynamicFphMap</code> slows to 57 ns and the metadata-heavier <code>fph::MetaFphMap</code> to 64 ns, because the perfect-hash tables spread their entries over a sparser array and miss the cache more often at scale.</p><p>The hash choice is decisive here. <code>xxHash_xxh3</code> is the per-table winner for most of the open-addressing and perfect-hash tables, since their lookup loop is dominated by hashing 12 bytes; the SwissTable-style <code>emhash::hash_map7</code> and <code>ska::bytell_hash_map</code> instead pair best with <code>absl::Hash</code>. The node-based tables trail badly once they leave cache: <code>absl::node_hash_map</code> reaches 107 ns and <code>std::unordered_map</code> 118 ns at 10^7, each lookup chasing a heap node pointer on top of the hash work.</p><p>The M1 Max tells a similar but flatter story. With its larger caches the SwissTable-family tables lead at scale (<code>emhash::hash_map7</code> and <code>ska::bytell_hash_map</code> around 60-66 ns at 10^7), and <code>absl::Hash</code> is the winning hash for several tables there rather than xxh3, reflecting the Arm-tuned implementation. The node-based tables again finish last (<code>std::unordered_map</code> at about 200 ns at 10^7).</p><p>The max-length variant below behaves almost identically: with all keys still fitting SSO and capped at 12 bytes, length variation barely changes the hash/compare cost, so the rankings and magnitudes match the fixed-length case. 
The large-<code>max_load_factor</code> charts pack the entries more densely, which slightly helps cache residency at large sizes but lengthens probe sequences, leaving the overall ordering unchanged.</p><h5 id="kv-string-with-a-max-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h3 id="lookup-keys-not-in-the-table-miss">Lookup keys not in the table (miss)</h3><h4 id="use-default-max_load_factor-1">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div 
class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><p>A miss is cheaper than a hit because most tables can reject the query from metadata alone, without ever comparing the 12 bytes. This is where <code>fph::MetaFphMap</code> is the clear winner: its per-slot metadata lets it answer "not present" after a single metadata check, so on the Xeon it returns a miss in about 3.6 ns at 1,024 elements and stays under 10 ns all the way to 1.2M elements (9.65 ns), far ahead of the field. Even at 10^7, where it becomes DRAM-bound and rises to 30.7 ns, it is competitive with the best SwissTable tables (<code>absl::flat_hash_map</code> at 25.1 ns, <code>r_h::unordered_flat_map</code> at 29.0 ns). The robin-hood-style tables <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> are much slower in the cache regime (16-26 ns at 32,768-200,000) because their backward-shift probing must walk a run of slots before concluding the key is absent.</p><p>Because a miss avoids the full byte comparison, the hash function matters slightly less here, and the per-table winner is more often <code>absl::Hash</code> than xxh3. The node-based <code>std::unordered_map</code> remains the slowest at scale (110 ns at 10^7), since even a miss requires hashing and then traversing a bucket's node chain. 
On the M1 Max the same ordering holds with smaller absolute numbers: <code>fph::MetaFphMap</code> answers misses in 4-6 ns through 1.2M elements and only 18.6 ns at 10^7, again the tightest curve on the platform.</p><p>The max-length variant and the large-<code>max_load_factor</code> charts repeat this pattern; the only visible change is that the denser large-load-factor tables lengthen the robin-hood probe runs slightly, widening their gap behind the metadata-based and SwissTable tables.</p><h5 id="kv-string-with-a-max-length-of-12-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor-1">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-3">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-12-uint64_t-3">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h3 id="lookup-keys-with-a-50-probability-in-the-table">Lookup keys with a 50% probability in the table</h3><h4 id="use-default-max_load_factor-2">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-4">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><p>The 50%-hit workload mixes hits and misses, so it sits between the two cases and the ranking changes accordingly. On the Xeon the SwissTable-style <code>absl::flat_hash_map</code> and <code>r_h::unordered_flat_map</code> (both with xxh3) now top the small-to-mid range at about 6 ns at 1,024 and 13-14 ns at 32,768, with <code>fph::MetaFphMap</code> right alongside them; the perfect-hash tables no longer dominate the cache regime as cleanly as in the pure-hit case because half the queries are misses that the metadata tables resolve very cheaply. At 10^7 the robin-hood tables again pull ahead on memory locality (<code>tsl::robin_map</code> 43.5 ns, <code>ska::flat_hash_map</code> 44.4 ns), while <code>std::unordered_map</code> trails at 119.5 ns.</p><p>The M1 Max ranks <code>r_h::unordered_flat_map</code> and <code>absl::flat_hash_map</code> first across most sizes, with the usual flatter curves; the per-table best hash is a mix of xxh3 and <code>absl::Hash</code>. 
As before, the max-length variant and the large-<code>max_load_factor</code> charts follow the same pattern with no qualitative change.</p><h5 id="kv-string-with-a-max-length-of-12-uint64_t-4">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="use-large-max_load_factor-2">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-5">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-12-uint64_t-5">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>The P99 latency charts (Xeon only) show the tail of the lookup-time distribution: the 99th-percentile single lookup, which is dominated by the worst cache and TLB misses on the probe path. 
As the working set outgrows the L3 cache, every table's tail jumps to several hundred nanoseconds because the slowest 1% of lookups now take a full DRAM round trip.</p><h3 id="lookup-keys-in-the-table-hit-1">Lookup keys in the table (hit)</h3><h4 id="use-default-max_load_factor-3">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-6">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>For hits, the tail is set by how many cache lines the probe sequence touches in its worst case. In cache, <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> have the lowest tails (about 21 ns at 1,024 and 36-37 ns at 32,768) because their single-probe guarantee bounds the worst case tightly. Once the table spills to DRAM the tails converge into the 460-560 ns band for the flat tables, where <code>r_h::unordered_flat_map</code> and <code>absl::flat_hash_map</code> are best at 10^7 (about 527 and 557 ns). 
The node-based tables have the worst tails throughout (<code>absl::node_hash_map</code> 680 ns, <code>std::unordered_map</code> 795 ns at 10^7), since a single lookup can miss the cache both on the bucket array and on the node it points to.</p><h5 id="kv-string-with-a-max-length-of-12-uint64_t-6">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-3">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-7">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-12-uint64_t-7">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h3 id="lookup-keys-not-in-the-table-miss-1">Lookup keys not in the table (miss)</h3><h4 id="use-default-max_load_factor-4">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-8">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><p>The miss tail is where <code>fph::MetaFphMap</code>'s metadata pays off most dramatically: its P99 stays extremely flat, only 16-35 ns from 1,024 up to 200,000 elements and 59.5 ns at 1.2M, where every other table has already climbed into the hundreds of nanoseconds. 
Because a miss is settled by a single metadata read, its worst case rarely needs a second cache line, so the tail does not blow up until even that read has to go to DRAM (482 ns at 10^7). The robin-hood tables <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> show the opposite behaviour, jumping to about 412 ns already at 200,000 elements because a worst-case miss walks a long probe run. <code>std::unordered_map</code> again has the heaviest tail (905 ns at 10^7).</p><h5 id="kv-string-with-a-max-length-of-12-uint64_t-8">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-4">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-9">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-12-uint64_t-9">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h3 id="lookup-keys-with-a-50-probability-in-the-table-1">Lookup keys with a 50% probability in the table</h3><h4 id="use-default-max_load_factor-5">Use default max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-10">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>With half the queries hitting, the tail is governed by the more expensive hit path: the flat open-addressing tables <code>r_h::unordered_flat_map</code> and <code>absl::flat_hash_map</code> keep the best P99 (about 23 and 22 ns at 1,024, rising to 527 and 534 ns at 10^7). <code>fph::MetaFphMap</code> no longer has the flat advantage it showed on pure misses, since the hit half of the workload still requires the byte comparison and a possible extra cache line. The node-based <code>std::unordered_map</code> once more has the largest tail (850 ns at 10^7). The max-length variant and the large-<code>max_load_factor</code> appendix charts follow the same pattern.</p><h5 id="kv-string-with-a-max-length-of-12-uint64_t-10">&lt;K,V&gt;: &lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="use-large-max_load_factor-5">Use large max_load_factor</h4><h5 id="kv-string-with-a-fixed-length-of-12-uint64_t-11">&lt;K,V&gt;: &lt;string with a fixed length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h5 id="kv-string-with-a-max-length-of-12-uint64_t-11">&lt;K,V&gt;: &lt;string with a max length of 12, uint64_t&gt;</h5><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><html><script>    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    var create_chart_funcs = [];    async 
function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => 
{M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/12-byte-string-lookup.js" onload="bench_results_loaded();"> </script><script 
src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The 12 byte string lookup test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Integer Iterate</title>
    <link href="https://renzibei.com/en/int-iterate/"/>
    <id>https://renzibei.com/en/int-iterate/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.220Z</updated>
    
    <content type="html"><![CDATA[<p>The integer key iteration test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/int-iterate.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the performance of iterating over the hash table.</p><p>Unlike lookup or insertion, iteration speed is dominated almost entirely by how a table stores its entries in memory, not by the hash function. There are three broad storage strategies among the tables we test:</p><ul><li><strong>Dense array storage.</strong> <code>ankerl::unordered_dense_map</code> and <code>emhash::hash_map7</code> keep all key-value pairs packed in a contiguous array and store only indices (or small metadata) in the hash slots. Iterating is then a linear scan over a dense array, so the cost per element is small and, crucially, independent of the load factor.</li><li><strong>Inline open addressing.</strong> <code>ska::flat_hash_map</code>, <code>ska::bytell_hash_map</code>, <code>tsl::robin_map</code>, <code>absl::flat_hash_map</code>, <code>fph::*</code> and <code>robin_hood::unordered_flat_map</code> store the entries directly in a sparse slot array. 
To iterate they must walk the whole slot array and skip the empty slots, so the work per element grows as the table becomes emptier and the array grows larger than the cache.</li><li><strong>Node-based storage.</strong> <code>std::unordered_map</code> and <code>absl::node_hash_map</code> allocate each entry in a separate node. Iterating chases pointers between nodes that may be scattered across the heap, which is cheap while the nodes stay in cache but degrades sharply once they spill to DRAM.</li></ul><h2 id="throughput">Throughput</h2><h3 id="kv-uint64_t-with-several-split-bits-masked-uint64_t">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><p>The charts confirm the picture above. On both platforms <code>ankerl::unordered_dense_map</code> is the fastest by a wide margin and is essentially flat across the whole size range (about 0.22 ns per element on the Xeon E-2388G and about 2.0 ns on the M1 Max), because it always iterates a packed array regardless of how many slots the table has. <code>emhash::hash_map7</code> is the runner-up with the same flat behaviour (about 0.6 ns on the Xeon and 2.6 ns on the M1).</p><p>The inline open-addressing tables behave differently: their per-element cost rises with the number of elements as the slot array grows past the cache. <code>ska::flat_hash_map</code>, for example, climbs from roughly 1 ns at small sizes to about 9-10 ns at 10^7 elements on the Xeon, because most of the time is then spent reading empty slots from memory. 
The node-based <code>std::unordered_map</code> is the slowest at large sizes (around 35 ns per element at 10^7 on the Xeon and 22 ns on the M1), since iterating its node list becomes a stream of cache-missing pointer dereferences.</p><p>A couple of combinations stop after the mid-size points. <code>absl::flat_hash_map</code> and <code>absl::node_hash_map</code> assume a well-distributed hash, but the integer <code>std::hash</code> is typically the identity function (as in libstdc++ and libc++); on the masked-bit keys it collides heavily, so construction times out at large sizes and those data points are recorded as zero and not plotted.</p><p>The remaining integer distributions and the 56-byte value type, shown below, give the same ranking. The dense-storage tables stay flat and fastest; the only effect of the larger 56-byte value is that the open-addressing tables, which now move 64 bytes per slot, become memory-bound a little earlier.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-uint64_t-with-high-position-bits-masked-uint64_t">&lt;K,V&gt;: &lt;uint64_t with high position bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-uint64_t-with-low-position-bits-masked-uint64_t">&lt;K,V&gt;: &lt;uint64_t with low position bits masked, uint64_t&gt;</h3><div 
class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-uint64_t-uniformly-distributed-uint64_t">&lt;K,V&gt;: &lt;uint64_t uniformly distributed, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>The P99 latency of a single iteration step tells the same story from the tail end (latency is measured on the x86-64 platform only). <code>ankerl::unordered_dense_map</code> stays flat at about 1.6 ns regardless of size, because advancing the iterator over a dense array reads memory sequentially and rarely misses the cache. 
The inline and node-based tables develop growing tails as the backing storage outgrows the cache: at 10^7 elements the P99 step reaches tens of nanoseconds for the open-addressing tables and hundreds of nanoseconds for the node-based <code>std::unordered_map</code> and <code>absl::node_hash_map</code> (about 374 ns and 439 ns respectively), with each spike corresponding to a cache miss on the next slot or node.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-1">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-1">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-1">&lt;K,V&gt;: &lt;uint64_t with high position bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-1">&lt;K,V&gt;: &lt;uint64_t with low position bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-uint64_t-uniformly-distributed-uint64_t-1">&lt;K,V&gt;: &lt;uint64_t uniformly distributed, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><html><script>    var 
create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => 
{M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_iterate_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/int-iterate.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The integer key iteration test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Integer Erase and Insert</title>
    <link href="https://renzibei.com/en/int-erase-insert/"/>
    <id>https://renzibei.com/en/int-erase-insert/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.220Z</updated>
    
    <content type="html"><![CDATA[<p>The integer key erase and insert test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/int-erase-insert.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we first construct a hash table with size N, and then repeat the following operations M times:</p><ol type="1"><li>Insert a new element into the hash table.</li><li>Randomly erase an element from the hash table.</li></ol><p>Note that the results of this test also depend strongly on the distribution of the data, especially the relationship between deleted data and inserted data. Here we randomly select elements to delete with equal probability. In reality, elements may not be selected with equal probability; for example, the most likely deleted element may be the most recently inserted element.</p><h2 id="throughput">Throughput</h2><p>We record the time spent in the whole process, which includes both insert and erase operations.</p><p>The y axis value is the average time per operation. This result is obtained by <code>time/op = (time for insert + time for erase) / (2 * M)</code>, that is, the average time taken per insert or erase call. 
This number reflects the efficiency of making modifications to the hash table.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-uint64_t">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt;<a name="throughput-split-u64-u64"></a></h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><p>On Intel Rocket Lake, <code>ska::flat_hash_map</code> with <code>std::hash</code> is the fastest or near-fastest across almost the whole range (about 5 to 37 ns per operation). <code>emhash::hash_map7</code> with <code>absl::Hash</code> and <code>absl::flat_hash_map</code> with <code>absl::Hash</code> are close behind in the medium range. <code>tsl::robin_map</code> is fast at small sizes and again at very large sizes (about 39 ns at 10^7), but it slumps in the medium range (rising to 60 to 90 ns near 400,000 to 800,000) because of the poorer distribution of <code>robin_hood::hash</code> on these key patterns.</p><p>On Apple M1 Max, the combination of <code>ska::flat_hash_map</code> and <code>std::hash</code> has a comparative advantage at almost every data scale (about 6 to 33 ns).</p><p>It is also worth noting that on the M1 Max chip, <code>absl::node_hash_map</code> shows a large performance degradation, with a pronounced bump in the data range from about 45,000 to 100,000 elements (peaking near 135 ns). <code>std::unordered_map</code> also exhibits performance degradation, climbing steadily and peaking around 100,000 to 300,000 elements (up to about 600 ns). The cause of this degradation is not clear. It may be related to the system's memory allocation policy, as both hash tables must allocate and free memory for each insertion and deletion. 
This phenomenon is more pronounced on datasets with <code>&lt;uint64_t, 56 bytes&gt;</code>.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_erase_insert_time_chart"></canvas></div></div><p>On Intel Rocket Lake, the rankings are similar to those of <a href="#throughput-split-u64-u64">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt;</a>: <code>ska::flat_hash_map</code> with <code>std::hash</code> keeps a comfortable lead (about 8 to 59 ns), and <code>absl::flat_hash_map</code> is the closest follower.</p><p>On M1 Max, the relative performance of both <code>absl::flat_hash_map</code> and <code>emhash::hash_map7</code> improves a bit when the number of elements is in the range of roughly 32,768 to 1,200,000, where <code>emhash::hash_map7</code> with <code>absl::Hash</code> actually edges ahead of <code>ska::flat_hash_map</code> at several points.</p><h2 id="latency">Latency</h2><p>We record the latency of insert and erase operations separately. The insert latency here is different from that in the "Insert and Construct" test. 
The latency statistics in the construct test include all operations from size 0 to size N, while in this test the size of the hash table is always N or N + 1.</p><h3 id="insert-after-erase">Insert (after erase)</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-insert-latency">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt; Insert Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><p>First consider P50 latency. When the data size is small, both <code>ska::flat_hash_map</code> with <code>std::hash</code> and <code>tsl::robin_map</code> with <code>absl::Hash</code> are in the first tier in terms of speed (about 7 to 9 ns). When the number of elements is larger, <code>absl::flat_hash_map</code> and <code>absl::node_hash_map</code> with <code>absl::Hash</code> have a greater advantage: past roughly 300,000 elements the ska/tsl tables jump up (to about 50 ns and beyond) while the absl tables stay near 24 to 28 ns. In addition, the median latency of most open-addressed hash tables converges again as the number of elements approaches 10^7 (around 92 to 99 ns).</p><p>For P99 latency, <code>emhash::hash_map7</code> with <code>absl::Hash</code> has the smallest tail latency through small and medium sizes (about 31 ns at small counts, staying ahead up to roughly 200,000 elements). When the number of elements is larger than that, <code>absl::flat_hash_map</code> comes out on top.</p><p>Another point worth mentioning about modifying open-addressed hash tables is the tombstone mechanism used by some implementations. 
For some hash tables, when a delete operation is performed, a special marker (tombstone) is placed on the slot where the deleted element was located. A tombstone marker is not the same as an empty marker: a probe sequence can stop at an empty slot but must continue past a tombstone. If a hash table has too many tombstone markers, probe sequences grow longer and its lookup performance suffers. Therefore, some hash tables will rehash when the number of tombstones reaches a certain percentage. This can give an insert operation mixed with erase a poor maximum latency. The P100 latency helps show this.</p><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P100_latency_chart"></canvas></div></div><p>In addition to the element counts that are powers of 2 (where the first insertion may lead to expansion), some hash tables have P100 latency at some data points proportional to the number of elements. This phenomenon is observed for both the absl-series hash tables and <code>robin_hood::unordered_flat_map</code>. These hash tables should not be selected if the user requires strict maximum latency for modification operations.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-insert-latency">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, 56 bytes struct&gt; Insert Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><p>When the size of <code>value_type</code> becomes 64 bytes, the advantage of <code>ska::flat_hash_map</code> with <code>std::hash</code> over <code>tsl::robin_map</code> increases when the amount of data is small (about 8 ns vs 11 ns at the smallest sizes for P50).</p><p>For P99 latency, at small element counts the 
smallest tail belongs to <code>absl::flat_hash_map</code> with <code>absl::Hash</code> (about 33 ns), ahead of <code>ska::flat_hash_map</code> and <code>emhash::hash_map7</code>.</p><h3 id="erase">Erase</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-erase-latency">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt; Erase Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><p>For P50 latency, <code>ska::flat_hash_map</code> with <code>std::hash</code> almost always has the smallest latency, except in the range from about 300,000 to 800,000 elements. In that range <code>ska::flat_hash_map</code> jumps up (to about 100 to 160 ns) because its aggressive expansion and low load factor push it into a slower memory tier, while <code>absl::flat_hash_map</code> and <code>robin_hood::unordered_flat_map</code> stay lower (about 45 to 120 ns) and perform relatively better. Above 1,200,000 elements the tables converge again.</p><p>For P99 latency, <code>emhash::hash_map7</code> with <code>absl::Hash</code> performs the fastest when the number of elements is small (about 19 ns, leading up to roughly 3,000 elements). The tail
The taillatency of <code>absl::flat_hash_map</code> with <code>absl::Hash</code>is smaller in the medium range, roughly from 8,000 to 200,000 elements.When the number of elements is larger, the performance of many hashtables is close.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-erase-latency">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt; EraseLatency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_erase_P99_latency_chart"></canvas></div></div><p>When the size of the <code>value_type</code> is 64 bytes, it isbasically the same as <code>&lt;uint64_t, uint64_t&gt;</code>.</p><h2 id="throughput-appendix">Throughput Appendix</h2><h3 id="kv-uint64_t-with-high-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h3 id="kv-uint64_t-with-low-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h3 id="kv-uint64_t-uniformly-distributed-uint64_t">&lt;K,V&gt;:&lt;uint64_t 
uniformly distributed, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h2 id="latency-appendix">Latency Appendix</h2><h3 id="insert-after-erase-1">Insert (after erase)</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-insert-latency">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt; Insert Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-insert-latency">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt; Insert Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-insert-latency">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt; Insert Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas
id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h3 id="erase-1">Erase</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-erase-latency">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt; Erase Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-erase-latency">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt; Erase Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-erase-latency">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt; Erase Latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    function create_all_charts() {        for (var i = 0; i < create_chart_funcs.length; i++) {            create_chart_funcs[i]();        }    };    var bench_results_ready = false; var chart_js_ready = false;    function
add_new_chart_callbacks() {create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_erase_insert_time_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_erase_insert_time_create);create_chart_funcs.push(M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_erase_insert_time_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P100_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_insert_after_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_insert_after_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_create);create_chart_func
s.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_insert_after_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_insert_after_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_erase_P99_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_erase_P50_latency_create);create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_erase_P99_latency_create);    }       function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/int-erase-insert.js" onload="bench_results_loaded();"> 
 </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The integer key erase and insert test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Analysis and Conclusion</title>
    <link href="https://renzibei.com/en/analysis-and-conclusion/"/>
    <id>https://renzibei.com/en/analysis-and-conclusion/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.219Z</updated>
    
    <content type="html"><![CDATA[<p>A full walk-through of what the benchmark showed, across integer and string keys and small and large values, and how to turn it into a concrete choice of hash table and hash function for a particular workload.</p><span id="more"></span><h2 id="no-single-winner">No single winner</h2><p>The benchmark covers insertion, erasure, lookup (successful, unsuccessful, and a 50% mix), and iteration. It runs them on integer keys of several distributions and on strings of 12, 24 and 64 bytes, with both an 8-byte value and a 56-byte value (so with an 8-byte integer key the stored pair is 16 or 64 bytes), at sizes from 32 up to 10^7 elements, on two very different machines: an Intel Xeon E-2388G (Rocket Lake) and an Apple M1 Max. The one lesson that survives all of that variety is that no single combination is fastest everywhere: the winner changes with the operation that dominates the workload, with the type and distribution of the keys, with the size of the value, with how large the table is relative to the cache, and with the platform.</p><p>A second lesson, worth stating up front, is that a hash table and a hash function must be chosen together. Some tables assume the hash already spreads keys uniformly and break down badly if it does not; others do their own mixing and prefer the cheapest possible hash. Every chart therefore compares <em>table + hash</em> pairs rather than tables in isolation.</p><h2 id="lookups">Lookups</h2><p>Lookups are usually the operation people care about most, and they are where the design of the table matters most. A lookup breaks into three parts: computing the hash, mapping it to a slot, and loading and comparing keys. Which part dominates depends on the key type and on how much of the table fits in cache.</p><h3 id="integer-keys">Integer keys</h3><p>For integer keys a lookup's time is split between computing and mapping the hash and the memory access that follows, and which one dominates depends on how much of the table sits in cache.
The hash is not necessarily cheap: the identity <code>std::hash</code> costs nothing and is fine when the keys are already spread out evenly, but keys whose information sits in only a few bit positions (pointers and aligned addresses, whose low bits are always zero, or small sequential IDs, whose high bits are always zero) collide badly under the identity hash and need a real mixing hash such as <code>absl::Hash</code> (a 128-bit multiply and xor-shift) to scatter them across the table. The right hash therefore depends on the keys (discussed further below). The ranking moves through three regimes as the table grows. While it fits in the L1/L2 cache the memory fetch is fast (only a few nanoseconds), so the per-probe instruction count (the hash and the slot mapping) is comparable to the fetch and is what separates the tables; the leanest combination wins, and <code>ska::flat_hash_map</code> with the identity <code>std::hash</code> is fastest at small sizes (about 1.3 ns per hit at 1,024 elements on both machines), with <code>fph::DynamicFphMap</code> a close second on the M1, trailing it by only about 0.1 ns. In the middle of the range, where the table lives in the L2/L3 caches, <code>fph::DynamicFphMap</code> takes the lead (about 3.4 ns at 200,000 and 10.9 ns at 1.2M on the Xeon) because its bounded probe count keeps cache-line touches low. At the largest sizes, where every lookup misses the cache and the memory access dominates, <code>ska::flat_hash_map</code> is fastest again (about 14.7 ns on the Xeon, 9.3 ns on the M1 at 10^7) because the table that reads the fewest cache lines (a single compact slot array) wins regardless of the hash.</p><p>Misses are more decisive.
To prove a key absent, a table must rule out every slot it could occupy, and tables that store a byte of metadata per slot can do so without touching the full keys. <code>fph::MetaFphMap</code> is in a class of its own: it rejects an absent key with essentially one memory access, giving it the best average miss time from about 6,000 elements up and a far tighter tail. At 1.2M elements its P99 miss latency is about 34 ns, against about 106-111 ns for <code>robin_hood::unordered_flat_map</code> and <code>absl::flat_hash_map</code> and 440-465 ns for the probe-walking <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code>. The advantage lasts until the metadata array itself leaves the L3 cache (around 10^7 here), after which it, too, pays one DRAM access. <code>absl::flat_hash_map</code> is the best of the conventional tables on misses, having been built around the same metadata idea. The 50%-hit case is roughly the average of the two, with the alternating outcome adding a branch-misprediction penalty that narrows every gap.</p><h3 id="string-keys">String keys</h3><p>String keys change the picture because hashing costs much more (the whole string must be hashed and compared, rather than a single 64-bit word) and because longer strings may live on the heap. Two effects stand out.</p><p>First, the byte-oriented <code>xxHash_xxh3</code> becomes the best hash for almost every table, and increasingly so as strings lengthen: hashing dominates, so a fast bytes hash matters more than the table. Second, the in-cache lookup floor rises sharply with length. A successful in-cache lookup costs about 1.3 ns for an integer key, about 6.5 ns for a 12-byte string, about 13 ns for a fixed 24-byte string, and about 16 ns for a 64-byte string on the Xeon.
The jump between 12 and 24 bytes is mostly the Small String Optimization (SSO): a string of up to ~15 bytes is stored inline in the <code>std::string</code> object, while a longer one is heap-allocated, so a lookup must follow a pointer to a separate cache line to compare the characters. This shows up directly in the data: the <em>max</em>-length-24 keys, most of which are short enough to stay inline, are about twice as fast as the <em>fixed</em>-length-24 keys, which are all heap-allocated (about 7 ns versus 13 ns at small sizes).</p><p>Apart from the higher floor, the ranking tracks the integer case: the perfect-hash tables lead in cache (for 64-byte keys <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> with <code>xxHash_xxh3</code> are fastest for hits), the robin-hood tables <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> overtake once memory-bound, and <code>fph::MetaFphMap</code> again dominates misses (about 8 ns at 1,024 and 60 ns at 1.2M for 64-byte keys, versus about 63-68 ns for the next tables).</p><h3 id="the-effect-of-the-value-size">The effect of the value size</h3><p>Enlarging the value from 8 to 56 bytes (a 64-byte pair) makes each slot four times bigger, so fewer entries fit in each cache level and every table becomes memory-bound earlier; a hit that cost about 15 ns at 10^7 with an 8-byte value costs about 21 ns with the 56-byte value. This shifts the balance toward tables that touch the fewest cache lines: <code>fph::DynamicFphMap</code>, with its bounded probe count, leads across more of the range with the large value than it does with the small one. If the values are large, weight the mid- and large-size results more heavily than the small-size ones.</p><h2 id="building-and-modifying-the-table">Building and modifying the table</h2><p>Insertion inverts the lookup ranking for the perfect-hash tables.
The flat tables (<code>absl::flat_hash_map</code>, <code>ska::flat_hash_map</code>, <code>emhash::hash_map7</code>, <code>tsl::robin_map</code>, <code>ankerl::unordered_dense_map</code>) are fastest to fill (about 4-6 ns per insert at small sizes and 25-35 ns at 10^7 with capacity reserved), while <code>std::unordered_map</code> is far slower (near 125 ns at 10^7) because it allocates a node per element. The perfect-hash tables are slowest by a wide margin: building a perfect hash costs <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> roughly 1,450-1,900 ns per element at 10^7 on the Xeon, about 12-15 times <code>std::unordered_map</code> (and 11-12 times on the M1). Dropping the <code>reserve</code> call roughly doubles every table's insert time because growth then triggers repeated rehashing.</p><p>The erase-insert test, which alternates one erase and one insert to hold the size constant, favours the flat tables (<code>ska::flat_hash_map</code>, <code>tsl::robin_map</code>, <code>absl::flat_hash_map</code>); open-addressing tables leave tombstones behind erased entries, and the perfect-hash tables fare worst, timing out at the largest sizes because they cannot cheaply absorb churn.</p><p>Key type and value size matter here too. With string keys, every insert and erase also pays the string hash, and for strings beyond the SSO limit a heap allocation and free for the characters, so the gap between 12-byte and 64-byte string workloads is large, while integer inserts are dominated purely by table mechanics.
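</p><p>The SSO limit that separates the cheap and the expensive string keys can be probed directly. The following is a minimal sketch, not part of the benchmark code: <code>stored_inline</code> is a hypothetical helper that checks whether a string's character buffer lies inside the <code>std::string</code> object itself. The exact inline capacity is implementation-defined (commonly 15 characters in libstdc++ and MSVC and 22 in 64-bit libc++), so only the 12-byte and 64-byte cases are treated as clear-cut here.</p>

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (an illustration, not taken from the benchmark):
// returns true when the character buffer lives inside the std::string
// object itself (SSO), and false when it points to a separate heap block.
bool stored_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end = obj_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}

// A 12-byte key is short enough to stay inline on the common standard
// libraries; a 64-byte key is longer than the string object itself, so it
// has to be heap-allocated.
bool short_key_is_inline() { return stored_inline(std::string(12, 'k')); }
bool long_key_is_inline() { return stored_inline(std::string(64, 'k')); }
```

<p>A key for which <code>stored_inline</code> returns false costs an allocation on insert, a free on erase, and an extra pointer dereference on every comparison, which is the overhead the 24-byte and 64-byte string results carry.</p><p>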
A larger value, as with lookups, pushes the memory-bound regime earlier.</p><h2 id="iteration">Iteration</h2><p>Iteration is decided almost entirely by storage layout, independent of the hash function and nearly independent of the key type (the iterator visits fixed-size table slots; it does not re-hash or, for inline-stored entries, dereference the keys). <code>ankerl::unordered_dense_map</code> and <code>emhash::hash_map7</code> keep their entries in a dense, contiguous array and iterate in near-constant time per element regardless of load factor: about 0.22 ns on the Xeon and 2.0 ns on the M1, flat across the whole size range. The inline open-addressing tables must scan a sparse slot array and skip empty slots, so their cost rises with size; the node-based <code>std::unordered_map</code> is slowest, chasing cache-missing pointers (about 35 ns per element at 10^7 on the Xeon). The perfect-hash tables get no special benefit, since they also iterate a sparse layout.</p><h2 id="memory-and-load-factor">Memory and load factor</h2><p>Footprint, measured on integer key-value pairs, splits the tables into four groups at 10^7 elements. The one-metadata-byte SwissTables <code>absl::flat_hash_map</code> and <code>ska::bytell_hash_map</code> are most compact at about 272 MB; the node-based <code>std::unordered_map</code> and <code>absl::node_hash_map</code> come next (about 308 MB and 297 MB); the <code>fph</code> tables use roughly twice the most compact (about 556-572 MB) for the index that speeds their lookups; and <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> are largest at about 768 MB, because they keep a low maximum load factor and store the full, alignment-padded value in every slot.</p><p>The load-factor chart explains that.
Open-addressing tables grow by doubling, so occupancy sawtooths between roughly 0.4 and 0.9; <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> settle to the lowest values (about 0.30 at large sizes), buying speed with empty space, while <code>std::unordered_map</code> runs near 1.0 (a node per element, no empty slots). Raising <code>max_load_factor</code> packs the flat tables tighter (<code>ska::flat_hash_map</code> rises from about 0.30 to 0.60 at 10^7, roughly halving its empty space), at the cost of longer probes. Because footprint decides when a table crosses each cache boundary, this is the structural reason behind the cache tiers in the lookup results: a bulkier table falls out of cache at a smaller element count.</p><h2 id="the-hash-function-matters-as-much-as-the-table">The hash function matters as much as the table</h2><p>For integer keys <code>std::hash</code> is the identity function. A table that does its own mixing (most notably <code>ska::flat_hash_map</code>) can use it safely and enjoy its zero cost. But keys that are not uniformly random, those whose information lives in only a handful of bit positions, like pointers or small sequential IDs, make tables that assume a uniform hash (<code>absl::flat_hash_map</code>, <code>emhash::hash_map7</code>, <code>tsl::robin_map</code>) collide catastrophically with the identity hash, badly enough that some combinations time out during construction, so they need a good mixing hash like <code>absl::Hash</code>. Even a good hash can have weak spots: <code>robin_hood::hash</code> spreads some such key patterns poorly, producing an irregular mid-range slump for <code>tsl::robin_map</code> with that hash. For string keys, <code>xxHash_xxh3</code> wins across the board.</p><h2 id="platform-differences-rocket-lake-vs-m1-max">Platform differences: Rocket Lake vs M1 Max</h2><p>The two machines have very different memory systems.
The Xeon E-2388G (Rocket Lake) has a 48 KB L1, 512 KB L2 and 16 MB L3 with 4 KB pages; the M1 Max has a 128 KB L1, 12 MB L2 and 48 MB system-level cache with 16 KB pages. The larger caches and pages on the M1 push every cache-tier boundary to the right and reduce TLB pressure, so although the <em>ranking</em> of tables is broadly consistent across platforms, the sizes at which one overtakes another differ. One quirk: the libc++ <code>std::unordered_map</code> on the M1 indexes buckets with a modulo that degenerates to a power-of-two mask when the bucket count is a power of two, discarding the high bits of the hash, which makes that combination anomalously slow at element counts that are exact powers of two.</p><h2 id="choosing-a-hash-table">Choosing a hash table</h2><p>Two practical questions decide most of the choice: <strong>how often is the table written versus read</strong>, and <strong>how big is it relative to the cache it actually gets</strong>.</p><p>On the first question: if the table is built once and then queried far more than it is modified, the perfect-hash <code>fph::DynamicFphMap</code> (or <code>fph::MetaFphMap</code> when misses are common) gives the best lookup performance, at the cost of a slow build, an inability to absorb frequent updates, and roughly twice the memory of the most compact tables, so it suits a static dictionary, a read-mostly index or a membership set far better than a constantly changing table. If inserts, erases and lookups are mixed, a flat table is the better all-rounder, and <code>absl::flat_hash_map</code> with <code>absl::Hash</code> (or <code>xxHash_xxh3</code> for strings) is a strong, compact, distribution-robust default.</p><p>The second question is where "fast in cache" needs unpacking.
A table is "in cache" when its whole footprint, not the element count alone, fits in a cache level: on the Xeon, roughly 16,000 small (16-byte) entries fit in the L2 and about a million in the L3, and proportionally fewer when the value is large or the keys are heap-allocated strings. Therefore, <code>ska::flat_hash_map</code>, the fastest table while everything stays resident, is the right pick mainly for genuinely small tables, or when the memory can be spared. The catch is that <code>ska::flat_hash_map</code> also has the <em>largest</em> footprint of all (its low load factor is exactly what makes it fast), so for the same number of elements it leaves cache sooner than a compact table, and it competes harder for whatever cache is left to the rest of the program. In a real application the cache is shared with everything else running, so the effective threshold is well below the raw cache size; if memory is tight, the cache is contended, or the table is large, the compact <code>absl::flat_hash_map</code> or <code>ska::bytell_hash_map</code> will usually beat it despite <code>ska::flat_hash_map</code>'s edge in a benchmark that owns the whole cache.
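</p><p>The "does it fit in cache" estimate above is simple arithmetic and can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's own calculation: <code>entries_that_fit</code>, the constant names and the occupancy figures are hypothetical, and only the slot array is counted (no per-slot metadata and no heap-allocated key bytes). The cache sizes are the Xeon E-2388G values quoted in the platform section.</p>

```cpp
#include <cstdint>

// Hypothetical helper: approximate number of entries whose slots fit in a
// cache of `cache_bytes`, for a flat table with `slot_bytes` per slot that
// runs at `load_percent` percent occupancy.
constexpr std::uint64_t entries_that_fit(std::uint64_t cache_bytes,
                                         std::uint64_t slot_bytes,
                                         std::uint64_t load_percent) {
    return cache_bytes / slot_bytes * load_percent / 100;
}

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kMiB = 1024 * kKiB;
constexpr std::uint64_t kXeonL2 = 512 * kKiB;  // L2 size quoted in the text
constexpr std::uint64_t kXeonL3 = 16 * kMiB;   // L3 size quoted in the text

// 16-byte slots at an assumed 50% occupancy: about 16,000 entries in the L2.
static_assert(entries_that_fit(kXeonL2, 16, 50) == 16384, "L2, 16-byte pair");
// 16-byte slots counted with no empty slots: about a million entries in the L3.
static_assert(entries_that_fit(kXeonL3, 16, 100) == 1048576, "L3, 16-byte pair");
// 64-byte slots (56-byte value) at 50% occupancy: far fewer entries in the L3.
static_assert(entries_that_fit(kXeonL3, 64, 50) == 131072, "L3, 64-byte pair");
```

<p>The result is very sensitive to the assumed occupancy: halving the load factor halves the number of entries that stay resident, which is why a table that runs near 0.30 leaves each cache level earlier than a compact one holding the same number of elements.</p><p>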
For iteration-heavy work, <code>ankerl::unordered_dense_map</code> is the clear choice regardless of size.</p><div class="markdown-table-div"><table><colgroup><col style="width: 44%"><col style="width: 55%"></colgroup><thead><tr><th>Your situation</th><th>Consider</th></tr></thead><tbody><tr><td>build once, then look up a lot (read-mostly / static)</td><td><code>fph::DynamicFphMap</code> (hit-heavy) or <code>fph::MetaFphMap</code> (miss-heavy)</td></tr><tr><td>general-purpose, mixed insert / erase / lookup</td><td><code>absl::flat_hash_map</code> (with <code>absl::Hash</code>, or <code>xxHash_xxh3</code> for strings): fast, compact, robust</td></tr><tr><td>small table, or plenty of spare and uncontended cache, want lowest lookup time</td><td><code>ska::flat_hash_map</code>, but it is the largest table, so prefer a compact one once it grows</td></tr><tr><td>memory tight, cache shared/contended, or table large</td><td><code>absl::flat_hash_map</code> or <code>ska::bytell_hash_map</code> (most compact)</td></tr><tr><td>dominated by iteration / frequent full scans</td><td><code>ankerl::unordered_dense_map</code></td></tr><tr><td>need reference / pointer stability</td><td><code>absl::node_hash_map</code> or <code>std::unordered_map</code></td></tr></tbody></table></div><h2 id="final-advice">Final advice</h2><p><code>std::unordered_map</code> is the slowest table in almost every throughput test because of its node-per-element layout, so it is worth keeping mainly when its iterator and pointer-stability guarantees are actually required. Beyond that, the table above is a starting point, not a verdict: the full space of tables, hashes, key distributions, value sizes, table sizes and platforms is enormous, and a real workload may land between the cases measured here. The surest answer is always to benchmark the two or three most promising candidates on the actual data and operation mix.</p><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;A full walk-through of what the benchmark showed — across both
integer and string keys, small and large values — and how to turn it
into a concrete choice of hash table and hash function for a particular
workload.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Memory Usage and Load Factor</title>
    <link href="https://renzibei.com/en/memory-usage-and-load-factor/"/>
    <id>https://renzibei.com/en/memory-usage-and-load-factor/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.222Z</updated>
    
    <content type="html"><![CDATA[<p>This page discusses the memory usage and load factor of hash tables.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/memory-usage-and-load-factor.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>During the previous tests, we recorded the heap memory size and load factor. We count the heap memory size by implementing a custom <code>Allocator</code>, which counts the allocated bytes during the <code>allocate()</code> function call. However, the accuracy of the counted number depends on the hash table container correctly using <code>Allocator</code>; that is, all memory allocations must go through the allocator. Some hash tables, such as <code>emhash::hash_map7</code>, will not get accurate memory data because <code>Allocator</code> is not fully used for all heap memory allocations.</p><p>One of the troublesome aspects of C++ design is that classes that use different Allocator template parameters belong to different types (which is what the <code>std::pmr</code> containers are trying to solve). For example, <code>std::basic_string</code> using a <code>std::allocator</code> and <code>std::basic_string</code> using a custom allocator are two types, and hash functions like <code>std::hash</code> are usually only compatible with <code>std::string</code> using <code>std::allocator</code>.
Therefore, it is not possible to use these hash functions directly for strings that use other allocators, and our method of counting heap memory size cannot count the heap memory of strings. As a result, we only count cases where both keys and values are integer types.</p><p>Note that if the user doesn't care about the total heap memory size but cares about the warm memory size related to the query speed, then the data in this test cannot accurately reflect the cache size that the hash table needs to utilize. Some hash tables require some extra space for non-query work, such as insertion, and this part of the memory space is not accessed during lookup operations.</p><p>The set of element counts used in this test is different from other test items. To reflect the ability to cope with different load factors, the number of elements is chosen as 0.4 times or 0.6 times a power of 2, e.g. <code>0.4 x 2^10, 0.6 x 2^10, 0.4 x 2^11, 0.6 x 2^11, ...</code></p><h2 id="lookup-related-cache-size-overhead-analysis">Lookup-related Cache Size Overhead Analysis</h2><p>The heap memory a table occupies and the load factor it runs at directly shape its lookup speed, because together they decide how much memory a lookup must stream through and therefore how soon the working set spills out of each cache level. A lower load factor wastes more slots but shortens probe chains; a smaller per-element footprint fits more elements into the same cache. The cache-tier boundaries seen in the <a href="/en/int-lookup-throughput/" title="lookup throughput">lookup throughput</a> test are exactly the points where a table's footprint crosses the L1, L2 and L3 capacities, so the two charts below are the structural explanation behind those tiers.</p><p>Because these figures describe the data structures themselves rather than the processor, the heap-memory numbers are mostly CPU-independent. Exact totals can still differ across STL implementations and allocators.
The memory-size charts below therefore show the measured Intel and M1 Max values separately; treat them as structure-driven but implementation-sensitive.</p><h2 id="heap-memory-size">Heap Memory Size</h2><h3 id="memory-size-when-using-default-max_load_factor">Memory size when using default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2286G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_chart"></canvas></div></div><p>The footprint splits the tables into four groups. The SwissTable-style <code>absl::flat_hash_map</code> and <code>ska::bytell_hash_map</code>, which spend only one metadata byte per slot, are the most compact, at about 272 MB for 10^7 integer pairs. The node-based <code>std::unordered_map</code> and <code>absl::node_hash_map</code> come next (about 308 MB and 297 MB), paying for a per-node allocation and its pointer. The perfect-hash <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> use roughly twice the memory of the most compact tables (about 556-572 MB), the cost of the extra index and metadata that make their lookups fast.
Largest of all are <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> at about 768 MB, because they keep a low maximum load factor and store the full, alignment-padded value type in every slot. <code>emhash::hash_map7</code> is shown as zero because it does not route all of its allocations through the custom counting allocator, so its memory cannot be measured here (as noted above).</p><h3 id="memory-size-when-using-large-max_load_factor">Memory size when using large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2286G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_large_max_load_factor_chart"></canvas></div></div><h2 id="load-factor">Load factor</h2><h3 id="load-factor-when-using-default-max_load_factor">Load factor when using default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_default_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_default_max_load_factor_chart"></canvas></div></div><p>The load-factor chart explains part of that footprint. 
Most open-addressing tables grow by doubling, so their load factor sawtooths between roughly 0.4 right after a growth and 0.75-0.9 just before the next one; the test samples sizes at 0.4x and 0.6x powers of two, which is why the visible values cluster around 0.5-0.76. <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> with <code>robin_hood::hash</code> settle to the lowest load factors (about 0.29-0.30 at large sizes), trading memory for shorter probe chains, the same low occupancy that helps their lookup speed. At the other extreme, <code>std::unordered_map</code> runs at a load factor close to 1.0, because a node is allocated per element and there are no empty slots to leave slack; its memory cost lives in the nodes and pointers rather than in spare slots. Raising <code>max_load_factor</code> packs the flat tables more tightly: at 10^7 elements <code>ska::flat_hash_map</code>'s load factor rises from about 0.30 to about 0.60, roughly halving the number of slots it allocates, at the price of longer probe sequences and slower lookups, the tradeoff explored in the lookup tests.</p><h3 id="load-factor-when-using-large-max_load_factor">Load factor when using large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_large_max_load_factor_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var 
chart_js_ready = false;    function add_new_chart_callbacks() {        create_chart_funcs.push(async() => {Xeon_E_2286G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_create();});        create_chart_funcs.push(async() => {Xeon_E_2286G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___heap_memory_size_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_default_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_default_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___load_factor_with_large_max_load_factor_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/memory-usage-and-load-factor.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;This page discusses the memory usage and load factor of hash
tables.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Integer Lookup Latency</title>
    <link href="https://renzibei.com/en/int-lookup-latency/"/>
    <id>https://renzibei.com/en/int-lookup-latency/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.221Z</updated>
    
    <content type="html"><![CDATA[<p>The integer key lookup latency test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/int-lookup-latency.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the lookup latency of hash tables in three kinds of situations:</p><ol type="1"><li>Look up the keys in the hash table (hit or successful find).</li><li>Look up the keys not in the hash table (miss or unsuccessful find).</li><li>Look up keys with a 50% probability of being in the hash table.</li></ol><p>Latency tells a different story from throughput. The throughput test keeps many independent lookups in flight, so the CPU's out-of-order engine overlaps their memory accesses. The P99 latency below is instead the 99th-percentile time of a <em>single</em> lookup, so it captures the worst accesses that cannot be hidden: a lookup that misses several cache lines, walks a long probe sequence, or mispredicts a branch. Latency is measured on the x86-64 platform (Xeon E-2388G) only.</p><p>Two regimes dominate every chart. While the table still fits in cache, the tail is set by how many memory accesses the worst-case lookup performs: tables that bound their probe count (the perfect-hash <code>fph::*</code> maps) or reject a key quickly through metadata keep a tight tail, while tables that may walk a long probe chain under collisions show a heavier one. 
Once the table spills out of the L3 cache, almost every 99th-percentile lookup incurs at least one DRAM access, so the tail flattens onto a memory-latency floor of roughly 420-460 ns on this Rocket Lake platform and the choice of table matters far less for the tail than it does for throughput. In each group the second set of charts uses a large <code>max_load_factor</code>, which packs the tables more tightly and slightly lengthens the probe chains, but leaves the qualitative ordering unchanged.</p><h2 id="lookup-keys-in-the-table-hit">Lookup keys in the table (hit)</h2><h3 id="use-default-max_load_factor">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>For hit lookups the ordering shifts with size. At small sizes the simple open-addressing tables have the tightest tail: <code>ska::flat_hash_map</code> with <code>std::hash</code> reaches a P99 of about 7.8 ns at 1,024 elements. 
In the L2/L3 regime the perfect-hash tables take over: at 32,768 elements <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> with <code>std::hash</code> have the best P99 (about 22 ns), because they guarantee a small, bounded number of probes even in the worst case. Beyond about one million elements every table converges to the ~420-460 ns DRAM floor; the node-based <code>std::unordered_map</code> is the clear outlier, since a tail lookup chases two cache-missing pointers and reaches 655-765 ns.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h3 id="use-large-max_load_factor">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with 
several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-1">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h2 id="lookup-keys-not-in-the-table-miss">Lookup keys not in the table (miss)</h2><h3 id="use-default-max_load_factor-1">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><p>The miss case is where the metadata design shines most. <code>fph::MetaFphMap</code> can confirm that a key is absent by reading a single metadata cache line, without walking any probe sequence. As long as that metadata array fits in cache this gives it a dramatically tighter tail than any other table: at 1,200,000 elements its P99 miss latency is about 34 ns, while the other tables range from roughly 106 ns (<code>r_h::unordered_flat_map</code>, <code>absl::flat_hash_map</code>) up to 440-460 ns (<code>ska::flat_hash_map</code>, <code>tsl::robin_map</code>), so the slowest tables trail it by more than a factor of ten at the tail. The advantage is largest in the L3 regime and disappears at 10,000,000 elements, where the metadata array itself no longer fits in the L3 cache, so even a single metadata read becomes a DRAM miss and <code>fph::MetaFphMap</code> falls back to the common ~450 ns floor. This is exactly the workload <code>fph::MetaFphMap</code> was designed for, and the main reason to prefer it over <code>fph::DynamicFphMap</code> when unsuccessful lookups are common.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-2">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, 
uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_chart"></canvas></div></div><h3 id="use-large-max_load_factor-1">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-3">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 
id="kv-uint64_t-uniformly-distributed-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_chart"></canvas></div></div><h2 id="lookup-keys-with-a-50-probability-in-the-table">Lookup keys with a 50% probability in the table</h2><h3 id="use-default-max_load_factor-2">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><p>The 50% case mixes hits and misses, and the unpredictable hit-or-miss outcome adds a branch-misprediction penalty that narrows the gaps between tables. The perfect-hash maps still hold a mild tail advantage in the in-cache regime (<code>fph::DynamicFphMap</code> is around 27 ns at 32,768 elements), but once the working set leaves the cache the memory-latency floor dominates again and the tables become hard to separate, with only the node-based tables clearly behind (<code>absl::node_hash_map</code> at about 520-620 ns and <code>std::unordered_map</code> at about 675-785 ns at the largest sizes).</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-4">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_chart"></canvas></div></div><h3 id="use-large-max_load_factor-2">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-5">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 
id="kv-uint64_t-with-low-position-bits-masked-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        
create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_miss_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_miss_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_default_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_lookup_50percent_hit_large_max_load_factor_P99_latency_create();});    }    async function 
bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/int-lookup-latency.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The integer key lookup latency test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Integer Lookup Throughput</title>
    <link href="https://renzibei.com/en/int-lookup-throughput/"/>
    <id>https://renzibei.com/en/int-lookup-throughput/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.221Z</updated>
    
    <content type="html"><![CDATA[<p>The integer key lookup throughput test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/int-lookup-throughput.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to view or hide data lines corresponding to specific hash tables and hash functions.</strong></p><p>In this test, we measure the lookup performance of hash tables in three kinds of situations:</p><ol type="1"><li>Look up the keys in the hash table (hit or successful find).</li><li>Look up the keys not in the hash table (miss or unsuccessful find).</li><li>Look up keys with a 50% probability of being in the hash table.</li></ol><h2 id="exploring-the-connection-between-lookup-performance-and-memory-hierarchy">Exploring the Connection between Lookup Performance and Memory Hierarchy</h2><p>Before looking at the details of lookup speed, it is important to note the strong correlation between hash table lookup speed and the cache hit rate. With integer key-value pairs, the hash table lookup operation mainly involves loading content from memory, which consumes most of the operation time.</p><p>Modern computer systems use a hierarchical memory design. For instance, Intel Rocket Lake has registers, L1 cache, L2 cache, L3 cache, and DRAM, with speed decreasing in that order. The M1 Max has L1 cache, L2 cache, a system-level cache (SLC), and DRAM.</p><p>As the cache miss rate at a given level rises, the overall lookup time becomes limited by the memory hardware speed of the next, slower level. 
Besides cache misses, TLB misses also contribute to the time penalty. The chart below helps illustrate this concept. It shows the P50 (median) hit-lookup latency on the Xeon E-2388G. The structure is easiest to read by isolating a single open-addressing table such as <code>ska::flat_hash_map</code> with <code>std::hash</code> (click the other legend entries to hide them). For such a table the median lookup succeeds after a single key comparison, so the P50 value mostly reflects the latency of one memory load and tracks the memory hierarchy closely.</p><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P50_latency_chart"></canvas></div></div><p>The figure above shows that on the Intel Rocket Lake architecture, lookup performance separates into four tiers, each determined by the number of elements:</p><ol type="1"><li>Elements that fully fit into the L1 cache. Given a sizeof(value_type) of 16 bytes and an L1 data cache of 48 KB, the cache can hold approximately 3,072 elements. Since the default load factor of <code>ska::flat_hash_map</code> is usually less than 0.5, this range in the figure lies between 32 and 1,024.</li><li>Elements that fully fit into the L2 cache. Given a sizeof(value_type) of 16 bytes and an L2 cache of 512 KB, the cache can hold approximately 32,768 elements. For <code>ska::flat_hash_map</code> in this test, the range is 1,500 to 8,192.</li><li>Elements that fully fit into the L3 cache. Given a sizeof(value_type) of 16 bytes and an L3 cache of 16 MB, the cache can hold approximately 1,048,576 elements. For <code>ska::flat_hash_map</code> in this test, the range is 12,000 to 400,000.</li><li>The case where reading from RAM becomes inevitable. This typically occurs when the L3 cache can't hold all the elements, generally when there are more than 1,048,576 elements.</li></ol><p>In addition to cache misses, TLB misses also have a significant influence on hash table lookup speed. 
Rocket Lake has 64 L1 DTLB entries (in 4 KB mode; 32 in 2 MB mode) and 1536 STLB entries. This means that if a 4 KB page size is used, considerable L1 TLB misses will occur when the number of elements exceeds 16,384, and L2 TLB misses when the number exceeds 393,216. Using huge pages can greatly reduce TLB misses. On macOS with M1 Max, a 16 KB page size is used by default, resulting in much smaller penalties due to TLB misses compared to the default 4 KB pages on the Rocket Lake platform. In the P50 curve this TLB-miss penalty adds an extra rise once the working set outgrows the reach of the L2 TLB, on top of the cache-tier steps described above.</p><p>The M1 Max can use a 128 KB L1 data cache, a 12 MB L2 cache, and a 48 MB SLC cache per thread, resulting in a higher cache hit rate and therefore superior performance for most data sizes in the hash table test.</p><h2 id="lookup-keys-in-the-table-hit">Lookup keys in the table (hit)</h2><h3 id="use-default-max_load_factor">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><p>Broadly speaking, the lookup time for a hash table can be split into four parts:</p><ol type="1"><li>Time required for the hash function to compute the hash value.</li><li>Time needed to map the hash value to a specific memory address.</li><li>Time taken to load content from the given memory address.</li><li>Time spent comparing the loaded content with the target key value, with additional penalty time if the comparison is unsuccessful.</li></ol><p>When the element count is low, 
the CPU cache can often hold all elements. Here, the third step is relatively fast, so the other three components become important. The first and second steps can trade work with each other. If the hash function (in the first step) does not distribute elements well across the hash space, the second step needs extra computation to distribute these non-uniform hash values into the underlying slot array. If the first step uses a high-quality hash function that uniformly distributes hash values, the second step typically only needs a quick bitwise AND instruction for truncation (if the number of slots is a power of 2).</p><p>This can be seen from the charts above. Both libc++ and libstdc++'s implementations of <code>std::hash</code> use the identity hash for uint64_t, making the first step extremely fast. If a bitwise AND instruction is directly used to get the slot-array index in the second step, there will be many collisions, leading to redundant comparisons in the fourth step. Hash tables that use a simple method in the second step, such as <code>absl::flat_hash_map</code>, <code>absl::node_hash_map</code>, <code>emhash::hash_map7</code> and <code>tsl::robin_map</code> require high-quality hash functions, so <code>std::hash</code> is not enough. <code>robin_hood::unordered_flat_map</code> goes slightly further in the second step but doesn't achieve optimal performance with <code>std::hash</code>. 
libc++'s implementation of <code>std::unordered_map</code> also shows poor performance when <code>std::hash</code> is used and the number of elements is a power of 2. <code>robin_hood::hash</code>, though providing insufficient hash quality for most hash tables demanding good hash functions, is good enough for <code>robin_hood::unordered_flat_map</code>, as illustrated by the data points in the range of 100,000 to 3,100,000 elements.</p><p>In contrast, hash tables that do extra work in the second step can still achieve good performance even with the simplest identity hash in the first step, as seen in <code>ska::flat_hash_map</code>, <code>ska::bytell_hash_map</code>, <code>fph::DynamicFphMap</code>, <code>fph::MetaFphMap</code> and libstdc++'s <code>std::unordered_map</code>. I recommend checking out the innovative trick <code>ska::flat_hash_map</code> uses in the second step, described in <a href="https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/">Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)</a>. This method uses a 64-bit multiplication and a right-shift instruction. Since there are no arithmetic instructions involved in the hash function (for identity hash), these are pretty much all the arithmetic instructions the CPU's ALU has to handle.</p><p>On the other hand, for a hash table that relies on a "good" hash function and a bitwise AND instruction in the second step, I know of no hash function that costs no more than these two instructions and still achieves satisfactory hash quality for most data distributions.</p><p>For this reason, the combination of <code>ska::flat_hash_map</code> and <code>std::hash</code> currently ranks as the fastest when all the data fits into the L2 cache on Intel Rocket Lake (or L1 cache on M1 Max). Almost any other combination requires more instructions in the first and second steps combined. 
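</p><p>To make the instruction count concrete, the two ways of performing the second step can be sketched as follows. This is a minimal illustration of the multiply-and-shift idea from the linked article, not the actual code of <code>ska::flat_hash_map</code>. The function names are hypothetical, and the multiplier is approximately 2<sup>64</sup> divided by the golden ratio. The example keys have their low 8 bits cleared and are hashed by the identity function, so a plain mask sends all of them to slot 0 of a 16-slot table, while the multiplication spreads them out.</p><pre><code class="language-cpp">#include &lt;cstdint&gt;

// Step 2 with a plain mask: keep the low `bits` bits of the hash.
// Requires bits &lt;= 63.
constexpr std::uint64_t mask_index(std::uint64_t hash, unsigned bits) {
    return hash &amp; ((std::uint64_t{1} &lt;&lt; bits) - 1);
}

// Step 2 with Fibonacci hashing: one 64-bit multiplication (wrapping
// modulo 2^64) and one right shift that keeps the top `bits` bits.
// Requires 1 &lt;= bits &lt;= 63.
constexpr std::uint64_t fibonacci_index(std::uint64_t hash, unsigned bits) {
    return (hash * 11400714819323198485ull) &gt;&gt; (64 - bits);
}

// Identity-hashed keys 256 and 512 in a table of 2^4 = 16 slots.
static_assert(mask_index(1 * 256, 4) == 0, "mask: collides in slot 0");
static_assert(mask_index(2 * 256, 4) == 0, "mask: collides in slot 0");
static_assert(fibonacci_index(1 * 256, 4) == 3, "multiply-shift: slot 3");
static_assert(fibonacci_index(2 * 256, 4) == 6, "multiply-shift: slot 6");
</code></pre><p>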
Within this data range, the <code>tsl::robin_map</code> with <code>absl::hash</code> is the second fastest on Rocket Lake, with a very minor difference. On M1 Max, the <code>fph::DynamicFphMap</code> with <code>std::hash</code> combination is almost just as fast, differing by less than 0.1 nanoseconds per operation.</p><p>However, when the L2 cache can't hold all elements but the L3 cache can, <code>ska::flat_hash_map</code> no longer holds the top spot. On Intel Rocket Lake, the combination of <code>fph::DynamicFphMap</code> and <code>std::hash</code> proves to be the fastest within the range of 8,192 to 3,100,000 elements (though <code>absl::flat_hash_map</code> with <code>absl::hash</code> is slightly faster at 400,000 elements). On M1 Max, this combination also excels when the element count is within 8,192 to 1,200,000. <code>fph::MetaFphMap</code> is the runner-up in this range. Considering that both <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> use an aggressive expansion strategy leading to a low load factor and earlier cache misses, and that <code>fph::DynamicFphMap</code> uses perfect hashing (i.e., no collisions), these results make sense.</p><p>When the data can't fit into the L3 cache, <code>ska::flat_hash_map</code> with <code>std::hash</code> becomes the fastest combination once more. The <code>tsl::robin_map</code> and <code>absl::hash</code> combination follows closely, with the difference of one or two instructions becoming almost negligible compared to the cost of cache misses. These two are still the fastest even with a massive number of elements, partially because many other hash tables use auxiliary arrays to store additional information. For example, <code>absl::flat_hash_map</code> employs a metadata array which, when dealing with large data, can have a high and costly cache miss rate. 
In contrast, <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> simply use one slot array to store key-value data, so a single cache-line load from memory suffices.</p><p>Analyzing this chain of observations, we find that high-performance hash tables tend to have few failed comparisons in the fourth lookup step (<code>ska::flat_hash_map</code> limits failures, and <code>fph::DynamicFphMap</code>, as a perfect hash table, has none).</p><p>Under this premise, when the data is small enough to be fully loaded into the cache, the cost of loading from memory in the third step becomes minimal. During this stage, keeping the instructions in the first and second steps few and simple is crucial for speed. A hash table and hash function combination capable of this (<code>ska::flat_hash_map</code> and <code>std::hash</code>) is the fastest.</p><p>When the amount of data slightly increases, and collisions become more frequent, hash tables that avoid collisions can gain an advantage, such as <code>fph::DynamicFphMap</code>. <code>absl::flat_hash_map</code>, which uses SIMD to resolve collisions to a certain degree, can also hold an advantage with certain data sizes.</p><p>When the amount of data continues to increase to the point where any memory access results in a cache miss, the hash table with the fewest memory accesses becomes the fastest. This requires fewer hash collisions, and also highlights that any metadata will increase the number of memory fetches at this stage. <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> are the fastest choices at this point.</p><p>In conclusion, a high-performance hash table and hash function combination will depend on several factors. When data fits entirely within cache, <code>ska::flat_hash_map</code> with <code>std::hash</code> performs best, due to the minimal instructions required in the first two steps. 
As data size increases, hash tables that can effectively avoid collisions or utilize SIMD to resolve them gain the advantage, such as <code>fph::DynamicFphMap</code> and <code>absl::flat_hash_map</code>. Lastly, when data size surpasses cache capacity, hash tables that minimize memory accesses, like <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code>, come out on top.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><p>When the size of <code>value_type</code> expands to 64 bytes, the cache can accommodate fewer elements. Additionally, a cache line can only hold a single element, making hash table collisions more costly. However, modern CPUs' powerful prefetching capabilities typically keep these costs low for hash tables using linear or quadratic probing.</p><p>Overall, the relative performance of the hash tables remains mostly consistent with the <code>&lt;uint64_t, uint64_t&gt;</code> case. 
The combination of <code>ska::flat_hash_map</code> and <code>std::hash</code> is fastest when data volume is low or high, while the combination of <code>fph::DynamicFphMap</code> and <code>std::hash</code> excels with medium-sized data.</p><h3 id="use-large-max_load_factor">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><p>Prior tests used default load factors. However, different hash tables often use different load factors, leading to performance differences due to distinct memory size requirements.</p><p>Hash tables respond differently to increased maximum load factors. While larger load factors reduce memory usage and potential cache miss rates, improving performance and minimizing footprint, they can also increase hash table collisions, slowing lookup speed.</p><p>Thus, if hash table performance is sensitive to load factor and collision probability, larger load factors can degrade performance, as seen with <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code>. However, for tables less affected by load factor, such as <code>fph::DynamicFphMap</code> and <code>absl::flat_hash_map</code>, performance remains stable or even improves with larger load factors.</p><p>The comparison between larger and default <code>max_load_factor</code> confirms these observations.</p><p><code>ska::flat_hash_map</code> and <code>std::hash</code> remain the fastest combination with few elements. 
As element count rises, performance of <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> falls while <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> performance generally increases. Other hash tables exhibit little change. Thus, for larger data sizes, <code>fph::DynamicFphMap</code> and <code>std::hash</code> offer the fastest lookup, followed by combinations of <code>fph::MetaFphMap</code> and <code>std::hash</code>, and <code>absl::flat_hash_map</code> with <code>absl::hash</code>.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-1">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><p>When the <code>value_type</code> size is 64 bytes, increasing <code>max_load_factor</code> gives results similar to <code>&lt;uint64_t, uint64_t&gt;</code>. 
<code>ska::flat_hash_map</code> with <code>std::hash</code> is fastest for fewer elements, while <code>fph::DynamicFphMap</code> with <code>std::hash</code> takes the lead with more elements.</p><h2 id="lookup-keys-not-in-the-table-miss">Lookup keys not in the table (miss)</h2><h3 id="use-default-max_load_factor-1">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><p>Finding nonexistent elements in a hash table differs from locating existing ones. Full comparisons confirm whether the target key equals a stored key, but different hash values are enough to prove that the keys are different. Therefore, hash tables that store hash values or partial hashes can speed up this task. In particular, when partial hash values (e.g., 1 byte) are used as metadata, they use less cache space than complete keys.</p><p>Although hash tables like <code>absl::flat_hash_map</code>, <code>r_h::unordered_flat_map</code>, and <code>fph::MetaFphMap</code> use partial hashes as metadata, they are not always fastest for small lookups because the additional instruction cost can outweigh the cache savings. However, the advantage of this approach appears once the L1 cache is insufficient.</p><p>On Intel Rocket Lake, <code>fph::MetaFphMap</code> with <code>std::hash</code> is fastest when element count exceeds 6,000, and stays ahead almost the whole way up; only at the very largest counts (around 10 million) does <code>absl::flat_hash_map</code> with <code>absl::Hash</code> edge ahead. 
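</p><p>The partial-hash idea described in this section can be sketched with a toy table. This is a hypothetical illustration, not the layout used by <code>absl::flat_hash_map</code> or <code>fph::MetaFphMap</code>: the <code>TaggedTable</code> type, its 16-slot capacity, the multiplicative mixer, and the 7-bit tag are all assumptions made for the example. Each slot has a one-byte tag, and a lookup compares the full key only when the tag matches, so most probes for a missing key are rejected after reading a single byte.</p><pre><code class="language-cpp">#include &lt;array&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

constexpr std::size_t kSlots = 16;     // toy capacity (assumption)
constexpr std::uint8_t kEmpty = 0x80;  // tags use 7 bits, so 0x80 marks an empty slot

struct TaggedTable {
    std::array&lt;std::uint8_t, kSlots&gt; tags;   // one metadata byte per slot
    std::array&lt;std::uint64_t, kSlots&gt; keys;  // full keys, read only on a tag match
    std::size_t key_compares = 0;            // number of full-key comparisons performed

    TaggedTable() : keys{} { tags.fill(kEmpty); }

    // Hypothetical mixer; any reasonable hash function would do here.
    static std::uint64_t mix(std::uint64_t k) { return k * 0x9E3779B97F4A7C15ull; }
    // The top 4 bits of the hash select the home slot.
    static std::size_t home(std::uint64_t h) { return static_cast&lt;std::size_t&gt;(h &gt;&gt; 60); }
    // The low 7 bits of the hash form the tag.
    static std::uint8_t tag(std::uint64_t h) { return static_cast&lt;std::uint8_t&gt;(h &amp; 0x7F); }

    // Assumes the key is not already present and at least one slot stays empty.
    void insert(std::uint64_t k) {
        const std::uint64_t h = mix(k);
        std::size_t i = home(h);
        while (tags[i] != kEmpty) i = (i + 1) % kSlots;  // linear probing
        tags[i] = tag(h);
        keys[i] = k;
    }

    bool contains(std::uint64_t k) {
        const std::uint64_t h = mix(k);
        for (std::size_t i = home(h); tags[i] != kEmpty; i = (i + 1) % kSlots) {
            if (tags[i] != tag(h)) continue;  // one-byte check rejects most mismatches
            ++key_compares;                   // tags agree, so the full key must be checked
            if (keys[i] == k) return true;
        }
        return false;  // reached an empty slot: the key is absent
    }
};
</code></pre><p>After a batch of lookups for absent keys, <code>key_compares</code> shows how often the one-byte tag failed to reject a slot and a full key comparison was still needed.</p><p>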
On M1 Max, <code>ska::flat_hash_map</code> is fastest for under 3,000 elements, closely followed by <code>fph::DynamicFphMap</code>. For 6,000 to 150,000 elements, <code>fph::DynamicFphMap</code> leads, but <code>fph::MetaFphMap</code> prevails above 200,000 elements. The M1 Max's larger cache capacity likely influences this variation, with metadata-based methods gaining advantage at higher element counts.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-2">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><p>When using a 64-byte <code>value_type</code>, the overall situation is similar to <code>&lt;uint64_t, uint64_t&gt;</code>. On M1 Max, <code>fph::MetaFphMap</code> starts to dominate from 45,000 elements. This change is caused by the fact that the number of elements the cache can hold becomes smaller.</p><h3 id="use-large-max_load_factor-1">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><p>When a larger maximum load factor is set, the overall ranking is consistent with the default maximum load factor. 
The range where <code>ska::flat_hash_map</code> is ahead is reduced, because its performance is sensitive to load factor. On Intel Rocket Lake, <code>ska::flat_hash_map</code> with <code>std::hash</code> is the fastest when the number of elements is not greater than 1,024. With more elements, <code>fph::MetaFphMap</code> with <code>std::hash</code> is faster. On M1 Max, the range where <code>ska::flat_hash_map</code> is ahead is also smaller.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-3">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><p>As above, the overall relative speed relationship is similar to that when using the default load factor.</p><h2 id="lookup-keys-with-a-50-probability-in-the-table">Lookup keys with a 50% probability in the table</h2><h3 id="use-default-max_load_factor-2">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><p>When the lookup key has a 50% probability of being in a hash table, the throughput decreases due to the increased lookup time caused by the branch prediction failure penalty. 
The lookup time gap between hash tables narrows. By default, we show only the fastest hash function for each hash table to keep the graph readable.</p><p>From the above graphs, we can see that the performance of the hash tables can be considered very close, except for <code>std::unordered_map</code>.</p><p>When the number of elements is relatively small, <code>ska::flat_hash_map</code> with <code>std::hash</code> is still the fastest at most data points.</p><p>On Intel Rocket Lake, when the number of elements is between 25,000 and 2,200,000, <code>fph::MetaFphMap</code> and <code>fph::DynamicFphMap</code> are the fastest when paired with <code>std::hash</code>. <code>absl::flat_hash_map</code> and <code>ska::flat_hash_map</code> are very close in performance on some data points. When the number of elements is in the range of 2,200,000 to 6,000,000, <code>absl::flat_hash_map</code> with <code>absl::hash</code> is the fastest pair. When the number of elements is not less than 6,000,000, <code>ska::flat_hash_map</code> with <code>std::hash</code> returns to first place. The gap is very small, and the leading table changes with the element count.</p><p>On the M1 Max, <code>fph::MetaFphMap</code> and <code>absl::flat_hash_map</code> are the fastest hash tables when the number of elements is greater than 32,768. The performance of <code>absl::flat_hash_map</code> fluctuates with the number of elements while <code>fph::MetaFphMap</code> is relatively stable. 
When the number of elements is extremely large, <code>ska::flat_hash_map</code> returns to first place.</p><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-4">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h3 id="use-large-max_load_factor-2">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-5">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h2 id="hit-appendix">Hit Appendix</h2><h3 id="use-default-max_load_factor-3">Use default max_load_factor</h3><h4 
id="kv-uint64_t-with-high-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_chart"></canvas></div></div><h3 id="use-large-max_load_factor-3">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_chart"></canvas></div></div><h2 id="miss-appendix">Miss Appendix</h2><h3 id="use-default-max_load_factor-4">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-2">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_chart"></canvas></div></div><h3 id="use-large-max_load_factor-4">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-3">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div 
class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_chart"></canvas></div></div><h2 id="hit-appendix-1">50% Hit Appendix</h2><h3 id="use-default-max_load_factor-5">Use default max_load_factor</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-4">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_chart"></canvas></div></div><h3 id="use-large-max_load_factor-5">Use large max_load_factor</h3><h4 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><h4 id="kv-uint64_t-uniformly-distributed-uint64_t-5">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var 
bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_lookup_hit_default_load_factor_P50_latency_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => 
{M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_miss_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => 
{M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_default_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});        create_chart_funcs.push(async() => {M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_50percent_hit_find_large_max_load_factor_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/int-lookup-throughput.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table 
Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The integer key lookup throughput test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - String Erase and Insert</title>
    <link href="https://renzibei.com/en/string-erase-insert/"/>
    <id>https://renzibei.com/en/string-erase-insert/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.222Z</updated>
    
    <content type="html"><![CDATA[<p>The string key erase and insert test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/string-erase-insert.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we first construct a hash table with size N, and then repeat the following procedure M times:</p><ol type="1"><li>Insert a new element into the hash table.</li><li>Randomly erase an element from the hash table.</li></ol><p>Because the table size stays at N (or N+1) throughout, this measures the steady-state cost of churning a fully grown table rather than the cost of growing it. For string keys, every insert and every erase still has to hash the whole key and compare strings on collision, and for keys above the SSO threshold (the 24- and 64-byte variants) every insert also allocates the string's characters on the heap and every erase frees them, so the 64-byte case is dominated by <code>malloc</code>/<code>free</code>. The 12-byte keys are stored inline (SSO) and skip the allocator entirely, which is why they are several times faster than the long-string cases.</p><p>Two structural effects matter here. 
First, many open-addressing tables use <strong>tombstones</strong>: an erase marks the slot as deleted rather than empty so the probe chains stay intact, and once tombstones accumulate the table must rehash, which shows up as latency spikes. Second, the perfect-hash tables <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> are a poor fit for this workload -- every insert can force a perfect-hash rebuild, so they are not only the slowest but actually time out at the larger sizes (those points read 0.00 and are not plotted). They are fast to <em>look up</em> but pay for it heavily under churn, and that caveat matters for erase/insert workloads.</p><h2 id="throughput">Throughput</h2><p>We record the time spent in the whole process, which includes both insert and erase operations.</p><p>The y-axis value is the average time per operation, computed as <code>time/op = (time for insert + time for erase) / (2 * M)</code>, i.e. the mean over all insert and erase operations.</p><h3 id="kv-string-with-a-fixed-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><p>For the 64-byte fixed key on the Xeon, <code>xxHash_xxh3</code> is the best hash and the flat open-addressing tables are tightly bunched at the front: <code>ska::flat_hash_map</code>, <code>tsl::robin_map</code>, <code>absl::flat_hash_map</code> and <code>robin_hood::unordered_flat_map</code> all run about 67-72 ns at 1024 elements and converge to roughly 320-360 ns at 10^7, where the <code>malloc</code>/<code>free</code> of the 64-byte string and the cache-missing probes dominate. 
<code>std::unordered_map</code> is roughly 30-50% slower (93 ns at small sizes, 513 ns at 10^7) because it allocates and frees a node on top of the string on every modification.</p><p>The perfect-hash tables are much slower: <code>fph::DynamicFphMap</code>/<code>fph::MetaFphMap</code> start around 160-170 ns at 1024 and then <em>time out</em> -- <code>fph::DynamicFphMap</code> has no plotted point past 32768 and <code>fph::MetaFphMap</code> shows a 6050 ns spike at 1.2M before dropping out -- because the steady stream of inserts keeps triggering perfect-hash rebuilds. Note also that <code>ankerl::unordered_dense_map</code> competes well at small and mid sizes (~75 ns at 1024) but its 10^7 point is absent (0.00); its dense-array design pays extra to keep entries packed when an arbitrary element is erased, since it back-fills the hole from the array end.</p><p>On the M1 Max the ranking is similar, with <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> leading (~370-380 ns at 10^7) and the fph tables again timing out at large sizes. 
The smaller-key variants below preserve the ordering but run much faster: the 12-byte SSO key brings the flat leaders down to ~87-91 ns at 10^7 (Xeon) because no allocation is involved.</p><h3 id="kv-string-with-a-max-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h3 id="kv-string-with-a-fixed-length-of-24-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-24-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h3 id="kv-string-with-a-fixed-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_erase_insert_time_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>We record the latency of insert and erase operations separately. The insert latency here is different from that in the "Insert and Construct" test. The latency statistics in the construct test include all operations from size 0 to size N, while in this test the size of the hash table is always N or N + 1.</p><h3 id="insert-after-erase">Insert (after erase)</h3><h4 id="kv-string-with-a-fixed-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><p>Latency is measured on the Xeon E-2388G only. The first charts in this section show the 64-byte fixed key, but the 12-byte SSO key shows the table algorithm most clearly, since allocation noise is removed. For the 12-byte key, at the P50 (median) insert-after-erase, <code>absl::flat_hash_map</code> and <code>absl::node_hash_map</code> are best at large sizes (~104-110 ns at 10^7), while <code>ska::flat_hash_map</code> and <code>tsl::robin_map</code> lead at small sizes (~14-16 ns at 1024) but lose ground in the 200k-1.2M range (~88-100 ns) as their probe chains lengthen. 
The fph tables sit far behind even at the median (~50-68 ns at 1024, climbing into the hundreds) and drop out at large sizes.</p><p>The P99 (tail) is where the tombstone-rehash behaviour shows. The conventional tables keep a bounded tail -- <code>absl::flat_hash_map</code> is around 492 ns and <code>std::unordered_map</code> around 1000 ns at 10^7 -- but the perfect-hash tables have very large spikes: <code>fph::MetaFphMap</code>'s P99 reaches ~44000 ns at 1.2M and ~231000 ns at 10^7, because an insert that triggers a full perfect-hash rebuild lands directly in the tail. These tables should be avoided where bounded modification latency matters.</p><p>For the 64-byte key the string allocation raises the floor: every table's P99 is several hundred ns even at small sizes (~470 ns at 1024), and the same fph blowup (~48000 ns at 1.2M for <code>fph::MetaFphMap</code>) appears. Other length variants follow the same pattern.</p><h4 id="kv-string-with-a-max-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 24, 
uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_insert_after_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_insert_after_erase_P99_latency_chart"></canvas></div></div><h3 id="erase">Erase</h3><h4 id="kv-string-with-a-fixed-length-of-64-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><p>Erase latency is closer between tables because an erase on an open-addressing table is cheap -- it locates the slot and drops a tombstone -- with no allocation on the 
inline-stored path. For the 64-byte key the P99 erase of most conventional tables is bunched between about 1030 and 1140 ns at 10^7 (<code>absl::flat_hash_map</code> ~1030 ns), with <code>std::unordered_map</code> higher at ~1390 ns; the floor is again set by the <code>free</code> of the string's heap buffer plus the cache miss to reach the slot. As before, the fph tables are the exception: <code>fph::DynamicFphMap</code> already times out past 32768 and <code>fph::MetaFphMap</code>'s P99 climbs past ~1200 ns at 1.2M before dropping out, since an erase that crosses the tombstone threshold forces a perfect-hash rebuild. The median (P50) erase is led by <code>ska::flat_hash_map</code>, <code>tsl::robin_map</code> and <code>emhash::hash_map7</code> at small-to-mid sizes; the remaining string variants tell the same story scaled by string length.</p><h4 id="kv-string-with-a-max-length-of-64-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-24-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-24-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-12-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-12-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_erase_P50_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_erase_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_erase_insert_time_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_insert_after_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_insert_after_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_insert_after_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_insert_after_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_insert_after_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_insert_after_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_insert_after_erase_P50_latency_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_insert_after_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_insert_after_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_insert_after_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_insert_after_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_insert_after_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_erase_P50_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_erase_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_erase_P50_latency_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_erase_P99_latency_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/string-erase-insert.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The string key erase and insert test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - String Iterate</title>
    <link href="https://renzibei.com/en/string-iterate/"/>
    <id>https://renzibei.com/en/string-iterate/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.223Z</updated>
    
    <content type="html"><![CDATA[<p>The string key iterate test.</p><span id="more"></span><html><p><link rel="preload" as="script" href="/en/assets/hashtable-bench/string-iterate.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></p></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the performance of iterating over the hash table.</p><p>Iteration is the one operation where the key type barely matters. Walking a table never hashes a key and never compares two strings; it only advances an iterator and reads back each stored entry. Therefore, unlike lookup or insertion, the string length and the choice of hash function have almost no influence here. What dominates is how a table lays out its entries in memory, exactly as in the <a href="/en/int-iterate/">integer iterate test</a>. The three storage strategies are:</p><ul><li><strong>Dense array storage.</strong> <code>ankerl::unordered_dense_map</code> and <code>emhash::hash_map7</code> keep all key-value pairs packed in a contiguous array and store only indices (or small metadata) in the hash slots. 
Iterating is a linear scan over a dense array, so the cost per element is tiny and independent of the load factor.</li><li><strong>Inline open addressing.</strong> <code>ska::flat_hash_map</code>, <code>ska::bytell_hash_map</code>, <code>tsl::robin_map</code>, <code>absl::flat_hash_map</code>, <code>fph::*</code> and <code>robin_hood::unordered_flat_map</code> store the entries directly in a sparse slot array. To iterate they walk the whole slot array and skip the empty slots, so the work per element grows as the table becomes emptier and the array spills out of cache.</li><li><strong>Node-based storage.</strong> <code>std::unordered_map</code> and <code>absl::node_hash_map</code> allocate each entry in a separate node, so iteration has to follow a pointer to reach every entry.</li></ul><p>One string-specific note: the stored entry here is <code>std::pair&lt;std::string, uint64_t&gt;</code>, which has the same fixed size (the <code>std::string</code> object itself, 32 bytes on libstdc++ or 24 bytes on libc++, plus the value) regardless of whether the string is 12, 24 or 64 characters long. The actual character bytes of a long, heap-allocated string live elsewhere and are <em>not</em> touched during iteration, since we only advance the iterator and do not read the string contents. That is why the iterate numbers are essentially identical across all six string variants.</p><h2 id="throughput">Throughput</h2><h3 id="kv-string-with-a-fixed-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><p>The charts confirm the layout-driven picture. 
On both platforms <code>ankerl::unordered_dense_map</code> is the fastest by a wide margin and is essentially flat across the whole size range -- about 0.22 ns per element on the Xeon E-2388G and about 2.0 ns on the M1 Max -- because it always scans a packed array no matter how many slots the table has. <code>emhash::hash_map7</code> is the runner-up with the same flat behaviour (about 0.63 ns on the Xeon and 2.6 ns on the M1).</p><p>Every inline open-addressing table behaves differently: its per-element cost rises with the element count as the slot array grows past the cache. <code>ska::flat_hash_map</code>, for example, climbs from under 1 ns at 1024 elements to about 12.4 ns at 10^7 on the Xeon, because most of that time is then spent reading empty slots from memory. The node-based <code>std::unordered_map</code> is the slowest at large sizes -- around 48 ns per element at 10^7 on the Xeon and 25 ns on the M1 -- since iterating its node list becomes a stream of cache-missing pointer dereferences.</p><p>The perfect-hash tables sit in the middle of the open-addressing pack and never lead here: <code>fph::MetaFphMap</code> runs about 7.4 ns and <code>fph::DynamicFphMap</code> about 10.8 ns at 10^7 on the Xeon. 
A perfect hash buys fast <em>lookup</em>, but iteration still has to walk a sparse slot array, so it gives fph no advantage, and <code>fph::DynamicFphMap</code> is in fact one of the slower flat tables to scan because it keeps a relatively sparse slot array.</p><p>The remaining five string variants below tell the same story almost to the decimal -- iteration ignores the key contents, so the fixed-vs-max length and the 12/24/64-byte distinction make no measurable difference.</p><h3 id="kv-string-with-a-max-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-string-with-a-fixed-length-of-24-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-24-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-string-with-a-fixed-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_iterate_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>The P99 latency of a single iteration step tells the same story from the tail end (latency is measured on the Xeon E-2388G only). <code>ankerl::unordered_dense_map</code> stays flat at about 0.94 ns across nearly the whole size range, because advancing over a dense array rarely misses the cache. The inline and node-based tables develop growing tails as the backing storage outgrows the cache: at 10^7 elements the P99 step reaches roughly 84 ns for <code>ska::bytell_hash_map</code>, about 110 ns for <code>ska::flat_hash_map</code> and about 103 ns for <code>tsl::robin_map</code>, while the node-based <code>std::unordered_map</code> and <code>absl::node_hash_map</code> reach hundreds of nanoseconds (about 406 ns and 444 ns respectively), with each spike corresponding to a cache miss on the next slot or node. The first point (N=1024) is a small outlier for <code>ankerl</code> at 3.13 ns -- most likely a cold-start artifact -- after which it settles to its flat 0.94 ns. 
The other five string variants follow the same pattern.</p><h3 id="kv-string-with-a-fixed-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-string-with-a-fixed-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-string-with-a-fixed-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><h3 id="kv-string-with-a-max-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h3><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_iterate_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        
return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_iterate_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_iterate_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_iterate_P99_latency_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/string-iterate.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The string key iterate test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - String Insert and Construct</title>
    <link href="https://renzibei.com/en/string-insert-construct/"/>
    <id>https://renzibei.com/en/string-insert-construct/</id>
    <published>2026-06-13T13:55:00.000Z</published>
    <updated>2026-06-12T19:03:27.222Z</updated>
    
    <content type="html"><![CDATA[<p>The string key insert and construct test.</p><span id="more"></span><html><link rel="preload" as="script" href="/en/assets/hashtable-bench/string-insert-construct.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the time spent constructing the hash table. The construction is done by the <code>insert( const value_type&amp; value )</code> operation. We test both with and without calling <code>reserve</code> before inserting. The time spent in <code>reserve</code> is not counted in the total time.</p><p>During the test, a hash table is constructed multiple times, and in the throughput test we record the total time spent on insert.</p><p>The string length quoted here is the value returned by <code>std::string::length()</code>, so storing a string takes one additional byte for the terminating null character.</p><p>Inserting string keys is more expensive than inserting integers because each insert pays for three things that integers do not: hashing the whole byte sequence of the key, comparing whole strings on a collision, and -- for keys longer than the Small String Optimization (SSO) threshold -- allocating heap memory to hold the characters. 
On the libstdc++ and libc++ implementations used here, a <code>std::string</code> stores up to 15 characters (libstdc++) or 22 characters (libc++) inline in the string object (no allocation), so the 12-byte keys are pure SSO and never touch the allocator, while the 24-byte and 64-byte keys spill to the heap and every insert also pays for a <code>malloc</code>. This single fact -- SSO vs heap allocation -- is the biggest driver of the differences between the string-length variants below.</p><p>It is also worth setting expectations for the perfect-hash tables up front. <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> are extremely fast at <em>lookup</em> because they build a (near-)perfect hash with no probing, but that construction is exactly what makes their <em>insert</em> slow -- they periodically rebuild the perfect hash as the table grows. So in this insert/construct test the fph tables are consistently the slowest, a caveat to keep in mind when reading the insert results.</p><h2 id="throughput">Throughput</h2><h3 id="insert-with-reserve">Insert with reserve</h3><h4 id="kv-string-with-a-fixed-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><p>With <code>reserve</code> called ahead of time, the capacity is fixed before insertion, so this isolates the pure per-insert cost (hash, probe, allocate, store) with no rehashing. 
For the 64-byte fixed key on the Xeon, <code>xxHash_xxh3</code> -- which is built to consume raw bytes -- is the best hash for nearly every table, and the leaders are tightly bunched: <code>absl::flat_hash_map</code>, <code>ankerl::unordered_dense_map</code> and <code>robin_hood::unordered_flat_map</code> all sit around 24-26 ns at 1024 elements and converge to roughly 118-119 ns at 10^7, where the per-insert cost is dominated by the <code>malloc</code> of the 64-character string and by cache misses, not by the table algorithm. <code>std::unordered_map</code> trails the flat tables by roughly 2x (about 48 ns at small sizes, 250 ns at 10^7) because it allocates a node per element on top of the string allocation.</p><p>The fph tables are dramatically slower here: <code>fph::MetaFphMap</code> and <code>fph::DynamicFphMap</code> reach about 2390 ns and 2471 ns at 10^7 on the Xeon -- on the order of 10x slower than <code>std::unordered_map</code> and 20x slower than the flat leaders -- because the perfect-hash build cost grows with the table. (Note that their N=1024 points, ~198 and ~178 ns, are an artifact of building and rebuilding a perfect hash for a tiny table; they actually get <em>relatively</em> better at 32768 before the build cost dominates again.) On the M1 Max the ordering is the same but the gap is smaller: the fph tables land around 950 ns at 10^7 versus ~100-120 ns for the flat leaders.</p><p>The 24-byte and 12-byte variants below follow the same ranking, just shifted faster as the strings shrink: at 10^7 the flat leaders drop from ~118 ns (64-byte) to ~92-112 ns (24-byte) to ~50-63 ns (12-byte), the last because the 12-byte key is pure SSO and skips the allocator entirely. 
The "max length N" variants run about the same as, or a little faster than, the matching "fixed length N", because the randomly generated keys average shorter than the maximum -- so there is less to hash, compare, and copy -- even though the varying length makes the per-key cost less uniform.</p><h4 id="kv-string-with-a-max-length-of-64-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-24-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-24-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div 
class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-12-uint64_t">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h3 id="insert-without-reserve">Insert without reserve</h3><h4 id="kv-string-with-a-fixed-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><p>Without <code>reserve</code>, every table also pays the cost of growing and rehashing as it fills, which raises the per-insert time for nearly every table. For the 64-byte fixed key on the Xeon, <code>absl::flat_hash_map</code> with <code>xxHash_xxh3</code> is among the fastest at large sizes (about 191 ns at 10^7), which suggests that its growth path is comparatively cheap; <code>ankerl::unordered_dense_map</code> is in the same range (~178 ns) since it only has to grow a dense array. The probing-heavy tables suffer more from rehash: <code>ska::flat_hash_map</code> climbs to about 465 ns at 10^7, more than double the absl number, because every growth re-inserts every element through the probe sequence.</p><p>The fph tables are much slower here. 
Without a reserve, every growth triggers a fresh perfect-hash build, so <code>fph::MetaFphMap</code>/<code>fph::DynamicFphMap</code> reach ~3800 and ~4180 ns at 10^7 on the Xeon -- roughly 20x the flat leaders and noticeably worse than their with-reserve numbers, where they at least avoid a rebuild at every growth step. This clearly shows that fph trades construction speed for lookup speed.</p><p>The smaller-key and "max length" variants below preserve this ranking; the 12-byte SSO keys are again the fastest (absl about 64 ns at 10^7) because they never call the allocator, leaving only the table's own growth and probe costs.</p><h4 id="kv-string-with-a-max-length-of-64-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-24-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-12-uint64_t-1">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h2 id="latency">Latency</h2><p>The P99 latency of a single insert is measured on the Xeon E-2388G only. It tells the same story from the tail end: the conventional tables keep their tail bounded while the perfect-hash tables spike whenever they rebuild.</p><h3 id="insert-with-reserve-1">Insert with reserve</h3><h4 id="kv-string-with-a-fixed-length-of-64-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><p>With a prior reserve, the 64-byte fixed key shows the conventional tables holding a tight P99: <code>absl::node_hash_map</code>, <code>ankerl::unordered_dense_map</code> and <code>absl::flat_hash_map</code> sit around 490-530 ns at 10^7, with <code>std::unordered_map</code> at ~880 ns. 
The allocation of the 64-byte string sets a floor under all of them. The fph tables again stand out: <code>fph::DynamicFphMap</code> climbs to about 12660 ns and <code>fph::MetaFphMap</code> to about 12420 ns at 10^7 -- more than an order of magnitude worse -- because even with capacity reserved, the perfect-hash construction has occasional very expensive steps that dominate the 99th percentile. For the 12-byte SSO key the floor drops (the flat tables are around 460-490 ns at 10^7) since there is no allocation, but the fph tail stays high (<code>fph::DynamicFphMap</code> ~11780 ns).</p><h4 id="kv-string-with-a-max-length-of-64-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-24-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-24-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-12-uint64_t-2">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-12-uint64_t-2">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h3 id="insert-without-reserve-1">Insert without reserve</h3><h4 id="kv-string-with-a-fixed-length-of-64-uint64_t-3">&lt;K,V&gt;:&lt;string with a fixed length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><p>Without a prior reserve the P99 of the conventional tables stays close to the reserved case -- a rehash large enough to fall in the 99th percentile is rare relative to the number of inserts -- so for the 12-byte key <code>absl::flat_hash_map</code> still sits around 442 ns at 10^7 and <code>std::unordered_map</code> around 850 ns. The fph tables, by contrast, degrade sharply: without reserve they rebuild the perfect hash at every growth, so <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code> reach roughly 20400 and 20360 ns at 10^7, about 1.6x their reserved tails. If stable insert latency matters, reserve ahead of time and avoid the perfect-hash tables for the build phase. 
The remaining string variants follow the same pattern.</p><h4 id="kv-string-with-a-max-length-of-64-uint64_t-3">&lt;K,V&gt;:&lt;string with a max length of 64, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-24-uint64_t-3">&lt;K,V&gt;:&lt;string with a fixed length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-24-uint64_t-3">&lt;K,V&gt;:&lt;string with a max length of 24, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-fixed-length-of-12-uint64_t-3">&lt;K,V&gt;:&lt;string with a fixed length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><h4 id="kv-string-with-a-max-length-of-12-uint64_t-3">&lt;K,V&gt;:&lt;string with a max length of 12, uint64_t&gt;</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    async function create_all_charts() {        return Promise.all(create_chart_funcs.map(fn => fn()));    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_with_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => 
{M1_Max_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {M1_Max_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb___avg_insert_time_without_reserve_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create();});create_chart_funcs.push(async() => 
{Xeon_E_2388G_lb_long_string_fix_64_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_long_string_max_64_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_fix_24_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_mid_string_max_24_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_fix_12_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create();});create_chart_funcs.push(async() => {Xeon_E_2388G_lb_small_string_max_12_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create();});    }    async function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    async function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/string-insert-construct.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The string key insert and construct test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Hash Table Benchmark - Integer Insert and Construct</title>
    <link href="https://renzibei.com/en/int-insert-construct/"/>
    <id>https://renzibei.com/en/int-insert-construct/</id>
    <published>2026-06-13T12:30:14.000Z</published>
    <updated>2026-06-12T19:03:27.220Z</updated>
    
    <content type="html"><![CDATA[<p>The integer key insert test.</p><span id="more"></span><html><p><link rel="preload" as="script" href="/en/assets/hashtable-bench/int-insert-construct.js"><link rel="preload" as="script" href="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js"><style> .chart-js-outer {width:100%; overflow-x: auto;} .chart-js-inner{height: 800px; width: 100%;} @media screen and (max-width: 992px) {.chart-js-inner {height: 950px;} } @media screen and (max-width: 576px) {.chart-js-inner {height: 1100px; width: 576px;} } </style></p></html><p><strong>Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure</strong>.</p><p>In this test, we measure the time spent constructing the hash table. The construction is done with the <code>emplace( const value_type&amp; value )</code> operation. We test both with and without a reserve before inserting. The time taken by <code>reserve</code> is not counted in the total time.</p><p>We originally used the <code>insert( const value_type &amp;value)</code> function to build the hash table, but we found that some hash tables do not use <code>std::pair&lt;const Key, T&gt;</code> as their <code>value_type</code> (the type used by the STL container). As a result, we cannot use the <code>insert</code> function uniformly across all the tables we test.</p><p>During the test, each hash table is constructed many times, and we record the total insertion time in the throughput test.</p><h2 id="throughput">Throughput</h2><p>The y axis value is the average time per operation. 
This result is obtained by <code>time/op = sum{construct time} / (number of construct * number of elements)</code>.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;<a name="throughput-split-u64-u64"></a></h3><h4 id="insert-with-reserve">Insert with reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><p>When analyzing insert performance after <code>reserve</code>, the first obvious thing is that some hash tables, such as <code>emhash::hash_map7</code>, <code>tsl::robin_map</code>, <code>absl::node_hash_map</code> and <code>absl::flat_hash_map</code>, are not suitable for use with <code>std::hash</code>. They need a better hash function. Of course, there are some hash tables that haven't exposed this problem in the insert test yet, and they will show this problem in the later lookup test.</p><p>When we remove these hash tables that do not use the appropriate hash function (the reader can click the labels in the legend to hide the data points of these hash tables), we can see that the slowest hash tables to insert are the perfect-hash <code>fph::DynamicFphMap</code> and <code>fph::MetaFphMap</code>, by a large margin. This is the honest tradeoff for their fast lookups: building a perfect hash is expensive, and the cost grows with the number of elements. On the Xeon E-2388G, the fph tables already take 37 to 51 ns per insert at 1024 elements (vs about 20 ns for <code>std::unordered_map</code> and 4 to 6 ns for the fastest flat tables), so even with a small table the gap is real. 
As the element count grows, the gap widens further: at 10^7 elements the fph insert time is about 1450 to 1900 ns, roughly 12 to 15 times that of <code>std::unordered_map</code> (which sits near 125 ns) on the Intel CPU, and about 11 to 12 times on the Apple M1 Max (where the fph tables are near 1550 ns and <code>std::unordered_map</code> near 134 ns).</p><p>In addition, we unexpectedly found that the combination of <code>std::unordered_map</code> and <code>std::hash</code> is surprisingly slow for some data points on the M1 Max. Further observation shows that the number of elements of these data points is exactly a power of two: at 1024 elements this combination needs about 165 ns per insert, at 2048 about 311 ns, and at 8192 it explodes to roughly 2500 ns, while at the neighboring non-power-of-two sizes (e.g. 800, 1500, 3000, 6000) it stays around 20 to 32 ns. We infer this comes from the libc++ <code>unordered_map</code> implementation used by clang on the M1: it takes the hash value modulo the number of buckets to obtain a slot index. When the bucket count is exactly a power of two, that modulo degenerates into keeping only the low-order bits of the hash and discarding the high-order bits. Since <code>std::hash</code> for integers is the identity function, any entropy that lives in the high bits is thrown away, which produces heavy collisions at exactly the power-of-two sizes.</p><p>When looking for the fastest hash table for insert operations, the results are a little different on the x86-64 and arm64 architectures.</p><p>On Intel Rocket Lake, several flat tables are essentially tied at small sizes: <code>emhash::hash_map7</code> with <code>absl::Hash</code>, <code>ska::flat_hash_map</code> with <code>std::hash</code>, and <code>ankerl::unordered_dense_map</code> all sit around 4 to 5 ns per insert when the element count is no larger than a few thousand. 
In the medium range, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> is the most consistent winner: it stays near 6 to 12 ns from 6,000 up to about 800,000, where the other flat tables start to climb faster. At the largest sizes the picture shifts again, and in an irregular way. <code>tsl::robin_map</code> with <code>robin_hood::hash</code> is actually the fastest at 10^7 elements (about 25 ns), with <code>ankerl::unordered_dense_map</code> and <code>ska::flat_hash_map</code> close behind (about 27 to 28 ns). Yet the very same combination slumps in the 200,000 to 1,200,000 range (rising to about 30 to 42 ns) before recovering. This non-monotonic profile suggests that <code>robin_hood::hash</code> produces a weaker distribution for these masked-bit keys in that range, rather than a clean cache-tier effect. <code>robin_hood::unordered_flat_map</code>, on the other hand, does not suffer from this hash-quality problem: it stays competitive with either <code>absl::Hash</code> or <code>robin_hood::hash</code>. This means that though it requires a good hash function, it does not rely on the hash quality as much as other hash tables like <code>absl::flat_hash_map</code>.</p><p>On Apple M1 Max, the story is a little different. When the element count is no larger than about 6,000, <code>ska::flat_hash_map</code> with <code>std::hash</code> is the fastest, dipping below 4 ns per insert. When the number of elements is larger than 6,000, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> takes the lead, staying around 4.5 to 10 ns all the way to about 1,200,000 while the others trail. But when the number of elements approaches 10^7, the gap between these flat maps shrinks and they converge to roughly 20 to 22 ns. This is because the working set no longer fits in cache when there are many elements. 
This phenomenon can be observed more clearly in the <a href="#throughput-split-u64-56b">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, 56 bytes struct&gt;</a> data because cache is under more pressure in that situation.</p><h4 id="insert-without-reserve">Insert without reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><p>When there is no <code>reserve</code> operation before insert, the rankings of these hash tables change again.</p><p>On Intel Rocket Lake, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> is almost always the fastest across all data scales (about 16 to 44 ns), with <code>emhash::hash_map7</code> and <code>ankerl::unordered_dense_map</code> close behind. When the number of elements is small, <code>tsl::robin_map</code> and <code>ska::flat_hash_map</code> are also nearly as fast, but they fall behind in the larger ranges because their aggressive growth strategy triggers more rehashing.</p><p>On Apple Silicon, <code>ska::flat_hash_map</code> with <code>std::hash</code> is the fastest at the smallest sizes (about 14 ns at 1024). When the number of elements grows beyond a few thousand, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> again becomes the fastest, staying around 20 to 32 ns across the rest of the range. 
<code>emhash::hash_map7</code> with <code>absl::Hash</code> comes a close second.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;<a name="throughput-split-u64-56b"></a></h3><h4 id="insert-with-reserve-1">Insert with reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><p>The pattern of keys is the same as in <a href="#throughput-split-u64-u64">&lt;K,V&gt;: &lt;uint64_t with several split bits masked, uint64_t&gt;</a>. The difference is that the pair is four times as large: 64 bytes per element instead of the original 16 bytes. Therefore, the working set runs out of cache earlier.</p><p>On Intel Rocket Lake, at small sizes <code>ankerl::unordered_dense_map</code>, <code>ska::flat_hash_map</code> with <code>std::hash</code> and <code>absl::flat_hash_map</code> with <code>absl::Hash</code> are all close (about 5 to 8 ns). In the medium range, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> is the fastest, staying near 10 to 15 ns up to about 200,000 elements. But at the largest sizes <code>absl::flat_hash_map</code> falls behind: at 10^7 it costs about 65 ns while <code>ska::flat_hash_map</code> (about 43 ns) and <code>tsl::robin_map</code> with <code>absl::Hash</code> (about 40 ns) are clearly faster. 
This is because <code>absl::flat_hash_map</code> keeps its metadata and slots in two separate arrays, so once the working set no longer fits in cache it pays two memory loads per probe; the larger 64-byte element makes this penalty more visible.</p><p>On Apple Silicon, <code>ska::flat_hash_map</code> with <code>std::hash</code> is the fastest when the number of elements is small (no larger than about 6,000). For the medium-to-large range, <code>absl::flat_hash_map</code> with <code>absl::Hash</code> takes the lead, holding it up to about 1,200,000 elements. When the element count is higher than that, <code>ska::flat_hash_map</code> again pulls roughly even with or ahead of <code>absl::flat_hash_map</code>. This is because when the data reaches this scale, the M1 Max runs out of cache, and <code>absl::flat_hash_map</code> relies on cache to get decent performance. Its metadata and slots are in two different arrays, while <code>ska::flat_hash_map</code> has only one slot array. So even if there is a cache miss, <code>ska::flat_hash_map</code> only fetches data from RAM once.</p><h4 id="insert-without-reserve-1">Insert without reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><p>When there is no prior <code>reserve</code>, the results are quite similar to those of the data <code>&lt;uint64_t, uint64_t&gt;</code>. This may be because most of the time is spent on allocation and deallocation of memory.</p><p>There is one difference: <code>absl::node_hash_map</code> gets a better ranking than in the <code>&lt;uint64_t, uint64_t&gt;</code> data. This shows that when the <code>sizeof(value_type)</code> is large, or when the construction of the 
<code>value_type</code> is time-consuming, <code>absl::node_hash_map</code> has some advantages because it does not move the value between slots.</p><h2 id="latency">Latency</h2><h3 id="kv-uint64_t-with-several-split-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, uint64_t&gt;</h3><h4 id="insert-with-reserve-latency">Insert with reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="insert-without-reserve-latency">Insert without reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><p>We found that even if there is no reserve before inserting, the P99 latency of the insert operation is basically at the same level as the P99 latency with reserve. But theoretically, without reserve, the worst-case time complexity of an insert operation is O(n), because the hash table needs to grow. If a reservation is made in advance, rehash can be avoided when inserting.</p><p>Therefore, we can additionally look at the P100 latency (aka max latency) of insert operations. 
The following two graphs show the insertion P100 latency when the reserve operation is performed in advance and the insertion P100 latency when the reserve operation is not performed.</p><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P100_latency_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P100_latency_chart"></canvas></div></div><p>It can be seen that only the fph family of hash tables and <code>robin_hood::unordered_flat_map</code> have large maximum insertion latency when a prior reserve is performed. The fph tables reach tens of millions of nanoseconds at 10^7 elements (their perfect-hash rebuild is the culprit), and <code>robin_hood::unordered_flat_map</code> climbs to over a million ns. From the experimental data, the worst-case time complexity of insertion in the other hash tables, after reserve, does not appear to be O(n). The P100 insertion latency of <code>ska::flat_hash_map</code> and <code>ska::bytell_hash_map</code> is only about 760 to 830 ns even when the number of elements is 10^7, which is a good result.</p><p>When there is no prior reserve, the P100 latency of the insert operation arguably reflects the worst-case time complexity of that operation on the hash table: O(n). The maximum insertion latency is proportional to the number of elements, reaching the order of 10^8 ns at 10^7 elements for the flat tables. 
Relatively speaking, <code>absl::flat_hash_map</code> and <code>robin_hood::unordered_flat_map</code> (together with <code>ankerl::unordered_dense_map</code>) have a smaller maximum latency when expanding.</p><p>If the user wants stable insert time, then reserve should be performed in advance, which can avoid rehash on most hash tables.</p><h3 id="kv-uint64_t-with-several-split-bits-masked-56-bytes-struct-1">&lt;K,V&gt;:&lt;uint64_t with several split bits masked, 56 bytes struct&gt;</h3><h4 id="insert-with-reserve-latency-1">Insert with reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="insert-without-reserve-latency-1">Insert without reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><p>When the size of <code>value_type</code> is 64 bytes, the rankings are very similar to <code>&lt;uint64_t, uint64_t&gt;</code>.</p><h2 id="throughput-appendix">Throughput Appendix</h2><h3 id="kv-uint64_t-with-high-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h3><h4 id="insert-with-reserve-2">Insert with reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="insert-without-reserve-2">Insert without reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h3 id="kv-uint64_t-with-low-position-bits-masked-uint64_t">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h3><h4 id="insert-with-reserve-3">Insert with reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="insert-without-reserve-3">Insert without reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h3 id="kv-uint64_t-uniformly-distributed-uint64_t">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h3><h4 id="insert-with-reserve-4">Insert with reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_chart"></canvas></div></div><h4 id="insert-without-reserve-4">Insert without reserve</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_chart"></canvas></div></div><h2 id="latency-appendix">Latency Appendix</h2><h3 id="kv-uint64_t-with-high-position-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with high position bits masked, uint64_t&gt;</h3><h4 id="insert-with-reserve-latency-2">Insert with reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="insert-without-reserve-latency-2">Insert without reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><h3 id="kv-uint64_t-with-low-position-bits-masked-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t with low position bits masked, uint64_t&gt;</h3><h4 id="insert-with-reserve-latency-3">Insert with reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="insert-without-reserve-latency-3">Insert without reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><h3 id="kv-uint64_t-uniformly-distributed-uint64_t-1">&lt;K,V&gt;:&lt;uint64_t uniformly distributed, uint64_t&gt;</h3><h4 id="insert-with-reserve-latency-4">Insert with reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas 
id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_chart"></canvas></div></div><h4 id="insert-without-reserve-latency-4">Insert without reserve latency</h4><div class="chart-js-outer"><div class="chart-js-inner"><canvas id="Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_chart"></canvas></div></div><html><script>    var create_chart_funcs = [];    var chart_js_point_r = 6;    if (window.innerWidth < 576) {        chart_js_point_r = 5;    }    function create_all_charts() {        for (var i = 0; i < create_chart_funcs.length; i++) {            create_chart_funcs[i]();        }    };    var bench_results_ready = false; var chart_js_ready = false;    function add_new_chart_callbacks() {        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        
create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create);                create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P100_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P100_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_construct_with_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb__co_construct_no_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_high_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_mask_low_bits_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_construct_with_reserve_P99_latency_create);        create_chart_funcs.push(Xeon_E_2388G_lb_uniform_uint64_t_co_uint64_t_rb__co_construct_no_reserve_P99_latency_create);        create_chart_funcs.push(M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(M1_Max_lb_mask_split_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        
create_chart_funcs.push(M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(M1_Max_lb_mask_split_bits_uint64_t_co_56bytes_payload_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(M1_Max_lb_mask_high_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(M1_Max_lb_mask_low_bits_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);        create_chart_funcs.push(M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_with_reserve_create);        create_chart_funcs.push(M1_Max_lb_uniform_uint64_t_co_uint64_t_rb___avg_insert_time_without_reserve_create);    }    function bench_results_loaded() {        add_new_chart_callbacks();        bench_results_ready = true;        if (chart_js_ready) {            create_all_charts();        }    };    function chart_js_script_loaded() {        chart_js_ready = true;        if (bench_results_ready) {            create_all_charts();        }    };</script><script src="/en/assets/hashtable-bench/int-insert-construct.js" onload="bench_results_loaded();"> </script><script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.8.0/chart.min.js" onload="chart_js_script_loaded();"></script></html><hr><p><a href="/en/hashtable-bench/#posts">← Back to Hash Table Benchmark index</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The integer key insert test.&lt;/p&gt;</summary>
    
    
    
    <category term="algorithm" scheme="https://renzibei.com/en/categories/algorithm/"/>
    
    
    <category term="hashtable" scheme="https://renzibei.com/en/tags/hashtable/"/>
    
    <category term="benchmark" scheme="https://renzibei.com/en/tags/benchmark/"/>
    
    <category term="algorithm" scheme="https://renzibei.com/en/tags/algorithm/"/>
    
  </entry>
  
</feed>
