Hash Table Benchmark

Before viewing the benchmark results
Methodology
Experiments and Results
1. Testing Platform
2. Results
Conclusion
Restrictions
Postscript

This is yet another benchmark to compare different hash tables (hash maps) with different hash functions in C++, attempting to evaluate the performance of the lookup, insertion, deletion, iteration, etc. on different datasets as comprehensively as possible.

We show the performance of hash tables with hash functions for different operations on different types of datasets of different sizes. The reader can refer to these results and choose the hash table and hash function that best match the target application.

The benchmarks were collected in 2022–2023 (machine configurations are listed below); this writeup was compiled and published in 2026.

Before viewing the benchmark results

Anyone familiar enough with hash tables knows that even a well-known and widely used hash table can have a data distribution it is not very good at. In other words, no hash table is the fastest on all datasets for all operations.

The best practice for selecting a hash table is to consider the data characteristics, operation mix, requirements, and hash function together.

This benchmark tries to use a concise and effective method to test the performance of different operations of the hash tables on some of the most common data distributions. But there will always be data distributions that are very different from the data we use for testing, and different users have different requirements for different indicators. Therefore, the best test is still one in the real application.

Methodology

Test Items

We measure combinations of different hash tables with different hash functions. For each combination, we measured its insert, delete, lookup (including successful and failed lookups), and iteration performance under different data. Below is a more detailed table of test items. Please note that in the following we will use "hash table" or "hash map" interchangeably to refer to the same concept.

Index	Test items	Notes
1	Insert with reserve	Call map.reserve(n) before insert n elements
2	Insert without reserve	Insert n elements without prior reserve
3	Erase and insert	Repeatedly do one erase after one insert, keep the map size constant
4	Look up keys in the map (hit)	Repeatedly look up the elements that are in the map
5	Look up keys that are not in the map (miss)	Repeatedly look up the elements that are not in the map
6	Look up keys with 50% probability in the map	Repeatedly look up the elements that have a 50% probability in the map
7	Look up keys in the map with large max_load_factor (hit)	Same as Test Item 4 except that the map is set a max_load_factor of 0.9 and rehashed before the lookup operations
8	Look up keys that are not in the map with large max_load_factor (miss)	Same as Test Item 5 except that the map is set a max_load_factor of 0.9 and rehashed before the lookup operations
9	Look up keys with 50% probability in the map with large max_load_factor	Same as Test Item 6 except that the map is set a max_load_factor of 0.9 and rehashed before the lookup operations
10	Iterate the table	Iterate the whole table several times
11	Heap memory size and load factor with default and large max_load_factor	Record the heap memory size and load factor when constructing the map in Test Items 4 and 7

As may be noticed, several test items measure lookup speed with a larger upper limit on the load factor. The load factor measures how full the hash table is, and max_load_factor is the STL API for controlling its upper bound. This is because each hash table may have a different expansion strategy and max_load_factor, so even with the same number of elements, different tables may choose different load factors and occupy different amounts of memory. The load factor and memory footprint greatly affect lookup performance, so using a hash table with a smaller max_load_factor may have worse (or better) lookup performance. On the other hand, a higher load factor may lead to a higher probability of collision, thus reducing lookup performance.

In addition, extreme lookup performance usually requires making the space used by the hash table as small as possible to reduce the cache miss rate. When available memory is very limited, a larger load factor may also be preferred. One way to do this is to set a higher max_load_factor, and then rehash (or set a large max_load_factor before the main construction process of the table).

For each of the tests above, we tested the throughput and latency (when the platform under test meets the conditions for the latency test). The throughput results will be more representative, because modern software runs on CPUs with pipelined architectures. And almost all operations will have other instructions before and after them, which can make full use of the pipeline. However, for some specific uses, latency data is important. The latency measurement results here are for reference only for special needs and have relatively large limitations.

Dataset

All the data used in the benchmark are randomly generated; the user can choose different seeds for the test data. We tested the performance of each hash table at different sizes from 32 to 10^7.

The tested keys consist of 64-bit integers of different distributions and strings of different lengths. The detailed test data is shown in the table below.

Index	Key Type	Value Type	Notes
1	uint64_t with several split bits masked	uint64_t	The keys have such characteristics: only bits in some positions may be 1, and all other bits are 0. For test data of size n, at most ceil[log2(n)] fixed bits may be 1. e.g. If the key type is uint8_t (it is uint64_t in reality) and the test size is 7, the keys will be generated with the method `rng() & 0b10010001`. The distribution characteristics of such bits can relatively comprehensively examine whether hash tables and hash functions can handle keys that only have effective information in specific bit positions.
2	uint64_t, uniformly distributed in [0, UINT64_MAX]	uint64_t	The keys follow a uniform distribution in the range [0, UINT64_MAX].
3	uint64_t, bits in high position are masked out	uint64_t	The bits in the high position are set to 0. For test data of size n, at most ceil[log2(n)] fixed bits may be 1. For example, if the key type is uint8_t (uint64_t in reality) and the test size is 7, the keys will be generated with the method `rng() & 0b00000111`
4	uint64_t, bits in low position are masked out	uint64_t	The bits in the low position are set to 0. For test data of size n, at most ceil[log2(n)] fixed bits may be 1. For example, if the key type is uint8_t (uint64_t in reality) and the test size is 7, the keys will be generated with the method `rng() & 0b11100000`
5	uint64_t with several bits masked	56 bytes struct	The keys are the same as the distribution of the data 1. The payload is a 56 bytes long struct, which makes the `sizeof(std::pair<key, value>)==64`
6	Small string with a max length of 12	uint64_t	The key type is a string with a maximum length of 12. Both length and characters are randomly generated. The compiler/library may use Small String Optimization (SSO).
7	Small string with a fixed length of 12	uint64_t	The key type is a string with a fixed length of 12. The characters are randomly generated. The compiler/library may use Small String Optimization (SSO).
8	Mid string with a max length of 24	uint64_t	The key type is a string with a maximum length of 24. Both length and characters are randomly generated.
9	Mid string with a fixed length of 24	uint64_t	The key type is a string with a fixed length of 24. The characters are randomly generated.
10	Large string with a max length of 64	uint64_t	The key type is a string with a maximum length of 64. Both length and characters are randomly generated.
11	Large string with a fixed length of 64	uint64_t	The key type is a string with a fixed length of 64. The characters are randomly generated.

Different distributions within the range representable by uint64_t are chosen as keys. Uniformly distributed integers in the range of uint64_t are the easiest to generate with pseudo-random numbers, but they are rare in real situations.

If users are concerned with performance using integer keys, we strongly recommend focusing on the results on the first dataset rather than the second dataset (i.e. the dataset with a uniform random distribution). The data from the first dataset can better examine the ability of hash tables and hash functions to deal with more diverse patterns, while the test on uniform random distributions barely verifies the ability of hash tables to handle other distributions. Moreover, in real data distributions, few keys happen to be uniformly randomly distributed over the [0, 2^63 - 1] range.

With this in mind, our analysis for integer keys focuses mainly on the first dataset. To keep the articles shorter and easier to read, the other integer datasets — including the second (uniform random) one and the high-/low-bit-masked ones — are mostly shown only in the appendix of each test, without detailed discussion.

For the string datasets, different character sets are used. For fixed-length strings, the pattern is like the first dataset, where several split bits are masked. In other words, only the bits in some positions may differ among these datasets. This pattern is intended to test the quality of the hash function.

For the strings with variable length, a subset of the printable characters can appear in the string.

Real data distributions are often biased. If a combination of hash function and hash table can only handle one distribution but cannot handle other distributions, this combination is not robust to unknown distributions. If the distribution of the data is known in advance, the user can pick the fastest and most stable hash table for that data.

Tested Hash functions and Hash maps

Below is the list of hash functions we tested.

Name	Type	Notes	Link
std::hash	Normal	Implemented by compiler; identity hash is used for integer type in libc++ and libstdc++
absl::hash	Normal	Implemented by Google; Uses 128-bit product of multiplication and an xor-shift.	https://github.com/abseil/abseil-cpp
robin_hood::hash	Normal	For integer keys, it uses xor-shift, multiplication, xor-shift; For string keys it is similar to absl::hash	https://github.com/martinus/robin-hood-hashing
xxHash_xxh3	Bytes	Designed for string; We use identity hash for integer type to pass compilation; It won't show in the results of integer keys	https://github.com/Cyan4973/xxHash

Originally we had some seed hash functions in the tests, which are hash functions that take both a key and a seed as arguments. We removed these hash functions to keep the test subjects simple, and we use the no-seed version of all the hash tables.

We will not show the results of hash xxHash_xxh3 in tests on integer keys. For the early versions of absl::Hash, the behavior on the arm64 platform was different from that on the x86-64 platform, and it was poor for some datasets. So we once had a uint128_mul::hash to compare with it, which is similar to the absl::Hash on the x86-64 platform. Since the newest version of absl::Hash has fixed this problem, we deleted the uint128_mul::hash.

The following table lists the hash tables we tested. Some of these hash tables rely on a "good" hash function to work properly, which can generate hash values that are as uniformly distributed as possible for unbalanced keys. If a hash function that does not have such a property (e.g. identity hash) is used, then the performance of these hash tables may drop drastically. These hash tables may assume that the hash values of the keys from the dataset are uniformly distributed in the output range. This requires hash functions to have properties like uniformity or diffusion.

The implication here is that a "good" hash function tends to be more complex than the simplest hash function (the identity hash), requiring more instructions to complete the computation. Some hash tables do not rely on a good hash function, perhaps because they do some extra work to improve the uniformity of the hash values. For such a hash table, the simpler the hash function, the better, preferably the identity hash. So we should always compare the combinations of hash tables and hash functions, rather than fixing the hash function to compare the hash table, or vice versa.

Here are the hash maps we tested.

Name	Requires a good hash function	Notes	Link
std::unordered_map	No*	Implemented by the STL library; May differ in libc++ and libstdc++.
ska::flat_hash_map	No	Very fast and simple; Uses robin hood hash; Memory overhead: alignof(value_type) per element; Requires small load factor	https://github.com/skarupke/flat_hash_map
ska::bytell_hash_map	No	A little slower than ska::flat_hash_map but one byte per element memory overhead	https://github.com/skarupke/flat_hash_map
absl::flat_hash_map	Yes	Uses SIMD and metadata; Fast when looking up keys that are not in the map; One byte per element memory overhead	https://abseil.io/about/design/swisstables
absl::node_hash_map	Yes	Slower than absl::flat_hash_map but does not invalidate the pointer after rehash	https://github.com/abseil/abseil-cpp
tsl::robin_map	Yes	A fast hash table using robin hood hash; Memory overhead is no less than ska::flat_hash_map	https://github.com/Tessil/robin-map
emhash::HashMap7	Yes	Fast in lookup hit operations.	https://github.com/ktprime/emhash
fph::DynamicFphMap	No	A dynamic perfect hash table; Ultra-fast in lookup but slow in insert; 2~8 bits per element memory overhead	https://github.com/renzibei/fph-table
fph::MetaFphMap	No	A dynamic perfect hash table using metadata; Better than fph::DynamicFphMap in the miss lookup case.	https://github.com/renzibei/fph-table
robin_hood::unordered_flat_map	Yes	A table using robin hood hash;	https://github.com/martinus/robin-hood-hashing
ankerl::unordered_dense_map	Yes	Stores entries in a dense array; fastest to iterate; compact footprint	https://github.com/martinus/unordered_dense

* Note: For the tested libc++ and libstdc++ versions, the libc++ implementation requires a good hash function but libstdc++ has no such requirement. If using std::hash, the performance can be poor when the size is the power of 2 for libc++.

At a quick glance, it is easy to see that many of the hash tables listed use the robin hood hashing technique in the pursuit of speed.

Experiments and Results

The code of this benchmark is available at https://github.com/renzibei/hashtable-bench. The raw per-combination CSV data behind every chart lives in that repository under results/ — chiefly results/E2388G-202305-native/csv (Xeon E-2388G) and results/m1-max-202305/csv (M1 Max).

Testing Platform

Platform 1: Intel Xeon E-2388G CPU @ 3.20 GHz, boost to 5.1 GHz; x86-64; Rocket Lake.

Platform 2: M1 Max Macbook Pro 16 inch, 2021; arm64; Firestorm.

Due to the lack of a high-precision time stamp counter on the arm64 (M1 Max) platform, we only measured the latency on the x86-64 platform (even the AMD CPU has some problems when measuring the latency using the TSC, so we only test latency on the Intel CPU). In addition, for the x86-64 platform, we have also taken some measures to ensure the stability of the measurement results, including the following:

Use the taskset command to set CPU core affinity
Turn off hyperthreading
Isolate the cores by adding isolcpus= and rcu_nocbs= in GRUB_CMDLINE_LINUX in /etc/default/grub
Turn off some power-saving options, including disabling the ondemand systemd service, and setting idle=poll and intel_idle.max_cstate=0 in the grub command line.
Turn off timer tick interrupts, recompile the kernel with CONFIG_NO_HZ_FULL=y and set nohz_full= in the grub command line.
Other adjustments that de-jitter the system latency. You can refer to https://rigtorp.se/low-latency-guide/.

These measures cannot be done on macOS. But as we do not measure the latency of operations on macOS, it doesn't matter that much.

Results

For throughput data, performance will be represented by the average time per operation. We will plot the average time per operation for different scales of data. The shorter the time, the better the performance.

For the latency data, due to the limitation of the article length, we only show the latency of the 99th percentile in most test cases, which can help to show the worst time complexity and long tail latency of the hash table. And that's not even enough to reflect worst-case latency. For a distribution with long-tailed features, the 0.99th, 0.999th, and 0.9999th quantiles can all have very different values. If the application has strict requirements on real-time performance and tail latency (such as gaming and high-frequency trading), then this data metric should be worth paying attention to.

If too much time is spent in a test, we will count it as the timeout and set the time as zero, and that data point won't be plotted.

You can click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.

We divide the results into different groups according to data type and operation type.

Conclusion

In short, no single hash table is best for every workload — the right pick depends on which operations dominate, what the keys look like, and how much memory is available. If lookups dominate and the table is built once, the perfect-hash fph::DynamicFphMap (and fph::MetaFphMap when misses are common) is hard to beat, at the cost of slow construction. For a general-purpose map with a mix of inserts and lookups, absl::flat_hash_map with absl::Hash is a fast, compact and robust default; ska::flat_hash_map is the quickest while the data stays in cache, and ankerl::unordered_dense_map is the one to reach for when iteration is frequent. std::unordered_map is the slowest in most tests and is worth keeping mainly for its pointer-stability guarantees.

For the full reasoning behind these picks, with a per-test and per-workload comparison, see the Analysis & Conclusion.

Restrictions

Exclusive access to resources

In our tests, almost all computer resources can be monopolized by the test program, especially cache resources. And this is relatively rare in practical applications. In fact, other processes and tasks may occupy part of the cache. In practical applications users should expect a lower available cache size.

Cold memory and warm cache

We neither did a warmup, nor did we specifically test the cold start scenario. In our tests, we repeatedly test an operation with a range of data many times. Therefore, when the number of operations is much greater than the amount of data, it can be considered that most operations are accessing the warm cache. When the number of operations is less than the number of data, most operations are accessing cold memory. In our test, limited by the test time, when the amount of data is small, the number of operations will be much greater than the amount of data; and when the amount of data is large, the number of operations will be equal to the amount of data.

Huge test space

The size of the test space contains at least

1	\|hash table set\| x \|hash function set\| x \|data sets\| x \|operation set\| x \|hardware platform set\| x \|compiler set\|

As can be seen, the testable space is quite huge. Any addition to the set of hash tables or the set of hash functions will greatly increase the testing effort. Due to time and resource constraints, we have only explored part of the combinations, and there are still many combinations and spaces that we have not tested.

Therefore, in order to choose the most suitable hash table and hash function for the user's purpose, real tests should be carried out in the application scenario.

Postscript

This hash table benchmark series took at least four years from start to finish. The first version of the data was already available in 2022, when the M1 Max was still a fairly new CPU; now even the M5 is out. It took this long because organizing so many charts and analyses was genuinely tedious. I finished the main body of the posts roughly between 2022 and 2023, but left a lot of cleanup work undone out of laziness. Because this kept hanging over me, the blog also fell into a kind of head-of-line blocking: I kept thinking, "I still haven't finished this series," and ended up not publishing other posts either.

Over these years, many things have changed. CPUs have gone through several generations, new hash functions and hash tables have appeared, and LLM technology has also advanced rapidly. Many things discussed in these posts may already be somewhat out of date. All I can say is: time passes like a river, never stopping day or night.

Whatever the quality of this benchmark series may be, I have decided to stop iterating on it and publish it as it is.

Author: Renzibei
Link: https://renzibei.com/en/hashtable-bench/
Copyright Notice: All articles on this website, unless otherwise stated, are created by the author and are licensed under CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.

post in 2026-06-13

tags:

{ hashtable }

{ benchmark }

{ algorithm }