Hash Table Benchmark - String Iterate
- Throughput
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Latency
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
The string key iterate test.
Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.
In this test, we measure the performance of iterating over the hash table.
Iteration is the one operation where the key type barely matters. Walking a table never hashes a key and never compares two strings; it only advances an iterator and reads back each stored entry. Therefore, unlike lookup or insertion, the string length and the choice of hash function have almost no influence here. What dominates is how a table lays out its entries in memory, exactly as in the integer iterate test. The three storage strategies are:
- Dense array storage.
ankerl::unordered_dense_mapandemhash::hash_map7keep all key-value pairs packed in a contiguous array and store only indices (or small metadata) in the hash slots. Iterating is a linear scan over a dense array, so the cost per element is tiny and independent of the load factor. - Inline open addressing.
ska::flat_hash_map,ska::bytell_hash_map,tsl::robin_map,absl::flat_hash_map,fph::*androbin_hood::unordered_flat_mapstore the entries directly in a sparse slot array. To iterate they walk the whole slot array and skip the empty slots, so the work per element grows as the table becomes emptier and the array spills out of cache. - Node-based storage.
std::unordered_mapandabsl::node_hash_mapallocate each entry in a separate node and chase pointers between them.
One string-specific note: the stored entry here is
std::pair<std::string, uint64_t>, which is the same
fixed size (a 32-byte std::string control block plus the
value) regardless of whether the string is 12, 24 or 64 characters long.
The actual character bytes of a long, heap-allocated string live
elsewhere and are not touched during iteration, since we only
advance the iterator and do not read the string contents. That is why
the iterate numbers are essentially identical across all six string
variants.
Throughput
<K,V>: <string with a fixed length of 64, uint64_t>
The charts confirm the layout-driven picture. On both platforms
ankerl::unordered_dense_map is the fastest by a wide margin
and is perfectly flat across the whole size range -- about 0.22 ns per
element on the Xeon E-2388G and about 2.0 ns on the M1 Max -- because it
always scans a packed array no matter how many slots the table has.
emhash::hash_map7 is the runner-up with the same flat
behaviour (about 0.63 ns on the Xeon and 2.6 ns on the M1).
Every inline open-addressing table behaves differently: its
per-element cost rises with the element count as the slot array grows
past the cache. ska::flat_hash_map, for example, climbs
from under 1 ns at 1024 elements to about 12.4 ns at 10^7 on the Xeon,
because most of that time is then spent reading empty slots from memory.
The node-based std::unordered_map is the slowest at large
sizes -- around 48 ns per element at 10^7 on the Xeon and 25 ns on the
M1 -- since iterating its node list becomes a stream of cache-missing
pointer dereferences.
The perfect-hash tables sit in the middle of the open-addressing pack
and never lead here: fph::MetaFphMap runs about 7.4 ns and
fph::DynamicFphMap about 10.8 ns at 10^7 on the Xeon. A
perfect hash buys fast lookup, but iteration still has to walk
a sparse slot array, so it gives fph no advantage and
fph::DynamicFphMap is in fact one of the slower flat tables
to scan because it keeps a relatively sparse slot array.
The remaining five string variants below tell the same story almost to the decimal -- iteration ignores the key contents, so the fixed-vs-max length and the 12/24/64-byte distinction make no measurable difference.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
Latency
The P99 latency of a single iteration step tells the same story from
the tail end (latency is measured on the Xeon E-2388G only).
ankerl::unordered_dense_map stays flat at about 0.94 ns
regardless of size, because advancing over a dense array never misses
far. The inline and node-based tables develop growing tails as the
backing storage outgrows the cache: at 10^7 elements the P99 step
reaches roughly 84 ns for ska::bytell_hash_map, about 110
ns for ska::flat_hash_map and about 103 ns for
tsl::robin_map, while the node-based
std::unordered_map and absl::node_hash_map
reach hundreds of nanoseconds (about 406 ns and 444 ns respectively),
with each spike corresponding to a cache miss on the next slot or node.
The first point (N=1024) is a small outlier for ankerl at
3.13 ns -- a cold-start artifact -- after which it settles to its flat
0.94 ns. The other five string variants follow the same pattern.
<K,V>: <string with a fixed length of 64, uint64_t>
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
- Link: https://renzibei.com/en/string-iterate/
-
Copyright Notice:
All articles on this website, unless otherwise stated, are created by the author and are licensed under
CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.