Hash Table Benchmark - String Iterate

Throughput
Latency

The string key iterate test.

📄 Raw CSV data for this benchmark: Xeon E-2388G · M1 Max.

Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.

In this test, we measure the performance of iterating over the hash table.

Iteration is the one operation where the key type barely matters. Walking a table never hashes a key and never compares two strings; it only advances an iterator and reads back each stored entry. Therefore, unlike lookup or insertion, the string length and the choice of hash function have almost no influence here. What dominates is how a table lays out its entries in memory, exactly as in the integer iterate test. The three storage strategies are:

Dense array storage. ankerl::unordered_dense_map and emhash::hash_map7 keep all key-value pairs packed in a contiguous array and store only indices (or small metadata) in the hash slots. Iterating is a linear scan over a dense array, so the cost per element is tiny and independent of the load factor.
Inline open addressing. ska::flat_hash_map, ska::bytell_hash_map, tsl::robin_map, absl::flat_hash_map, fph::* and robin_hood::unordered_flat_map store the entries directly in a sparse slot array. To iterate they walk the whole slot array and skip the empty slots, so the work per element grows as the table becomes emptier and the array spills out of cache.
Node-based storage. std::unordered_map and absl::node_hash_map allocate each entry in a separate node and chase pointers between them.

One string-specific note: the stored entry here is std::pair<std::string, uint64_t>, which is the same fixed size (a 32-byte std::string control block plus the value) regardless of whether the string is 12, 24 or 64 characters long. The actual character bytes of a long, heap-allocated string live elsewhere and are not touched during iteration, since we only advance the iterator and do not read the string contents. That is why the iterate numbers are essentially identical across all six string variants.

Throughput

<K,V>: <string with a fixed length of 64, uint64_t>

The charts confirm the layout-driven picture. On both platforms ankerl::unordered_dense_map is the fastest by a wide margin and is perfectly flat across the whole size range -- about 0.22 ns per element on the Xeon E-2388G and about 2.0 ns on the M1 Max -- because it always scans a packed array no matter how many slots the table has. emhash::hash_map7 is the runner-up with the same flat behaviour (about 0.63 ns on the Xeon and 2.6 ns on the M1).

Every inline open-addressing table behaves differently: its per-element cost rises with the element count as the slot array grows past the cache. ska::flat_hash_map, for example, climbs from under 1 ns at 1024 elements to about 12.4 ns at 10^7 on the Xeon, because most of that time is then spent reading empty slots from memory. The node-based std::unordered_map is the slowest at large sizes -- around 48 ns per element at 10^7 on the Xeon and 25 ns on the M1 -- since iterating its node list becomes a stream of cache-missing pointer dereferences.

The perfect-hash tables sit in the middle of the open-addressing pack and never lead here: fph::MetaFphMap runs about 7.4 ns and fph::DynamicFphMap about 10.8 ns at 10^7 on the Xeon. A perfect hash buys fast lookup, but iteration still has to walk a sparse slot array, so it gives fph no advantage and fph::DynamicFphMap is in fact one of the slower flat tables to scan because it keeps a relatively sparse slot array.

The remaining five string variants below tell the same story almost to the decimal -- iteration ignores the key contents, so the fixed-vs-max length and the 12/24/64-byte distinction make no measurable difference.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

Latency

The P99 latency of a single iteration step tells the same story from the tail end (latency is measured on the Xeon E-2388G only). ankerl::unordered_dense_map stays flat at about 0.94 ns regardless of size, because advancing over a dense array never misses far. The inline and node-based tables develop growing tails as the backing storage outgrows the cache: at 10^7 elements the P99 step reaches roughly 84 ns for ska::bytell_hash_map, about 110 ns for ska::flat_hash_map and about 103 ns for tsl::robin_map, while the node-based std::unordered_map and absl::node_hash_map reach hundreds of nanoseconds (about 406 ns and 444 ns respectively), with each spike corresponding to a cache miss on the next slot or node. The first point (N=1024) is a small outlier for ankerl at 3.13 ns -- a cold-start artifact -- after which it settles to its flat 0.94 ns. The other five string variants follow the same pattern.

<K,V>: <string with a fixed length of 64, uint64_t>

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

← Back to Hash Table Benchmark index

Author: Renzibei
Link: https://renzibei.com/en/string-iterate/
Copyright Notice: All articles on this website, unless otherwise stated, are created by the author and are licensed under CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.

post in 2026-06-13

tags:

{ hashtable }

{ benchmark }

{ algorithm }