Hash Table Benchmark - String Insert and Construct

Throughput
1. Insert with reserve
2. Insert without reserve
Latency
1. Insert with reserve
2. Insert without reserve

The string key insert and construct test.

📄 Raw CSV data for this benchmark: Xeon E-2388G · M1 Max.

Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.

In this test, we measure the time spent constructing the hash table. The construction is done by the insert( const value_type& value ) operation. We test both with and without calling reserve before inserting. The time spent in reserve is not counted in the total time.

During the test, a hash table is constructed multiple times, and in the throughput test we record the total time spent on insert.

The length of a string is the same as std::string::length(), which means that one additional byte is needed to save the null character.

Inserting string keys is more expensive than inserting integers because each insert pays for three things that integers do not: hashing the whole byte sequence of the key, comparing whole strings on a collision, and -- for keys longer than the Small String Optimization (SSO) threshold -- allocating heap memory to hold the characters. On the libstdc++/libc++ implementations used here, a std::string of length up to 15 stores its bytes inline in the control block (no allocation), so the 12-byte keys are pure SSO and never touch the allocator, while the 24-byte and 64-byte keys spill to the heap and every insert also pays a malloc. This single fact -- SSO vs heap allocation -- is the biggest driver of the differences between the string-length variants below.

It is also worth setting expectations for the perfect-hash tables up front. fph::DynamicFphMap and fph::MetaFphMap are extremely fast at lookup because they build a (near-)perfect hash with no probing, but that construction is exactly what makes their insert slow -- they periodically rebuild the perfect hash as the table grows. So in this insert/construct test the fph tables are consistently the slowest, and that caveat matters for the insert results.

Throughput

Insert with reserve

<K,V>: <string with a fixed length of 64, uint64_t>

With reserve called ahead of time, the capacity is fixed before insertion, so this isolates the pure per-insert cost (hash, probe, allocate, store) with no rehashing. For the 64-byte fixed key on the Xeon, xxHash_xxh3 -- which is built to consume raw bytes -- is the best hash for nearly every table, and the leaders are tightly bunched: absl::flat_hash_map, ankerl::unordered_dense_map and robin_hood::unordered_flat_map all sit around 24-26 ns at 1024 elements and converge to roughly 118-119 ns at 10^7, where the per-insert cost is dominated by the malloc of the 64-character string and by cache misses, not by the table algorithm. std::unordered_map trails the flat tables by roughly 2x (about 48 ns small, 250 ns at 10^7) because it allocates a node per element on top of the string allocation.

The fph tables are dramatically slower here: fph::MetaFphMap and fph::DynamicFphMap reach about 2390 ns and 2471 ns at 10^7 on the Xeon -- on the order of 10x slower than std::unordered_map and 20x slower than the flat leaders -- because the perfect-hash build cost grows with the table. (Note their N=1024 points, ~198 and ~178 ns, are an artifact of building and rebuilding a perfect hash for a tiny table; they actually get relatively better at 32768 before the build cost dominates again.) On the M1 Max the ordering is the same but the gap is smaller: the fph tables land around 950 ns at 10^7 versus ~100-120 ns for the flat leaders.

The 24-byte and 12-byte variants below follow the same ranking, just shifted faster as the strings shrink: at 10^7 the flat leaders drop from ~118 ns (64-byte) to ~92-112 ns (24-byte) to ~50-63 ns (12-byte), the last because the 12-byte key is pure SSO and skips the allocator entirely. The "max length N" variants run about the same as, or a little faster than, the matching "fixed length N", because the randomly generated keys average shorter than the maximum -- so there is less to hash, compare, and copy -- even though the varying length makes the per-key cost less uniform.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

Insert without reserve

<K,V>: <string with a fixed length of 64, uint64_t>

Without reserve, every table also pays the cost of growing and rehashing as it fills, which roughly doubles the per-insert time across the board. For the 64-byte fixed key on the Xeon, absl::flat_hash_map with xxHash_xxh3 is the fastest at large sizes (about 191 ns at 10^7) because it grows cheaply and rehashes a metadata array rather than moving whole entries again; ankerl::unordered_dense_map is just behind (~178 ns) since it only has to grow a dense array. The probing-heavy tables suffer more from rehash: ska::flat_hash_map climbs to about 465 ns at 10^7, more than double the absl number, because every growth re-inserts every element through the probe sequence.

The fph tables are much slower here. Without a reserve, every growth triggers a fresh perfect-hash build, so fph::MetaFphMap/fph::DynamicFphMap reach ~3800 and ~4180 ns at 10^7 on the Xeon -- about 10x the flat leaders and noticeably worse than their with-reserve numbers, where they at least only build the perfect hash once at the final size. This clearly shows that fph trades construction speed for lookup speed.

The smaller-key and "max length" variants below preserve this ranking; the 12-byte SSO keys are again the fastest (absl about 64 ns at 10^7) because they never call the allocator, leaving only the table's own growth and probe costs.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

Latency

The P99 latency of a single insert is measured on the Xeon E-2388G only. It tells the same story from the tail end: the conventional flat tables keep their tail bounded while the perfect-hash tables spike whenever they rebuild.

Insert with reserve

<K,V>: <string with a fixed length of 64, uint64_t>

With a prior reserve, the 64-byte fixed key shows the flat tables holding a tight P99: absl::node_hash_map, ankerl::unordered_dense_map and absl::flat_hash_map sit around 490-530 ns at 10^7, with std::unordered_map at ~880 ns. The allocation of the 64-byte string sets a floor under all of them. The fph tables again stand out: fph::DynamicFphMap climbs to about 12660 ns and fph::MetaFphMap to about 12420 ns at 10^7 -- more than an order of magnitude worse -- because even with capacity reserved, the perfect-hash construction has occasional very expensive steps that dominate the 99th percentile. For the 12-byte SSO key the floor drops (the flat tables are around 460-490 ns at 10^7) since there is no allocation, but the fph tail stays high (fph::DynamicFphMap ~11780 ns).

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

Insert without reserve

<K,V>: <string with a fixed length of 64, uint64_t>

Without a prior reserve the P99 of the conventional tables stays close to the reserved case -- a rehash large enough to fall in the 99th percentile is rare relative to the number of inserts -- so for the 12-byte key absl::flat_hash_map still sits around 442 ns at 10^7 and std::unordered_map around 850 ns. The fph tables, by contrast, degrade sharply: without reserve they rebuild the perfect hash at every growth, so fph::DynamicFphMap and fph::MetaFphMap reach roughly 20400 and 20360 ns at 10^7, almost double their reserved tails. If stable insert latency matters, reserve ahead of time and avoid the perfect-hash tables for the build phase. The remaining string variants follow the same pattern.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

← Back to Hash Table Benchmark index

Author: Renzibei
Link: https://renzibei.com/en/string-insert-construct/
Copyright Notice: All articles on this website, unless otherwise stated, are created by the author and are licensed under CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.

post in 2026-06-13

tags:

{ hashtable }

{ benchmark }

{ algorithm }