Hash Table Benchmark - String Erase and Insert

Throughput
Latency
1. Insert (after erase)
2. Erase

The string key erase and insert test.

📄 Raw CSV data for this benchmark: Xeon E-2388G · M1 Max.

Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.

In this test, we first construct a hash table with size N, and then repeat the following procedure M times:

Insert a new element into the hash table
Randomly erase an element from the hash table.

Because the table size stays at N (or N+1) throughout, this measures the steady- state cost of churning a fully-grown table rather than the cost of growing it. For string keys every insert and every erase still has to hash the whole key and compare strings on collision, and for keys above the SSO threshold (the 24- and 64-byte variants) every insert also allocates the string's characters on the heap and every erase frees them, so the 64-byte case is dominated by malloc/free. The 12-byte keys are stored inline (SSO) and skip the allocator entirely, which is why they are several times faster than the long-string cases.

Two structural effects matter here. First, open-addressing tables use tombstones: an erase marks the slot as deleted rather than empty so the probe chains stay intact, and once tombstones accumulate the table must rehash, which shows up as latency spikes. Second, the perfect-hash tables fph::DynamicFphMap and fph::MetaFphMap are a poor fit for this workload -- every insert can force a perfect-hash rebuild, so they are not only the slowest but actually time out at the larger sizes (those points read 0.00 and are not plotted). They are fast to look up but pay for it heavily under churn, and that caveat matters for erase/insert workloads.

Throughput

We record the time spent in the whole process, which includes both insert and erase operations.

The y axis value is the average time per operation. This result is obtained by time/op = (time for insert + time for erase) / (2 * M). This is the average time taken for insert and erase.

<K,V>: <string with a fixed length of 64, uint64_t>

For the 64-byte fixed key on the Xeon, xxHash_xxh3 is the best hash and the flat open-addressing tables are tightly bunched at the front: ska::flat_hash_map, tsl::robin_map, absl::flat_hash_map and robin_hood::unordered_flat_map all run about 67-72 ns at 1024 elements and converge to roughly 320-360 ns at 10^7, where the malloc/free of the 64-byte string and the cache-missing probes dominate. std::unordered_map is about 50% slower (93 ns small, 513 ns at 10^7) because it allocates and frees a node on top of the string on every modification.

The perfect-hash tables are much slower: fph::DynamicFphMap/fph::MetaFphMap start around 160-170 ns at 1024 and then time out -- fph::DynamicFphMap has no plotted point past 32768 and fph::MetaFphMap shows a 6050 ns spike at 1.2M before dropping out -- because the steady stream of inserts keeps triggering perfect-hash rebuilds. Note also that ankerl::unordered_dense_map competes well at small and mid sizes (~75 ns at 1024) but its 10^7 point is absent (0.00); its dense-array design pays extra to keep entries packed when an arbitrary element is erased, since it back-fills the hole from the array end.

On the M1 Max the ranking is similar, with ska::flat_hash_map and tsl::robin_map leading (~370-380 ns at 10^7) and the fph tables again timing out at large sizes. The smaller-key variants below preserve the ordering but run much faster: the 12-byte SSO key brings the flat leaders down to ~87-91 ns at 10^7 (Xeon) because no allocation is involved.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

Latency

We record the latency of insert and erase operations separately. The insert latency here is different from that in the "Insert and Construct" test. The latency statistics in the construct test include all operations from size 0 to size N, while in this test the size of the hash table is always N or N + 1.

Insert (after erase)

<K,V>: <string with a fixed length of 64, uint64_t>

Latency is measured on the Xeon E-2388G only. The first charts in this section show the 64-byte fixed key, but looking at the 12-byte SSO key makes the table algorithm clearest, since allocation noise is removed. For the P50 (median) insert-after-erase, absl::flat_hash_map and absl::node_hash_map are best at large sizes (~104-110 ns at 10^7), while ska::flat_hash_map and tsl::robin_map lead at small sizes (~14-16 ns at 1024) but lose ground in the 200k-1.2M range (~88-100 ns) as their probe chains lengthen. The fph tables sit far behind even at the median (~50-68 ns at 1024, climbing into the hundreds) and drop out at large sizes.

The P99 (tail) is where the tombstone-rehash behaviour shows. The conventional flat tables keep a bounded tail -- absl::flat_hash_map is around 492 ns and std::unordered_map around 1000 ns at 10^7 -- but the perfect-hash tables have very large spikes: fph::MetaFphMap's P99 reaches ~44000 ns at 1.2M and ~231000 ns at 10^7, because an insert that triggers a full perfect-hash rebuild lands directly in the tail. These tables should never be chosen where bounded modification latency matters.

For the 64-byte key the string allocation raises the floor: every table's P99 is several hundred ns even at small sizes (~470 ns at 1024), and the same fph blowup (~48000 ns at 1.2M for fph::MetaFphMap) appears. Other length variants follow the same pattern.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

Erase

<K,V>: <string with a fixed length of 64, uint64_t>

Erase latency is closer between tables because an erase on an open-addressing table is cheap -- it locates the slot and drops a tombstone -- with no allocation on the inline-stored path. For the 64-byte key the P99 erase is bunched between about 1030 and 1140 ns at 10^7 for the conventional tables (absl::flat_hash_map ~1030 ns, std::unordered_map ~1390 ns), the floor again set by the free of the string's heap buffer plus the cache miss to reach the slot. As before the fph tables are the exception: fph::DynamicFphMap already times out past 32768 and fph::MetaFphMap's P99 climbs past ~1200 ns at 1.2M before dropping out, since an erase that crosses the tombstone threshold forces a perfect-hash rebuild. The median (P50) erase is dominated by ska::flat_hash_map, tsl::robin_map and emhash::hash_map7 at small-to-mid sizes; the remaining string variants tell the same story scaled by string length.

<K,V>: <string with a max length of 64, uint64_t>

<K,V>: <string with a fixed length of 24, uint64_t>

<K,V>: <string with a max length of 24, uint64_t>

<K,V>: <string with a fixed length of 12, uint64_t>

<K,V>: <string with a max length of 12, uint64_t>

← Back to Hash Table Benchmark index

Author: Renzibei
Link: https://renzibei.com/en/string-erase-insert/
Copyright Notice: All articles on this website, unless otherwise stated, are created by the author and are licensed under CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.

post in 2026-06-13

tags:

{ hashtable }

{ benchmark }

{ algorithm }