Hash Table Benchmark - String Erase and Insert
- Throughput
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Latency
- Insert (after erase)
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Erase
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Insert (after erase)
The string key erase and insert test.
Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.
In this test, we first construct a hash table with size N, and then repeat the following procedure M times:
- Insert a new element into the hash table
- Randomly erase an element from the hash table.
Because the table size stays at N (or N+1) throughout, this measures
the steady- state cost of churning a fully-grown table rather than the
cost of growing it. For string keys every insert and every erase still
has to hash the whole key and compare strings on collision, and for keys
above the SSO threshold (the 24- and 64-byte variants) every insert also
allocates the string's characters on the heap and every erase frees
them, so the 64-byte case is dominated by
malloc/free. The 12-byte keys are stored
inline (SSO) and skip the allocator entirely, which is why they are
several times faster than the long-string cases.
Two structural effects matter here. First, open-addressing tables use
tombstones: an erase marks the slot as deleted rather
than empty so the probe chains stay intact, and once tombstones
accumulate the table must rehash, which shows up as latency spikes.
Second, the perfect-hash tables fph::DynamicFphMap and
fph::MetaFphMap are a poor fit for this workload -- every
insert can force a perfect-hash rebuild, so they are not only the
slowest but actually time out at the larger sizes (those points read
0.00 and are not plotted). They are fast to look up but pay for
it heavily under churn, and that caveat matters for erase/insert
workloads.
Throughput
We record the time spent in the whole process, which includes both insert and erase operations.
The y axis value is the average time per operation. This result is
obtained by
time/op = (time for insert + time for erase) / (2 * M).
This is the average time taken for insert and erase.
<K,V>: <string with a fixed length of 64, uint64_t>
For the 64-byte fixed key on the Xeon, xxHash_xxh3 is
the best hash and the flat open-addressing tables are tightly bunched at
the front: ska::flat_hash_map, tsl::robin_map,
absl::flat_hash_map and
robin_hood::unordered_flat_map all run about 67-72 ns at
1024 elements and converge to roughly 320-360 ns at 10^7, where the
malloc/free of the 64-byte string and the
cache-missing probes dominate. std::unordered_map is about
50% slower (93 ns small, 513 ns at 10^7) because it allocates and frees
a node on top of the string on every modification.
The perfect-hash tables are much slower:
fph::DynamicFphMap/fph::MetaFphMap start
around 160-170 ns at 1024 and then time out --
fph::DynamicFphMap has no plotted point past 32768 and
fph::MetaFphMap shows a 6050 ns spike at 1.2M before
dropping out -- because the steady stream of inserts keeps triggering
perfect-hash rebuilds. Note also that
ankerl::unordered_dense_map competes well at small and mid
sizes (~75 ns at 1024) but its 10^7 point is absent (0.00); its
dense-array design pays extra to keep entries packed when an arbitrary
element is erased, since it back-fills the hole from the array end.
On the M1 Max the ranking is similar, with
ska::flat_hash_map and tsl::robin_map leading
(~370-380 ns at 10^7) and the fph tables again timing out at large
sizes. The smaller-key variants below preserve the ordering but run much
faster: the 12-byte SSO key brings the flat leaders down to ~87-91 ns at
10^7 (Xeon) because no allocation is involved.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
Latency
We record the latency of insert and erase operations separately. The insert latency here is different from that in the "Insert and Construct" test. The latency statistics in the construct test include all operations from size 0 to size N, while in this test the size of the hash table is always N or N + 1.
Insert (after erase)
<K,V>: <string with a fixed length of 64, uint64_t>
Latency is measured on the Xeon E-2388G only. The first charts in
this section show the 64-byte fixed key, but looking at the 12-byte SSO
key makes the table algorithm clearest, since allocation noise is
removed. For the P50 (median) insert-after-erase,
absl::flat_hash_map and absl::node_hash_map
are best at large sizes (~104-110 ns at 10^7), while
ska::flat_hash_map and tsl::robin_map lead at
small sizes (~14-16 ns at 1024) but lose ground in the 200k-1.2M range
(~88-100 ns) as their probe chains lengthen. The fph tables sit far
behind even at the median (~50-68 ns at 1024, climbing into the
hundreds) and drop out at large sizes.
The P99 (tail) is where the tombstone-rehash behaviour shows. The
conventional flat tables keep a bounded tail --
absl::flat_hash_map is around 492 ns and
std::unordered_map around 1000 ns at 10^7 -- but the
perfect-hash tables have very large spikes:
fph::MetaFphMap's P99 reaches ~44000 ns at 1.2M and ~231000
ns at 10^7, because an insert that triggers a full perfect-hash rebuild
lands directly in the tail. These tables should never be chosen where
bounded modification latency matters.
For the 64-byte key the string allocation raises the floor: every
table's P99 is several hundred ns even at small sizes (~470 ns at 1024),
and the same fph blowup (~48000 ns at 1.2M for
fph::MetaFphMap) appears. Other length variants follow the
same pattern.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
Erase
<K,V>: <string with a fixed length of 64, uint64_t>
Erase latency is closer between tables because an erase on an
open-addressing table is cheap -- it locates the slot and drops a
tombstone -- with no allocation on the inline-stored path. For the
64-byte key the P99 erase is bunched between about 1030 and 1140 ns at
10^7 for the conventional tables (absl::flat_hash_map ~1030
ns, std::unordered_map ~1390 ns), the floor again set by
the free of the string's heap buffer plus the cache miss to
reach the slot. As before the fph tables are the exception:
fph::DynamicFphMap already times out past 32768 and
fph::MetaFphMap's P99 climbs past ~1200 ns at 1.2M before
dropping out, since an erase that crosses the tombstone threshold forces
a perfect-hash rebuild. The median (P50) erase is dominated by
ska::flat_hash_map, tsl::robin_map and
emhash::hash_map7 at small-to-mid sizes; the remaining
string variants tell the same story scaled by string length.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
- Link: https://renzibei.com/en/string-erase-insert/
-
Copyright Notice:
All articles on this website, unless otherwise stated, are created by the author and are licensed under
CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.