Hash Table Benchmark - Integer Erase and Insert
The integer key erase and insert test.
Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.
In this test, we first construct a hash table with size N, and then repeat the following operations M times:
- Insert a new element into the hash table
- Randomly erase an element from the hash table.
It should be noted that the results of this test are also highly correlated with the distribution of the data, especially the relationship between deleted data and inserted data. Here we randomly select elements to delete with equal probability. In reality, elements may not be selected with equal probability; for example, the most likely deleted element may be the most recently inserted element.
Throughput
We record the time spent in the whole process, which includes both insert and erase operations.
The y axis value is the average time per operation. This result is
obtained by
time/op = (time for insert + time for erase) / (2 * M).
This is the average time taken for insert and erase. This number
reflects the efficiency of making modifications to the hash table.
<K,V>: <uint64_t with several split bits masked, uint64_t>
On Intel Rocket Lake, ska::flat_hash_map with
std::hash is the fastest or near-fastest across almost the
whole range (about 5 to 37 ns per operation).
emhash::hash_map7 with absl::Hash and
absl::flat_hash_map with absl::Hash are close
behind in the medium range. tsl::robin_map is fast at small
sizes and again at very large sizes (about 39 ns at 10^7), but it slumps
in the medium range (rising to 60 to 90 ns near 400,000 to 800,000)
because of the poorer distribution of robin_hood::hash on
these key patterns.
On Apple M1 Max, the combination of ska::flat_hash_map
and std::hash has a comparative advantage in almost every
data scale (about 6 to 33 ns).
It is also worth noting that on the M1 Max chip,
absl::node_hash_map shows a large performance degradation,
with a pronounced bump in the data range from about 45,000 to 100,000
elements (peaking near 135 ns). std::unordered_map also
exhibits performance degradation, climbing steadily and peaking around
100,000 to 300,000 elements (up to about 600 ns). It is not clear what
the cause of this degradation is. It may be related to the system's
memory allocation policy, as both hash tables require memory allocation
and recycling operations for each insertion and deletion. This
phenomenon is more pronounced on datasets with
<uint64_t, 56 bytes>.
<K,V>: <uint64_t with several split bits masked, 56 bytes struct>
On Intel Rocket Lake, the rankings are similar to those of <K,V>: <uint64_t with several
split bits masked, uint64_t>: ska::flat_hash_map
with std::hash keeps a comfortable lead (about 8 to 59 ns),
and absl::flat_hash_map is the closest follower.
On M1 Max, the relative performance of both
absl::flat_hash_map and emhash::hash_map7
increases a bit when the number of elements is in the range of roughly
32,768 to 1,200,000, where emhash::hash_map7 with
absl::Hash actually edges ahead of
ska::flat_hash_map at several points.
Latency
We record the latency of insert and erase operations separately. The insert latency here is different from that in the "Insert and Construct" test. The latency statistics in the construct test include all operations from size 0 to size N, while in this test the size of the hash table is always N or N + 1.
Insert (after erase)
<K,V>: <uint64_t with several split bits masked, uint64_t> Insert Latency
First consider P50 latency. When the data size is small, both
ska::flat_hash_map with std::hash and
tsl::robin_map with absl::Hash are in the
first tier in terms of speed (about 7 to 9 ns). When the number of
elements is larger, absl::flat_hash_map and
absl::node_hash_map with absl::Hash have a
greater advantage: past roughly 300,000 elements the ska/tsl tables jump
up (to about 50 ns and beyond) while the absl tables stay near 24 to 28
ns. In addition, the median latency of most open-addressed hash tables
converges again as the number of elements approaches 10^7 (around 92 to
99 ns).
For P99 latency, emhash::hash_map7 with
absl::Hash has the smallest tail latency through small and
medium sizes (about 31 ns at small counts, staying ahead up to roughly
200,000 elements). When the number of elements is larger than that,
absl::flat_hash_map comes out on top.
Another point that has to be mentioned about modifying open-addressed hash tables is the tombstone mechanism used in the implementation. For some hash tables, when a delete operation is performed, a special marker (tombstone) is placed on the slot where the deleted element was located. A tombstone marker is not the same as an empty marker. If a hash table has too many tombstone markers, its lookup performance will be affected. Therefore, some hash tables will rehash when the number of tombstones reaches a certain percentage. This can give an insert operation mixed with erase a poor maximum latency. The P100 latency helps show this.
In addition to the element counts that are powers of 2 (where the
first insertion may lead to expansion), some hash tables have P100
latency at some data points proportional to the number of elements. This
phenomenon is observed for both the absl-series hash tables and
robin_hood::unordered_flat_map. These hash tables should
not be selected if the user requires strict maximum latency for
modification operations.
<K,V>: <uint64_t with several split bits masked, 56 bytes struct> Insert Latency
When the size of value_type becomes 64 bytes, the
advantage of ska::flat_hash_map with std::hash
over tsl::robin_map increases when the amount of data is
small (about 8 ns vs 11 ns at the smallest sizes for P50).
For P99 latency, at small element counts the smallest tail belongs to
absl::flat_hash_map with absl::Hash (about 33
ns), ahead of ska::flat_hash_map and
emhash::hash_map7.
Erase
<K,V>: <uint64_t with several split bits masked, uint64_t> Erase Latency
For P50 latency, ska::flat_hash_map with
std::hash almost always has the smallest latency, except in
the range from about 300,000 to 800,000 elements. In that range
ska::flat_hash_map jumps up (to about 100 to 160 ns)
because its aggressive expansion and low load factor push it into a
slower memory tier, while absl::flat_hash_map and
robin_hood::unordered_flat_map stay lower (about 45 to 120
ns) and perform relatively better. Above 1,200,000 elements the tables
converge again.
For P99 latency, emhash::hash_map7 with
absl::Hash performs the fastest when the number of elements
is small (about 19 ns, leading up to roughly 3,000 elements). The tail
latency of absl::flat_hash_map with absl::Hash
is smaller in the medium range, roughly from 8,000 to 200,000 elements.
When the number of elements is larger, the performance of many hash
tables is close.
<K,V>: <uint64_t with several split bits masked, 56 bytes struct> Erase Latency
When the size of the value_type is 64 bytes, it is
basically the same as <uint64_t, uint64_t>.
Throughput Appendix
<K,V>: <uint64_t with high position bits masked, uint64_t>
<K,V>: <uint64_t with low position bits masked, uint64_t>
<K,V>: <uint64_t uniformly distributed, uint64_t>
Latency Appendix
Insert (after erase)
<K,V>: <uint64_t with high position bits masked, uint64_t> Insert Latency
<K,V>: <uint64_t with low position bits masked, uint64_t> Insert Latency
<K,V>: <uint64_t uniformly distributed, uint64_t> Insert Latency
Erase
<K,V>: <uint64_t with high position bits masked, uint64_t> Erase Latency
<K,V>: <uint64_t with low position bits masked, uint64_t> Erase Latency
<K,V>: <uint64_t uniformly distributed, uint64_t> Erase Latency
- Link: https://renzibei.com/en/int-erase-insert/
-
Copyright Notice:
All articles on this website, unless otherwise stated, are created by the author and are licensed under
CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.