Hash Table Benchmark - String Insert and Construct
- Throughput
- Insert with reserve
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Insert without reserve
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Insert with reserve
- Latency
- Insert with reserve
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Insert without reserve
- <K,V>: <string with a fixed length of 64, uint64_t>
- <K,V>: <string with a max length of 64, uint64_t>
- <K,V>: <string with a fixed length of 24, uint64_t>
- <K,V>: <string with a max length of 24, uint64_t>
- <K,V>: <string with a fixed length of 12, uint64_t>
- <K,V>: <string with a max length of 12, uint64_t>
- Insert with reserve
The string key insert and construct test.
Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.
In this test, we measure the time spent constructing the hash table.
The construction is done by the
insert( const value_type& value ) operation. We test
both with and without calling reserve before inserting. The time spent
in reserve is not counted in the total time.
During the test, a hash table is constructed multiple times, and in the throughput test we record the total time spent on insert.
The length of a string is the same as
std::string::length(), which means that one additional byte
is needed to save the null character.
Inserting string keys is more expensive than inserting integers
because each insert pays for three things that integers do not: hashing
the whole byte sequence of the key, comparing whole strings on a
collision, and -- for keys longer than the Small String Optimization
(SSO) threshold -- allocating heap memory to hold the characters. On the
libstdc++/libc++ implementations used here, a std::string
of length up to 15 stores its bytes inline in the control block (no
allocation), so the 12-byte keys are pure SSO and never touch the
allocator, while the 24-byte and 64-byte keys spill to the heap and
every insert also pays a malloc. This single fact -- SSO vs
heap allocation -- is the biggest driver of the differences between the
string-length variants below.
It is also worth setting expectations for the perfect-hash tables up
front. fph::DynamicFphMap and fph::MetaFphMap
are extremely fast at lookup because they build a
(near-)perfect hash with no probing, but that construction is exactly
what makes their insert slow -- they periodically rebuild the
perfect hash as the table grows. So in this insert/construct test the
fph tables are consistently the slowest, and that caveat matters for the
insert results.
Throughput
Insert with reserve
<K,V>: <string with a fixed length of 64, uint64_t>
With reserve called ahead of time, the capacity is fixed
before insertion, so this isolates the pure per-insert cost (hash,
probe, allocate, store) with no rehashing. For the 64-byte fixed key on
the Xeon, xxHash_xxh3 -- which is built to consume raw
bytes -- is the best hash for nearly every table, and the leaders are
tightly bunched: absl::flat_hash_map,
ankerl::unordered_dense_map and
robin_hood::unordered_flat_map all sit around 24-26 ns at
1024 elements and converge to roughly 118-119 ns at 10^7, where the
per-insert cost is dominated by the malloc of the
64-character string and by cache misses, not by the table algorithm.
std::unordered_map trails the flat tables by roughly 2x
(about 48 ns small, 250 ns at 10^7) because it allocates a node per
element on top of the string allocation.
The fph tables are dramatically slower here:
fph::MetaFphMap and fph::DynamicFphMap reach
about 2390 ns and 2471 ns at 10^7 on the Xeon -- on the order of 10x
slower than std::unordered_map and 20x slower than the flat
leaders -- because the perfect-hash build cost grows with the table.
(Note their N=1024 points, ~198 and ~178 ns, are an artifact of building
and rebuilding a perfect hash for a tiny table; they actually get
relatively better at 32768 before the build cost dominates
again.) On the M1 Max the ordering is the same but the gap is smaller:
the fph tables land around 950 ns at 10^7 versus ~100-120 ns for the
flat leaders.
The 24-byte and 12-byte variants below follow the same ranking, just shifted faster as the strings shrink: at 10^7 the flat leaders drop from ~118 ns (64-byte) to ~92-112 ns (24-byte) to ~50-63 ns (12-byte), the last because the 12-byte key is pure SSO and skips the allocator entirely. The "max length N" variants run about the same as, or a little faster than, the matching "fixed length N", because the randomly generated keys average shorter than the maximum -- so there is less to hash, compare, and copy -- even though the varying length makes the per-key cost less uniform.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
Insert without reserve
<K,V>: <string with a fixed length of 64, uint64_t>
Without reserve, every table also pays the cost of
growing and rehashing as it fills, which roughly doubles the per-insert
time across the board. For the 64-byte fixed key on the Xeon,
absl::flat_hash_map with xxHash_xxh3 is the
fastest at large sizes (about 191 ns at 10^7) because it grows cheaply
and rehashes a metadata array rather than moving whole entries again;
ankerl::unordered_dense_map is just behind (~178 ns) since
it only has to grow a dense array. The probing-heavy tables suffer more
from rehash: ska::flat_hash_map climbs to about 465 ns at
10^7, more than double the absl number, because every growth re-inserts
every element through the probe sequence.
The fph tables are much slower here. Without a reserve, every growth
triggers a fresh perfect-hash build, so
fph::MetaFphMap/fph::DynamicFphMap reach ~3800
and ~4180 ns at 10^7 on the Xeon -- about 10x the flat leaders and
noticeably worse than their with-reserve numbers, where they at least
only build the perfect hash once at the final size. This clearly shows
that fph trades construction speed for lookup speed.
The smaller-key and "max length" variants below preserve this ranking; the 12-byte SSO keys are again the fastest (absl about 64 ns at 10^7) because they never call the allocator, leaving only the table's own growth and probe costs.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
Latency
The P99 latency of a single insert is measured on the Xeon E-2388G only. It tells the same story from the tail end: the conventional flat tables keep their tail bounded while the perfect-hash tables spike whenever they rebuild.
Insert with reserve
<K,V>: <string with a fixed length of 64, uint64_t>
With a prior reserve, the 64-byte fixed key shows the flat tables
holding a tight P99: absl::node_hash_map,
ankerl::unordered_dense_map and
absl::flat_hash_map sit around 490-530 ns at 10^7, with
std::unordered_map at ~880 ns. The allocation of the
64-byte string sets a floor under all of them. The fph tables again
stand out: fph::DynamicFphMap climbs to about 12660 ns and
fph::MetaFphMap to about 12420 ns at 10^7 -- more than an
order of magnitude worse -- because even with capacity reserved, the
perfect-hash construction has occasional very expensive steps that
dominate the 99th percentile. For the 12-byte SSO key the floor drops
(the flat tables are around 460-490 ns at 10^7) since there is no
allocation, but the fph tail stays high (fph::DynamicFphMap
~11780 ns).
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
Insert without reserve
<K,V>: <string with a fixed length of 64, uint64_t>
Without a prior reserve the P99 of the conventional tables stays
close to the reserved case -- a rehash large enough to fall in the 99th
percentile is rare relative to the number of inserts -- so for the
12-byte key absl::flat_hash_map still sits around 442 ns at
10^7 and std::unordered_map around 850 ns. The fph tables,
by contrast, degrade sharply: without reserve they rebuild the perfect
hash at every growth, so fph::DynamicFphMap and
fph::MetaFphMap reach roughly 20400 and 20360 ns at 10^7,
almost double their reserved tails. If stable insert latency matters,
reserve ahead of time and avoid the perfect-hash tables for the build
phase. The remaining string variants follow the same pattern.
<K,V>: <string with a max length of 64, uint64_t>
<K,V>: <string with a fixed length of 24, uint64_t>
<K,V>: <string with a max length of 24, uint64_t>
<K,V>: <string with a fixed length of 12, uint64_t>
<K,V>: <string with a max length of 12, uint64_t>
- Link: https://renzibei.com/en/string-insert-construct/
-
Copyright Notice:
All articles on this website, unless otherwise stated, are created by the author and are licensed under
CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.