Hash Table Benchmark - Memory Usage and Load Factor
This page discusses the memory usage and load factor of hash tables.
Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.
During the previous tests, we recorded the heap memory size and load
factor. We count the heap memory size by implementing a custom
Allocator, which counts the allocated bytes during the
allocate() function call. However, the accuracy of the
counted number depends on the hash table container correctly using
Allocator; that is, all memory allocations must go through
the allocator. Some hash tables, such as emhash::HashMap7,
will not get accurate memory data because Allocator is not
fully used for all heap memory allocations.
One of the troublesome aspects of C++ design is that classes that use
different Allocator template parameters belong to different types (which
is what the std::pmr container is trying to solve). For
example, std::basic_string using a
std::allocator and std::basic_string using a
custom allocator are two types, and hash functions like
std::hash are usually only compatible with
std::string using std::allocator. Therefore,
it is not possible to use these hash functions directly for strings that
use other allocators, and our method of counting heap memory size cannot
count the heap memory of strings. As a result, we only count cases where
both keys and values are integer types.
It needs to be clarified that if the user doesn't care about the total heap memory size but cares about the warm memory size related to the query speed, then the data in this test cannot accurately reflect the cache size that the hash table needs to utilize. Some hash tables require some extra space for non-query work, such as insertion, and this part of the memory space is not accessed during lookup operations.
The set of element counts used in this test is different from other
test items. To reflect the ability to cope with different load factors,
the number of elements is chosen as 0.4 times or 0.6 times a power of 2,
e.g.
0.4 x 2^10, 0.6 x 2^10, 0.4 x 2^11, 0.6 x 2^11, ...
Lookup-related Cache Size Overhead Analysis
The heap memory a table occupies and the load factor it runs at directly shape its lookup speed, because together they decide how much memory a lookup must stream through and therefore how soon the working set spills out of each cache level. A lower load factor wastes more slots but shortens probe chains; a smaller per-element footprint fits more elements into the same cache. The cache-tier boundaries seen in the lookup throughput test are exactly the points where a table's footprint crosses the L1, L2 and L3 capacities, so the two charts below are the structural explanation behind those tiers.
Because these figures mostly describe the data structures themselves rather than the processor, the heap-memory numbers are mostly CPU-independent. Exact totals can still differ across STL implementations and allocators. The memory-size charts below therefore show the measured Intel and M1 Max values separately; treat them as structure-driven but implementation-sensitive.
Heap Memory Size
Memory size when using default max_load_factor
<K,V>: <uint64_t with several split bits masked, uint64_t>
The footprint splits the tables into three groups. The
SwissTable-style absl::flat_hash_map and
ska::bytell_hash_map, which spend only one metadata byte
per slot, are the most compact — about 272 MB for 10^7 integer pairs.
The node-based std::unordered_map and
absl::node_hash_map come next (about 308 MB and 297 MB),
paying for a per-node allocation and its pointer. The perfect-hash
fph::DynamicFphMap and fph::MetaFphMap use
roughly twice the most compact tables (about 556-572 MB), the cost of
the extra index and metadata that make their lookups fast. Largest of
all are ska::flat_hash_map and tsl::robin_map
at about 768 MB, because they keep a low maximum load factor and store
the full, alignment-padded value type in every slot.
emhash::hash_map7 is shown as zero because it does not
route all of its allocations through the custom counting allocator, so
its memory cannot be measured here (as noted above).
Memory size when using large max_load_factor
<K,V>: <uint64_t with several split bits masked, uint64_t>
Load factor
Load factor when using default max_load_factor
<K,V>: <uint64_t with several split bits masked, uint64_t>
The load-factor chart explains part of that footprint. Most
open-addressing tables grow by doubling, so their load factor sawtooths
between roughly 0.4 right after a growth and 0.75-0.9 just before the
next one; the test samples sizes at 0.4x and 0.6x powers of two, which
is why the visible values cluster around 0.5-0.76.
ska::flat_hash_map and tsl::robin_map with
robin_hood::hash settle to the lowest load factors (about
0.29-0.30 at large sizes), trading memory for shorter probe chains — the
same low occupancy that helps their lookup speed. At the other extreme,
std::unordered_map runs at a load factor close to 1.0,
because a node is allocated per element and there are no empty slots to
leave slack; its memory cost lives in the nodes and pointers rather than
in spare slots. Raising max_load_factor packs the flat
tables more tightly — at 10^7 elements ska::flat_hash_map's
load factor rises from about 0.30 to about 0.60, roughly halving its
empty space — at the price of longer probe sequences and slower lookups,
the tradeoff explored in the lookup tests.
Load factor when using large max_load_factor
<K,V>: <uint64_t with several split bits masked, uint64_t>
- Link: https://renzibei.com/en/memory-usage-and-load-factor/
-
Copyright Notice:
All articles on this website, unless otherwise stated, are created by the author and are licensed under
CC BY-NC-ND 4.0 License. Please observe the agreement and cite the source when quoting and reproducing.