Hash Table Benchmark - Memory Usage and Load Factor

This page discusses the memory usage and load factor of hash tables.

Click the labels on the legend to hide or show the data lines for specific hash tables and hash functions in the figure.

During the previous tests, we recorded the heap memory size and load factor. We count the heap memory size by implementing a custom Allocator, which counts the allocated bytes during the allocate() function call. However, the accuracy of the counted number depends on the hash table container correctly using Allocator; that is, all memory allocations must go through the allocator. Some hash tables, such as emhash::HashMap7, will not get accurate memory data because Allocator is not fully used for all heap memory allocations.

One of the troublesome aspects of C++ design is that classes that use different Allocator template parameters belong to different types (which is what the std::pmr container is trying to solve). For example, std::basic_string using a std::allocator and std::basic_string using a custom allocator are two types, and hash functions like std::hash are usually only compatible with std::string using std::allocator. Therefore, it is not possible to use these hash functions directly for strings that use other allocators, and our method of counting heap memory size cannot count the heap memory of strings. As a result, we only count cases where both keys and values are integer types.

It needs to be clarified that if the user doesn't care about the total heap memory size but cares about the warm memory size related to the query speed, then the data in this test cannot accurately reflect the cache size that the hash table needs to utilize. Some hash tables require some extra space for non-query work, such as insertion, and this part of the memory space is not accessed during lookup operations.

The set of element counts used in this test is different from other test items. To reflect the ability to cope with different load factors, the number of elements is chosen as 0.4 times or 0.6 times a power of 2, e.g. 0.4 x 2^10, 0.6 x 2^10, 0.4 x 2^11, 0.6 x 2^11, ...

The heap memory a table occupies and the load factor it runs at directly shape its lookup speed, because together they decide how much memory a lookup must stream through and therefore how soon the working set spills out of each cache level. A lower load factor wastes more slots but shortens probe chains; a smaller per-element footprint fits more elements into the same cache. The cache-tier boundaries seen in the lookup throughput test are exactly the points where a table's footprint crosses the L1, L2 and L3 capacities, so the two charts below are the structural explanation behind those tiers.

Because these figures mostly describe the data structures themselves rather than the processor, the heap-memory numbers are mostly CPU-independent. Exact totals can still differ across STL implementations and allocators. The memory-size charts below therefore show the measured Intel and M1 Max values separately; treat them as structure-driven but implementation-sensitive.

Heap Memory Size

Memory size when using default max_load_factor

<K,V>: <uint64_t with several split bits masked, uint64_t>

The footprint splits the tables into three groups. The SwissTable-style absl::flat_hash_map and ska::bytell_hash_map, which spend only one metadata byte per slot, are the most compact — about 272 MB for 10^7 integer pairs. The node-based std::unordered_map and absl::node_hash_map come next (about 308 MB and 297 MB), paying for a per-node allocation and its pointer. The perfect-hash fph::DynamicFphMap and fph::MetaFphMap use roughly twice the most compact tables (about 556-572 MB), the cost of the extra index and metadata that make their lookups fast. Largest of all are ska::flat_hash_map and tsl::robin_map at about 768 MB, because they keep a low maximum load factor and store the full, alignment-padded value type in every slot. emhash::hash_map7 is shown as zero because it does not route all of its allocations through the custom counting allocator, so its memory cannot be measured here (as noted above).

Memory size when using large max_load_factor

<K,V>: <uint64_t with several split bits masked, uint64_t>

Load factor

Load factor when using default max_load_factor

<K,V>: <uint64_t with several split bits masked, uint64_t>

The load-factor chart explains part of that footprint. Most open-addressing tables grow by doubling, so their load factor sawtooths between roughly 0.4 right after a growth and 0.75-0.9 just before the next one; the test samples sizes at 0.4x and 0.6x powers of two, which is why the visible values cluster around 0.5-0.76. ska::flat_hash_map and tsl::robin_map with robin_hood::hash settle to the lowest load factors (about 0.29-0.30 at large sizes), trading memory for shorter probe chains — the same low occupancy that helps their lookup speed. At the other extreme, std::unordered_map runs at a load factor close to 1.0, because a node is allocated per element and there are no empty slots to leave slack; its memory cost lives in the nodes and pointers rather than in spare slots. Raising max_load_factor packs the flat tables more tightly — at 10^7 elements ska::flat_hash_map's load factor rises from about 0.30 to about 0.60, roughly halving its empty space — at the price of longer probe sequences and slower lookups, the tradeoff explored in the lookup tests.

Load factor when using large max_load_factor

<K,V>: <uint64_t with several split bits masked, uint64_t>


← Back to Hash Table Benchmark index