Comprehensive C++ Hashmap Benchmarks (2022) (martin.ankerl.com)

by klaussilveira 20 comments 58 points
Read article View on HN

20 comments

[−] spacechild1 46d ago
Note that this benchmark does not include boost::unordered_flat_map. This is an open addressing variant of boost::unordered_map which has only been released in December 2022.

I wanted to mention this because boost::unordered_flat_map and boost::unordered_flat_set are among the fastest open addressing hash containers in C++ land. Internally, they use lots of cool SIMD tricks. If anyone is interested in the details, here's a nice blog post by the developer: https://bannalia.blogspot.com/2022/11/inside-boostunorderedf...

[−] loeg 46d ago
The F14/Abseil maps are included and use cute SIMD tricks, FWIW (discussed by the blogspot author).
[−] jandrewrogers 46d ago
Benchmarks are fine but they will only be loosely correlated with the measured performance for any specific use case.

There is still substantial performance to be gained by creating bespoke hashmap designs at every point of use in code. The high dimensionality of the algorithm optimization space makes it improbable that any specific hashmap algorithm implementation will optimally capture the characteristics of a use case or set of use cases. The variance can be relatively high.

It isn't uncommon to find several independent hashmap designs inside performance-engineered code bases. The sensitivity to small details makes it difficult to build excellent hashmap abstractions with broad scope.

[−] jeffbee 46d ago
It's also the case that performance of a hashmap, or anything, in a small-scale benchmark may not reflect the performance in a large system that does things other than manage maps. There are side effects like how many icache lines are visited during a map operation. These don't hurt microbenchmarks but they can matter in real systems. It may not be completely pointless to microbenchmark a data structure but it can be ultimately misleading.
[−] menaerus 46d ago
Totally. It's funny how many people do not actually realize this and get stuck in cargo-cult mindset forever, no matter the years of experience.
[−] yxhuvud 46d ago
Also it is very easy to find situations where hash table lookups dominate a measurement, but where the answer isn't to make the lookup faster but to make fewer lookups.
[−] jeffbee 46d ago
"Can we simply not?" is the best optimization.
[−] anematode 46d ago
Definitely. As an extreme but fun example... in one project I had a massive hash map (~700 GB or so) that was concurrently read to/written from by 256 threads. The entries themselves were only 16 bytes and so I could use atomic cmpxchg, but the problem I hit was that even with 1GB huge pages, I was running out of dTLB entries. So I assigned each thread to a subregion of the hash table, then used channels between each pair of threads to handle the reads and writes (and restructured the program a bit to allow this). Since the dTLB budget is per core, this allowed me to get essentially 0 dTLB misses, and ultimately sped up the program by ~2x
[−] senderista 46d ago
The "delegation pattern" for datastructures:

https://timharris.uk/papers/2013-opodis.pdf

[−] anematode 46d ago
ah! I thought I was being original :)
[−] zX41ZdbW 46d ago
The performance of hash tables and hash functions significantly depends on the data distribution, and should be compared on real datasets.

I've covered it in my presentation: https://presentations.clickhouse.com/2017-hash_tables/

[−] stephc_int13 46d ago
"Each benchmark was run multiple times, and I’m using the median to get rid of any potential outliers."

This is not how you should do benchmarks. Don't take the median, you don't even need to do any "warming up".

Simply run it long enough and only take the best result of each. This is more reliable and correct.

[−] syncsynchalt 46d ago
This is not universally applicable, especially if an algo isn't deterministic. For example if you were to time "bogosort of 100 items" you'd see increasingly better times the more runs you performed.
[−] hermitcrab 46d ago
Would be interested to hear how the Qt QHash compares.

https://doc.qt.io/qt-6/qhash.html

[−] rurban 46d ago
Still using linked lists as std::unordered_map. So it won't fly, but keeps ptr stability.
[−] hermitcrab 46d ago
Didn't all the QList<> linked lists become QVector<> is Qt 6?
[−] ranger_danger 46d ago
It's a bit confusing. QVector in Qt 6 is an alias to QList, which has now been changed to work like QVector did.

https://www.youtube.com/watch?v=nSFIFlLpsTQ

[−] rurban 46d ago
Not really comprehesive. Doesn't include my favorite https://github.com/greg7mdp/parallel-hashmap which adds thread-safety to performance.
[−] aw1621107 46d ago
For what it's worth, there's this bit from the parallel-hashmap readme:

> We encourage phmap users to switch to gtl if possible. gtl provides the same functionality as this repository, but requires C++20 or above.

And the benchmarks do include gtl.

[−] rurban 46d ago
Thanks, didn't knew that.