Replacing std::map with std::unordered_map

One of the items that came up at the last Online Developer Meeting was basing more of Octave internals on C++ libraries, rather than our own hand-rolled code. This should get us 1) more cross-platform compatibility, 2) less code of our own to maintain with a limited number of volunteers, and 3) better performance as library code has typically been optimized.

In that regard, I suggested a useful activity might be to replace instances of std::map with std::unordered_map in order to move from O(log2(N)) lookup performance to O(1) average-case performance.

I’m happy to report that I made this change while rewriting gsvd.cc and it was just as simple as I had hoped. All that was required was to replace

#include <map>

with

#include <unordered_map>

and then instances of

std::map<XXX,YYY>

with

std::unordered_map<XXX,YYY>

I then re-built Octave and that was it—the compiler and linker took care of everything and we should get better performance.

I would recommend the same strategy to anyone occasionally looking through the C++ code. If they find an instance of std::map in which we aren’t doing much iterating over the keys, then this is a good chance to replace it with std::unordered_map. In particular, I’m guessing that our caching mechanisms, such as load_path, might benefit. In caches, the frequent operations are to add keys, search keys, and delete keys, but not to list all the keys and loop over them.


Thanks for looking into this.

I would bet that performance is generally better than O(log(N)) but it might be interesting to look at what std::unordered_map::load_factor returns for some of the largest and most frequently used map objects to get an idea about how close to O(1) we really are for those objects.

In the std::map case I don’t know why it would be better than this. The STL is pretty clear that the complexity of the lookup is logarithmic. Documentation, for example, at map::operator[] - C++ Reference

Documentation for unordered_map says the access time is, on average, constant and worst case linear. Linear would happen if the output of the hash function put everything in to the same bucket. See, for example, unordered_map::operator[] - C++ Reference.

I meant that I would expect std::unordered_map performance to generally be better than the expected O(log(N)) performance of std::map, but it is possible for it to be worse, so I was just wondering whether it would be easy to confirm that and to see how much better it really is.

Unfortunately, load_factor() just returns number_of_elements / number_of_buckets, i.e., size() / bucket_count(). That only gives an average idea of the number of collisions expected, rather than how many collisions actually occur. Given that max_load_factor defaults to 1.0, there are generally at least as many buckets as elements.

To really check, something like this would be required (assuming mymap is a std::unordered_map and <iostream> is included):

unsigned n = mymap.bucket_count ();

for (unsigned i = 0; i < n; ++i)
  {
    std::cout << "bucket #" << i << " contains: ";
    int elemcnt = 0;
    // Count the elements that hashed into bucket i.
    for (auto it = mymap.begin (i); it != mymap.end (i); ++it)
      elemcnt++;
    std::cout << elemcnt << "\n";
  }