How to Store Custom Types in a Hash Set

c++11 containers intermediate

std::unordered_set requires that the keys of the set are able to be compared for equality, meaning the keys should support the equality operator (==). The keys of std::unordered_set are also required to be able to be hashable meaning you need to write a specialization of std::hash for the key type. Note that generally we’re not allowed to write code in the std namespace as that’s reserved for the standard library. Hash functions are an exception however and are explicitly allowed for exactly this purpose. Here’s an example:

#include <unordered_set>
#include <string>
#include <iostream>

struct Planet {
  std::string name;

  friend bool operator==(const Planet& a, const Planet& b) {
    return a.name == b.name;
  }
};

// Our custom std::hash specialization for Planet
template <>
struct std::hash<Planet> {
  size_t operator()(const Planet& p) const noexcept {
    return std::hash<std::string>{}(p.name);
  }
};

int main() {
  std::unordered_set<Planet> planets{
    {{"Earth"}},
    {{"Mars"}},
  };

  Planet jupiter{"Jupiter"};
  std::cout << (planets.count(jupiter) ? "True" : "False") << "\n"; 
}
False

Unfortunately C++ does not hold our hand when it comes to hashing. Most of the time objects contain multiple fields and we need to combine individual hashes into a single hash value. We need to implement our own “hash combiner” for this. Here’s an example of a simple hash combiner:

#include <unordered_set>
#include <string>
#include <iostream>

// A simple hash combiner.
// Hashes all args and mixes their hashed bits together 
template <typename ...Args>
size_t hash_all(const Args& ...args) {
  uint64_t hash = 0xc3a5c85c97cb3127ULL; // A large prime number
  // Create a lambda for mixing 64bit hashes
  // https://cppbyexample.com/lambdas.html
  auto hash_mix = [&hash](uint64_t v) {
    hash += v;
    // fmix64 from MurmurHash
    hash ^= hash >> 33;
    hash *= 0xff51afd7ed558ccdULL;
    hash ^= hash >> 33;
    hash *= 0xc4ceb9fe1a85ec53ULL;
    hash ^= hash >> 33;
  };
  // For each arg hash it with std::hash and mix it with hash_mix
  (hash_mix(std::hash<Args>{}(args)), ...);
  return hash;
}

struct Planet {
  std::string name;
  float distanceAU;

  friend bool operator==(const Planet& a, const Planet& b) {
    return a.name == b.name && a.distanceAU == b.distanceAU; 
  }
};

// Our custom std::hash specialization for Planet
template <>
struct std::hash<Planet> {
  size_t operator()(const Planet& p) const noexcept {
    return hash_all(p.name, p.distanceAU);
  }
};

int main() {
  std::unordered_set<Planet> planets{
    {"Venus", 0.7f},
    {"Earth", 1.0f},
    {"Mars", 1.5f},
  };

  Planet mercury{"Mercury", 0.4f};
  std::cout << (planets.count(mercury) ? "True" : "False") << "\n"; 
}
False

It may come as no surprise after the above example that hashing objects is not exactly a trivial topic. Using std::hash and the combiner above is fine for std::unordered_set but they alone aren’t enough to produce cryptographic hashes for passwords or privacy purposes. Truly strong hashes take much more code than can easily fit in a simple example like this. A great resource to learn about other C++ hash functions, and their implementations, is the SMHasher Github project. For our purposes here, just know that the provided hash_all function is sufficient for non-cryptographic hashes such as the ones used for std::unordered_set.


For more C++ By Example, click here.