C++14 relaxed many restrictions on
constexpr functions. The ability to contain branching, loops, switch statements makes them really easy to use. Implementing complicated functions at compile time is now a piece of cake.
To prove this point, I tried and implemented Murmur3A, the Murmur3 hash variant which is optimized for x861 and computes 32 bit hash values. In this post I’m going to walk through the steps of the implementation.
constexpr is relaxed, but not entirely
Lots of constexpr restrictions are removed but there are still some remaining that prevent us from simply sticking constexpr in front of the . First and foremost, let us look at the “main part” of the original function:
const uint32_t * blocks = (const uint32_t *)(data + nblocks*4);
for(int i = -nblocks; i; i++)
uint32_t k1 = getblock32(blocks,i);
k1 *= c1;
k1 = ROTL32(k1,15);
k1 *= c2;
h1 ^= k1;
h1 = ROTL32(h1,13);
h1 = h1*5+0xe6546b64;
The first line contains a C-style cast, which will turn out to be a reinterpret_cast in the end. This is a problem, because reinterpret_casts are explicitly forbidden in constexpr functions. This is, strictly speaking, undefined behavior because a type-punned pointer is being dereferenced and the alignment of uint32_t is not guaranteed to be as this code assumes. Nevertheless, the code should work on its target platform (x86). This line being undefined behavior alone would kick us out of constexpr-world if the cast was not explicitly forbidden already. So how do we get around this?
Shifting good, casting bad
As a side note, the following solution still relies on undefined behavior, but so does the original implementation.
Getting the blocks
So, to get around the above limitation, I created a simple constexpr string_view class that can synthesize the 32-bit blocks from the compile-time string. This approach can be reused in similar scenarios, too.
The get_block function takes four bytes of the string, shifts and puts them together. The blocks are returned in reverse order as that is how the algorithm uses them (i.e. the algorithm indexes starting with negative numbers up to zero).
Murmur3 uses the remaining bytes in a specific way if the input length is not a multiple of 4:
As you can see in the implementation of str_view, the tail bytes are computed similarly to the blocks.
The remaining modifications to the canonical implementation are trivial: I inlined the rotl32 calls and instead of an explicit length parameter I use the size function of str_view. I also changed the void* output parameter to uint32_t. So the full implementation looks like this:
I originally experimented with indexing similarly to the blocks (and adding a number to get the nth tail byte), but that turned out to break the constexpr-ness – I’m not entirely sure if that was a compiler error or letting this compile is the one. Which brings us to two important questions.
Is this really a compile-time constant?
Yes. The following code would not compile otherwise:
Is this code well-formed according to the standard?
To be honest, I’m not entirely sure. I am not using any reinterpret_casts which would be the main culprit in implementing hash functions that operate on blocks of data. The code does compile with both clang and gcc. Feel free to comment if you have any input on the issue.
Getting the code
The entire source code and a couple of unit tests are available on Github.
MurmurHash in C++11 constexpr
I’ve gotten a trackback from Ben Deane’s blog where he implements Murmur3 without the niceties of the C++14 rules.
Not that this matters for constexpr; the point is, it computes the same values as that version of the algorithm does. ↩