Whitfin's Blog: Writing SipHash: my introduction to NIFs in Elixir

After recently discovering SipHash when searching for a solid algorithm to use in another project, I decided to take a shot at writing an implementation in Elixir as an exercise (having only started picking up Elixir in the last few weeks or so). The project is available on GitHub and is also hanging around on the Hex package manager.

The Problem

My initial implementation was functionally correct, and worked pretty well. It was averaging around 60 µs/op - not fantastic, but not horrible. After a little refactoring, I managed to shave this down to ~20 µs/op. It was here that I hit a performance wall; there was seemingly nothing obvious I could do to speed up the process (without sacrificing some safety nets, such as guard statements which were being too picky).

As it turns out, some profiling showed that one of the biggest things impacting performance was the application of a 64-bit mask to numbers throughout the translation. Elixir numbers have 64-bit double precision, and so numbers were overflowing the values expected by the SipHash algorithm. This meant that during every round of compression, there were ~10 masks applied. Keep in mind that a round of compression is per 8-byte chunk, making the number of masks applied (per hash) something like:

m = ceil(b / 8) * ((c * e) + (d * e))

b = the number of bytes hashed
c = the rounds of compression
d = the final rounds of compression
e = the number of masks per round
m = number of masks applied

This means that for the input "hello", there are 60 masks applied, which is expensive even with the fastest masking implementation possible. Clearly, this needed solving.

The Solution

Unfortunately, I was already tapped out on speed in the Elixir layer. Application of a mask is as straightforward as n &&& 0xFFFFFFFFFFFFFFFF, so there was little room for improvement. So, after a quick visit to StackOverflow, a helpful comment pointed me in the direction of NIFs.

I had previously looked at NIFs, but hadn't really had a use case. Sure, it's cool to make things fast, but they come at the cost of writing both C and Elixir inside a project. NIFs are functions implemented in native code, which you can embed into your application and call at-will. Sounds great, right? Well that's because they are.

I soon migrated the main SIPROUND algorithm across to C, and added a NIF loader into my module to override the Elixir implementation. I immediately started seeing better results, roughly ~3 µs/op where there had previously been ~20 µs/op. All with zero behaviour change, just a (somewhat) minor implementation change.

This specific speedup is not only a result of C being generally faster than Elixir/Erlang, but also the fact that C is already treating my values in the precision I need, which obliterates the need to apply the masks in the first place (thus removing my bottleneck). Boom, problem solved.

Be Careful

NIFs aren't perfect though; they're relatively unsafe, in that a crash in a NIF will bring down the Erlang VM. Due to this, you should keep any native components as simple as you can, or at least keep them as bulletproof as you can. A technique I have been following recently is to implement a wrapper in the Elixir layer using all necessary guards to ensure you get the input you expect, before passing off to the native layer. This reduces the complexity on the C side, making it easier to tailor your functions to a specific use case.

You should also keep a base implementation available, in case something goes wrong when an end-user is trying to use your application. If someone tries to use my SipHash implementation but for some reason their C compiler is out of action, Elixir/Erlang is smart enough to fall back to using the Elixir version. Of course, it's perfectly valid for your Elixir implementation to simply emit a log message saying that something is broken (which is certainly less effort), but I took the other route of keeping the same behaviour as the C code, just in Elixir. This version may be slower, but at least it keeps the library in a working state.