Haskell at JanRain

We recently put some Haskell code into production on our RPX platform. Since production Haskell code is not something you see all over the place, it’s worth a blog post to explain the why and the how.

A little history. One of the many projects JanRain is involved in is the open source OpenID libraries written in Python, Ruby & PHP. As a side note, we will shortly be making a change in how our libraries are maintained (more on this soon). They are pretty decent libraries and have a high rate of deployment with sites running their own OpenID code. The first priority of the libraries has always been correctness; witness the boatload of test cases that the libraries ship with. The second priority has been deployment flexibility. The libs work on a wide variety of OSes, runtimes, databases, and frameworks. The third priority has been to provide an abstraction where the developer doesn’t need to know the gory details of the protocol.

Way down on the list of priorities (if even on the list) was performance. It’s really never been an issue. We use our own libraries with myOpenID.com, and sure, it’s not Yahoo or Google, but it gets some traffic. And on the RP side we have Jyte & Pibb. They used to be the largest RPs out there, and we never saw performance of the OpenID piece become an issue.

Enter RPX. For the sake of this story, let’s just say that RPX is essentially OpenID & OAuth library functionality up in the cloud (it actually does a lot more, check it out at RPXnow.com). Since we launched RPX, our transaction traffic has grown by leaps and bounds. The growth is due not only to additional sites coming online with RPX; returning-user traffic has its own growth rate too. It became clear that our out-of-the-box libraries were not geared for massive traffic. A variety of solutions were debated, prototyped, and combined. Do we use threads? Move to an async loop? Put this section in C? Write the whole thing in C? Hack around the GIL?

Enter Haskell. Let’s start by saying that Haskell is more than a bit weird at first take, even for an old software guy like me. Haskell has been around for a while, but has only recently started to find a flourishing community. At JanRain we’ve known about Haskell for quite some time. Internally we use darcs, which is written in Haskell. Haskell allows us to write highly parallelizable, asynchronous code (lazy evaluation) that works, is easy to maintain and test, and runs at near C speed. I think that sums it up :).

The Results. Benchmarks are benchmarks; nonetheless, it’s what we have. What we measured was our Haskell code doing over 100X the throughput of our legacy code. I’ll say it again, because it sounds good. Over 100X increase in throughput.

Why do this? Hardware is cheap, just buy more boxes, with more CPUs, with more RAM. True, hardware is cheap, but let a 100X reduction sink in. While the pure cost of hardware matters, the bigger win is the reduction in the number of boxes you have to think about. With a 100X reduction (and who knows, maybe we only get 50X or 25X in the real world), overbuilding for capacity and keeping full hot redundancy become much more manageable. Just imagine cutting a release on 1 box vs 50, and then doing that several times a day.

Will we do more Haskell? Absolutely. For a certain class of backend problem it will now be the language of choice here at JanRain. As a side benefit, Haskell attracts killer engineers.