From Python to Go: migrating our entire API

The tl;dr Summary

If you want to know the whole story, read on. But for the impatient out there, here’s the executive summary:

  • We migrated our entire API stack from Python (First Django then Falcon) to Go, reducing the mean response time of an API call from 100ms to 10ms
  • We reduced the number of EC2 instances required by 85%
  • Because Go compiles to a single static binary and because Go 1.5 makes cross compilation a breeze, we can now ship a self-hosted version of Repustate that is identical to the one we host for customers. (Previously we shipped virtual machine appliances to customers which was a support nightmare)
  • Due to the similarity between Python and Go, we were able to quickly re-purpose our unit tests written in nose to fit the structure that Go requires with a just a few simple sed scripts.

Background

Repustate provides text analytics services to small business, large enterprises and government organizations the world over. As the company has grown, so too has the strain on our servers. We process anywhere from 500 million to 1 billion pieces of text EACH day. Text comes in the form of tweets, news articles, blog comments, customer feedback forms and anything else our customers send our way. This text can be in one of the 9 languages we support so there’s that to consider as well since some languages tend to be more verbose than others (ahem, Arabic).

Text analytics is tough to do at scale since you can’t really leverage caching as much as you could in, say, serving static content on the web. Seldom do we analyze the exact same piece of text twice so we don’t bother maintaining any caches – which means each and every request we get is purely dynamic.

But the key insight when analyzing text is you realize that much of it can be done in parallel. Consider the task of running text through a part of speech tagger. For the most part, part of speech tagging algorithms use some sort of probabilistic modelling to determine the most likely tag for a word. But these probability models don’t cross sentence boundaries; the grammatical structure of one sentence doesn’t affect another. This means given a large block of text, we can split it up into sentences and then do the analysis of each sentence in parallel. The same strategy can be employed for sentiment as well.

So what’s wrong with Python?

Our first version of our API was in Django because, well, everyone knew Django and our site runs on Django so why not. And it worked. We got a prototype up and running and then built on top of that. We were able to get a profitable business up and running just on Django (and an old version at that, 1.3 was the version we were using when even 1.6 was out!).

But there’s a lot of overhead to each Django request/response cycle. As our API grew in usage, so too did reliability issues and our Amazon bill. We decided to look at other Python alternatives and Flask came up. It’s lightweight and almost ready-made for APIs, but then came across Falcon. We liked Falcon because it was optimized right off the bat using Cython. Simple benchmarks showed that it was *much* faster than Django and we liked how it enforced clean REST principles. As a bonus, our existing tests could be ported over quite easily, so we didn’t lose any time there.

Falcon proved to be a great stop gap. Our mean response time fell and the number of outages and support issues fell, too. I’d recommend Falcon to anyone building an API in Python today.

The performance, while better than Django, still couldn’t keep up with our demand. In particular, Python is a world of pain for doing concurrency. We were on python2.7 so we didn’t check out the new asyncio package in python3, but even then, you still have the GIL to worry about. Also, Falcon still didn’t solve one other major pain point: self-hosted deployment.

Python does not lend itself to being packaged up neatly, like Java or C, and distributed. Many of our customers run Repustate within their own networks for privacy & security reasons. Up to this point, we’ve been deploying our entire stack as a virtual appliance that can work with either VMware or Virtual Box. And this was an OK solution, but it was clunky. Updates were a pain, support was a pain (“how do I know the IP address of my virtual machine?”) and so on. If we could provide Repustate as a single, installable binary that was the exact same code base as our public API, then we’d have the best of both worlds. Also, this ideal solution had to be even faster than our Python version in Falcon, which meant leveraging the idea that text analytics lends itself to concurrent processing.

Go get gopher

Taking a step back in our story – our Arabic engine was written in this fancy new (at the time) language called Go. Here’s the blog post where we talk about our experience in migrating the code base to Go, but suffice to say, were quite happy with it. The ideal solution was staring us right in the face – we had to port everything to Go.

Go met all of our criteria:

  • faster than Python
  • compiles to a single binary
  • could be deployed into any operating system (and since Go 1.5, very easily at that)
  • makes concurrency trivial to reason about

As an added bonus, the layout of a Go test suite looks pretty similar to our nose tests. Test function headers were simple enough to migrate over e.g.:

def test_my_function():

becomes this:

func TestMyFunction(t *testing.T) {

With a couple of replacements of “=” to “:=” and single quotes to double quotes, we had Go-ready tests.

Because go routines and channels are so easy to work with, we were able to finally realize our dream of analyzing text in parallel. On a beefy machine with say 16 cores, we could just blast our way through text by chunking large pieces of text into smaller ones and then reconstituting the results on the other end e.g.

    chunks := s.Chunks(tws)
    channel := make(chan *ChunkScoreResult, len(chunks))
    for _, chunk := range chunks {
        go s.ScoreChunk(chunk, custom, channel)
    }   

    // Now loop until all goroutines have finished.
    chunkScoreResults := make([]*ChunkScoreResult, len(chunks))
    var r *ChunkScoreResult
    for i := 0; i < len(chunks); i++ {
        r = <-channel
        chunkScoreResults[i] = r 
    }

This code snippet shows us taking a slice of chunks of text, “Scoring” them using go routines, and then collecting the results by reading from the channel one by one. Each ChunkScoreResult contains an “order” attribute which allows us to re-order things once we’re done. Pretty simple.

The entire port took about 3 months and resulted in several improvements unrelated to performance as the team was required to go through the Python code again and make improvements. As an aside, it’s always a good idea, time permitting, to go back and look as some of your old code. You’d be surprised at how bad it could be. The old “what the heck was I thinking when I wrote this” sentiment was felt by all.

We now have one code base for all of our customers that compiles to a single binary. No more virtual appliances. Our deployment process is just a matter of downloading the latest version of our binary.

Concluding remarks

The one thing writing code in a language like Go does to you is make you very aware of how memory works. Writing software in languages like Python or Ruby often seduces you into being ignorant of what’s going on under the hood because it’s just so easy to do pretty complex things, but languages like Go and C don’t hide that. So if you’re not used to that way of thinking, it takes some getting used to (how will the memory be allocated? Am I creating too much garbage? When does the garbage collector kick in?) but it makes your software run that much more smoothly and to be honest, makes you a better Python programmer, too.

Go isn’t perfect and there’s no shortage of blogs out there that can point out what’s wrong with the language. But if you write Go as it is intended to be written, and leverage its strengths, the results are fantastic.