Take it easy, trashman! How switching to REE 1.8.7 made lumosity.com way faster

If you’re running a Rails application in production and you’re still running plain old vanilla ruby 1.8.7 (the MRI or “Matz” version), our recent experience shows that you might have a lot to gain for relatively little effort by making the switch to Ruby Enterprise Edition 1.8.7. After rolling out REE and tuning its garbage collection parameters, we saw a 35% improvement in performance – average response time dropped from 200ms to around 125ms.

Since we’re using the amazing RVM in production, switching ruby versions was relatively easy. I wrote a simple Chef cookbook to have RVM install a desired ruby version on an app server, set it as the default, and reinstall essential gems like chef and bundler. The upgrade process then consisted of running chef-client on each server, re-bundling our app’s gems with bundle install, and bouncing the mongrels. (Yes, we’re still using mongrel… for now :)

Right out of the box, REE provided a noticeable improvement in performance, especially in memcache call time:

Note the gradual increase in the brown “GC Execution” layer of the New Relic RPM graph as each app server comes online with REE 1.8.7, which (unlike MRI ruby) emits the stats that New Relic uses to track GC performance. I was somewhat shocked to see that we were spending approximately half of all our Ruby time doing garbage collection, though in retrospect maybe I shouldn’t have been – GC performance tuning is one of the reasons that REE was developed in the first place.

So far so good – we’ve shaved off about 20-30ms on average and we have much better visibility into what Ruby is doing with its time. The real win at this point would seem to be reducing time spent in garbage collection. Time to take Twitter’s advice about tuning GC performance!

This was also a simple change, thanks to monit and chef cookbooks. I added an attribute to our mongrel_rails cookbook to specify the GC parameters to passed into the environment of each mongrel when started by monit – something like this:

check process mongrel_5000
  with pidfile /var/run/mongrel/mongrel.5000.pid every 2 cycles
  start program = "/bin/su - lumoslabs -c 'RUBY_HEAP_MIN_SLOTS=500000 RUBY_HEAP_SLOTS_INCREMENT=250000 RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_MALLOC_LIMIT=50000000 /data/lumoslabs/current/bin/rackup -s mongrel -o xx.yy.zz.aa -p 5000 -E production -D -P /var/run/mongrel/mongrel.5000.pid /data/lumoslabs/current/config.ru'"
  stop program = "/bin/su - lumoslabs -c '/data/lumoslabs/current/script/stop_racked_mongrel.sh /var/run/mongrel/mongrel.5000.pid'"
  if totalmem is greater than 600 MB for 2 cycles then restart      # eating up memory?

The actual values for these parameters were taken directly from Twitter’s suggestions. Once I rolled out the monit changes via chef and bounced all the mongrels, I saw good things happen:

This is exactly what an ops engineer wants to see! A nearly immediate and big improvement in performance, and the only code changes required were on the infrastructure side of things. Though I haven’t tested out all of these parameters in isolation (who has the time??!) their aggregate effect is clear. We allocate more memory initially (RUBY_HEAP_MIN_SLOTS increases to 500k from 1k), grab more when we need to add it (RUBY_HEAP_SLOTS_INCREMENT increases to 250k from 10k), and make sure we don’t trigger GC before we actually need it (RUBY_GC_MALLOC_LIMIT increases to 50M from 1M.)

One word of caution - these settings did have the effect of increasing the memory footprint of each mongrel from about 480MB to about 600MB. As a result, subsequently we needed to reduce the number of workers on each server from 8 to 6 to avoid swapping, which after a few days had undone all of the performance improvements made by switching to REE. Having fewer workers is neatly balanced out by the improvement in response time, though, and ultimately we’re seeing lower CPU usage and higher throughput on each app server.

And the magic just doesn’t end. Based on some very solid advice at smartic.us, I added some GC tuning parameters to the environment on our CI box. I was really happy to see that it cut our build time (which had bloated up to become painfully long) approximately in half. Definitely try this out if your build is taking longer than you’d like!

Up next – taking the plunge and upgrading to Ruby 1.9.2. More on that in a month or two!