Our MySQL database servers are the heart and soul of lumosity.com’s Rails stack. Memcached helps a ton with serving up data that’s accessed very frequently, but in the end the database is the single most critical (and most complex) piece of our infrastructure.

This summer we decided that it was time to go through the semi-painful process of upgrading our trusty primary database server – which has been serving a thousand queries per second for the last 18 months without batting an eyelash – to a bigger, badder, more modern machine with larger and faster disks, more RAM, and faster processors. Like most engineering teams, we strongly prefer to add capacity before we encounter performance issues, so it was the right time to switch. We needed to prepare the site for our next 6+ months of growth.

Once the new machine was prepped and ready to go, we had an important question to answer:

“This machine certainly seems fast, but can we be sure that it can handle our production load? What if there’s a subtle issue with the RAID controller, or maybe an unexpected I/O issue with our MySQL configuration? What if we blow up the site?!”

We weren’t comfortable simply crossing our fingers and making the switch.  With money and subscriber satisfaction on the line, we needed to be sure that it would be a smooth transition.

Sysbench - OLTP Workload

Baron Schwartz et al.’s indispensable High Performance MySQL, 2nd Edition (hereafter referred to as HPM) has a fairly basic but good section on benchmarking MySQL. sysbench is one of the standard Linux benchmarking tools covered in the book’s survey. I wanted a simple tool that could (1) find the upper bounds of I/O and transaction processing performance on the new database server and (2) allow us to compare these boundaries against the performance of our current database servers.

After installing sysbench using yum, I followed Baron’s lead in HPM and fired off a fairly sizable OLTP workload on each of the machines. This would provide a reasonable approximation of the I/O generated by a database server handling many concurrent requests in a Rails stack.  The goal of this test was to determine the expected throughput (transactions per second) and per-request processing time that both generations of server could handle at their peak.  My hypothesis was that db-new, all souped up with 2011 hardware, would smoke db-old and its 2009 hardware.

I executed the following commands (straight out of HPM) on db-new and db-old to push 60 seconds’ worth of requests through a 1M-row table in 8 concurrent threads:

# sysbench --test=oltp --oltp-table-size=1000000 \
    --mysql-db=test --mysql-user=root prepare
# sysbench --test=oltp --db-driver=mysql --oltp-table-size=1000000 \
    --mysql-socket=/tmp/mysql.sock --mysql-db=test --mysql-user=root \
    --mysql-password=xxxxxxx --max-time=60 --oltp-read-only=on \
    --max-requests=0 --num-threads=8 run
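(One housekeeping note: when you’re done benchmarking, sysbench’s cleanup mode drops the table that prepare created. A sketch, using the same connection options as the prepare step above:)

```
# sysbench --test=oltp --mysql-db=test --mysql-user=root cleanup
```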

The results from db-new:

OLTP test statistics:
queries performed:
read: 3925586
write: 0
other: 560798
total: 4486384
transactions: 280399 (4673.18 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 3925586 (65424.48 per sec.)
other operations: 560798 (9346.35 per sec.)

Test execution summary:
total time: 60.0018s
total number of events: 280399
total time taken by event execution: 478.7724
per-request statistics:
min: 1.45ms
avg: 1.71ms
max: 6.62ms
approx. 95 percentile: 1.84ms

Threads fairness:
events (avg/stddev): 35049.8750/715.27
execution time (avg/stddev): 59.8465/0.01

And from db-old:

OLTP test statistics:
queries performed:
read: 2671536
write: 0
other: 381648
total: 3053184
transactions: 190824 (3180.28 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 2671536 (44523.87 per sec.)
other operations: 381648 (6360.55 per sec.)

Test execution summary:
total time: 60.0023s
total number of events: 190824
total time taken by event execution: 478.7136
per-request statistics:
min: 2.14ms
avg: 2.51ms
max: 81.88ms
approx. 95 percentile: 2.71ms

Threads fairness:
events (avg/stddev): 23853.0000/177.37
execution time (avg/stddev): 59.8392/0.00

This was good news, but not unexpected – the 95th percentile request time is about 32% lower on db-new (1.84ms vs 2.71ms), and the throughput is about 47% higher (4673 transactions/sec vs 3180 transactions/sec). Given the increase in CPU and I/O horsepower on db-new, I would have been disappointed with anything else!
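(Those percentages are just the ratios of the numbers above – a quick shell sanity check:)

```shell
# Ratio of new to old throughput (transactions/sec)
awk 'BEGIN { printf "throughput: %.2fx\n", 4673.18 / 3180.28 }'   # prints throughput: 1.47x
# Ratio of old to new 95th-percentile request time (ms)
awk 'BEGIN { printf "latency: %.2fx\n", 2.71 / 1.84 }'            # prints latency: 1.47x
```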

Before I could jump headfirst into the void of db-new, I had to see it run our real production workload. I had to make sure that it didn’t blow up because of some subtle change to the MySQL configuration, or the new RAID controller, or the RAM, or one of an infinite number of other things. This was a head-scratcher at first; I thought maybe we could sniff the MySQL traffic from db-old with tcpdump and somehow “replay” it on db-new, but I hadn’t a clue how to decode the MySQL protocol. It sounded like a lot of work, and we were working against the clock, with db-old’s disk filling up little by little every day. But soon enough, Google led me to…

mk-query-digest (the greatest MySQL tool in the history of the universe)

I had already discovered maatkit via the mentions of mk-table-sync in HPM, but had never bumped into the innocuously-named mk-query-digest until I found this 37signals post on warming the passive failover in a master-master replication pair. mk-query-digest should be in every MySQL admin’s toolkit: it does pretty much everything, including stuff you had no idea you needed to do. Let it teach you.

I already had db-new set up as a replication slave of db-old, so I knew their data was in sync at any given time (give or take a few seconds). Moreover, I knew that UPDATEs and INSERTs were working as expected on db-new, since db-new was constantly replaying those queries by reading db-old’s binary logs via replication. But I had no proof that db-new would be able to keep up with the sizable throughput of SELECTs on db-old – about 1000-1200 per second these days. As outlined in the 37signals post linked above, the quick-and-dirty way to test this was to run tcpdump on db-old to capture MySQL traffic for some period of time, say 5 minutes, and then use mk-query-digest to replay the SELECT queries on db-new.
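(For what it’s worth, the sanity check for “in sync, give or take a few seconds” is quick. A sketch, assuming root credentials on db-new – both slave threads should say Yes and Seconds_Behind_Master should be at or near 0:)

```
[db-new] # mysql -u root -pxxxxxxx -e 'SHOW SLAVE STATUS\G' \
    | egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
```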

First, I ran tcpdump on db-old:

[db-old] # time tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 > db-old.tcp.txt
tcpdump: WARNING: Promiscuous mode not supported on the "any" device
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
1050296 packets captured
1063883 packets received by filter
13498 packets dropped by kernel

real 4m34.464s
user 1m40.576s
sys 0m16.530s

Then, I carefully shipped the resulting 1.8GB file over to a third server, call it db-other, which would be used to execute the queries on db-new. Why a third server? I didn’t want the overhead of the mk-query-digest perl script itself polluting the results on db-new. Also, note the use of the '-l 500' argument to scp, which rate-limits the file copy to 500 Kbit/sec (roughly 60 KB/sec). Since db-old was a live production database server, I had to take care not to hog all its outbound bandwidth with the file copy, which would starve our Rails app servers of data and crash the site!

[db-old] # scp -l 500 db-old.tcp.txt lumoslabs@db-other:~
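(If rsync is handy, its --bwlimit option does the same job – just mind the unit difference: rsync’s limit is in KB/sec, while scp’s -l is in Kbit/sec. A roughly equivalent copy:)

```
[db-old] # rsync --bwlimit=60 db-old.tcp.txt lumoslabs@db-other:~
```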

Now, on db-other, I ran mk-query-digest to replay the 5 minutes’ worth of queries on db-new. The --filter argument takes a Perl expression; here it executes only the queries that start with the string ‘SELECT’ (case-insensitive).

[db-other] # mk-query-digest --type tcpdump \
    --filter '($event->{arg} =~ m/^SELECT/i)' \
    --execute h=db-new,u=user,p=xxxxxx db-old.tcp.txt
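(If you’re paranoid – and you should be – you can dry-run the same filter with --print instead of --execute first, to eyeball what would be replayed without touching db-new:)

```
[db-other] # mk-query-digest --type tcpdump \
    --filter '($event->{arg} =~ m/^SELECT/i)' \
    --print --no-report db-old.tcp.txt | head
```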

On db-new, I fired up mk-query-digest to watch the SELECTs roll in on the MySQL port:

[db-new] # tcpdump -s 65535 -x -nn -q -tttt \
    -i any port 3306 | mk-query-digest --print --type tcpdump

# Time: 110913 15:48:15.650359
# Client:
# Thread_id: 4294967311
# Query_time: 0.000059 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
SELECT `asset_versions`.* FROM `asset_versions` WHERE (`asset_versions`.asset_id = 27668);
# Time: 110913 15:48:15.652158
# Client:
# Thread_id: 4294967346
# Query_time: 0.000056 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
SELECT COUNT(*) FROM `roles` INNER JOIN `roles_users` ON `roles`.id = `roles_users`.role_id
  WHERE `roles`.`name` = 'admin' AND ((`roles_users`.user_id = NULL));

... etc ...

I watched top, iostat, and mytop to make sure nothing was blowing up. The server load stayed nice and moderate, peaking at 0.66 as the iowait percentage spiked while data was first paged into the InnoDB buffer pool. It eventually settled into a comfortable 0.33, with iostat showing only 0.1% iowait time. Basically, things looked great from the perspective of system metrics. I relaxed even more!
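(Nothing fancy about the watching itself, by the way – roughly this in a few terminals, assuming the sysstat package is installed:)

```
[db-new] # iostat -x 5    # per-device utilization and iowait, every 5 seconds
[db-new] # top            # load average and per-process CPU
[db-new] # mytop          # live MySQL query activity
```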

The output of mk-query-digest gave even more reason to be hopeful that db-new was ship-shape:

# 339.9s user time, 7.1s system time, 112.13M rss, 233.60M vsz
# Current date: Mon Sep 5 14:57:11 2011
# Hostname: db-other.sl.lumoslabs.com
# Files: db-old.tcp.txt
# Overall: 365.64k total, 1.89k unique, 1.33k QPS, 18.37x concurrency ____
# Time range: 2011-09-05 13:18:10.837646 to 13:22:45.293365
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 5042s 0 38s 14ms 839us 347ms 194us
# Exec orig ti 204s 0 2s 557us 596us 11ms 131us
# Rows affecte 21.92k 0 19 0.06 0.99 0.24 0
# Query size 42.85M 5 3.61k 122.87 246.02 148.13 102.22
# Exec diff ti 273s 0 38s 2ms 626us 118ms 108us
# Warning coun 11.81k 0 11.30k 0.03 0 18.31 0
# Boolean:
# No index use 7% yes, 92% no

We can see that we reached 1.33k queries per second, which is in line with the expected load of our production traffic. We can also compare the 95th percentile execution time of the queries on the new server – the “Exec time” of 839 microseconds – with that on the old server – the “Exec orig time” of 596 microseconds. Given that we were running these queries against an entirely “cold” server, i.e. with absolutely no data in the InnoDB buffer pool, this performance hit was expected. Nearly every piece of data requested by the SELECTs during those 5 minutes had to be pulled from disk, whereas db-old had the great advantage of having many GB of RAM all warmed up with the most-frequently requested items.

So, this particular test shows us that the database server wasn’t crushed by the load, but not that its performance was comparable to the original machine. To test that we’d need to mirror our production workload for a considerable period of time – an exercise that is beyond the scope of this post. (But we did it, of course!)


This database switchover went smoothly – as smooth as Sade. We’re running against db-new now and have started seeing the expected performance boost as its InnoDB buffer pool pages in the optimal working set for Lumosity’s data. It’ll be very useful to have these tools in our toolbox in the near future when we tackle our next database project: upgrading to MySQL 5.5. Without benchmarking, we’d just be switching and praying!