On Delivery

Last fall, we took a look at our deploy process and how other companies were approaching the same problem.

The goals of a successful deploy process include (gratefully borrowing from the Continuous Delivery and Web Operations books):

reducing cycle time (time to make a change on production)
fewer bugs
fewer complex bugs
lessening MTTD (mean time to detection)
optimizing human resources (more automation)
deploys should be boring, not an exciting event (from the engineering standpoint)
make them repeatable, reliable, predictable

Here is my breakdown of the different types of deploy strategies.

1. The Monolith

Quarterly/Yearly+ updates. This is how updates to operating systems work. Consumers don’t want lots of regular updates. There is a giant test matrix and a long test cycle. Finding a regression is PAINFUL. I’ve been there. It sucks.

Application type: OS, “enterprise”

Companies: Apple, Sun(Solaris)/Oracle, IBM

Worst case scenario: we never update the site again before lumos goes under

Best case scenario: who cares, we’re not doing this

2. Jesus On The Dashboard

Push to production “whenever it feels right”. Don’t QA beforehand. Don’t dark deploy it. Don’t use feature flags. Users see changes as soon as they are deployed. Cross your fingers and hope it works. If it doesn’t, go into panic fix mode.

Application type: cheap, throw away consumer app

Companies: some Facebook apps

Worst case scenario: you break the site regularly, have to hack your sh*t to fix fire alarms, your hair turns grey a lot earlier than it should

Best case scenario: effectively zero cycle time

3. Old Married Couple Controlled Chaos

Check in and deploy on the same day from trunk.

Iterations are one week and start on, say, Tuesday. Check in to trunk whenever you want. On the day of the deploy (Monday), do last-minute manual QA and last-minute bug fixes/feature creep. There is no official process for when you should stop checking in to trunk on deploy day, but use sensible judgement. Too many last-minute bugs or too much feature creep can push the deploy out another day; however, you’re so used to working with each other that you can usually manage to get the deploy out.

Application type: consumer website with few developers / one feature set per iteration

Companies: early-stage startups

Worst case scenario: one week cycle time, production deploys get delayed

Best case scenario: ~1hr cycle time

4. Grey Beard Continuous Delivery

Code sits in trunk for up to one week, then staging for N days, then into production. We use N = 3.

Iterations are one week and start on Monday. Check in to trunk if dev CI is green. The staging branch is cut from trunk once a week, every Monday, and only if dev CI is green. The QA team works on the staging branch for up to N days. Only bug fixes can be checked in to the staging branch after the cut. Staging has its own CI. Deploys to production from the staging branch go out every Wednesday, but a production deploy doesn’t happen until the QA team gives the OK and staging CI is green. This ensures the code about to go to production is sufficiently QA’d and limits the possibility of feature creep or fresh check-ins delaying the push to production.
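The rules above boil down to a handful of yes/no gates. Here is a rough sketch of them in Python. The function names and the idea of passing CI status around as booleans are made up for illustration; this is not our actual tooling, just one way to write the rules down.

```python
# Hypothetical gate checks for the weekly trunk -> staging -> production flow.
# All names here are illustrative assumptions, not a real tool.

CUT_DAY = "Monday"  # staging is cut from trunk once a week


def can_check_in_to_trunk(dev_ci_green: bool) -> bool:
    """Trunk only accepts check-ins while dev CI is green."""
    return dev_ci_green


def can_cut_staging(day: str, dev_ci_green: bool) -> bool:
    """Staging is cut on the cut day, and only from a green trunk."""
    return day == CUT_DAY and dev_ci_green


def can_check_in_to_staging(is_bug_fix: bool) -> bool:
    """After the cut, only bug fixes may land on the staging branch."""
    return is_bug_fix


def can_deploy_production(qa_signed_off: bool, staging_ci_green: bool) -> bool:
    """The scheduled deploy waits for both QA sign-off and green staging CI."""
    return qa_signed_off and staging_ci_green
```

The production gate deliberately ignores the day of the week: the deploy is scheduled for Wednesday, but it slips rather than ships if either condition is false.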

Application type: consumer website with medium to large number of developers / multiple feature sets per iteration

Companies: medium-size startups

Worst case scenario: one week + N days cycle time

Best case scenario: N days cycle time
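To make the two bounds concrete, here is a small sketch of the arithmetic. It assumes cycle time is simply the wait until the next staging cut plus the N days in staging; the function and constant names are mine, for illustration only.

```python
# Cycle time under the weekly-cut model: wait for the next cut, then N days in staging.
DAYS_BETWEEN_CUTS = 7  # staging is cut once a week
STAGING_DAYS = 3       # N, the value we use


def cycle_time_days(days_until_next_cut: int, staging_days: int = STAGING_DAYS) -> int:
    """Days from check-in to production for a change that waits
    `days_until_next_cut` days in trunk before the staging cut."""
    if not 0 <= days_until_next_cut <= DAYS_BETWEEN_CUTS:
        raise ValueError("wait must be between 0 and 7 days")
    return days_until_next_cut + staging_days


best_case = cycle_time_days(0)                   # checked in right before the cut
worst_case = cycle_time_days(DAYS_BETWEEN_CUTS)  # checked in right after the cut
```

With N = 3, a change that lands just before the cut takes 3 days, and one that lands just after it takes 10.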

5. Mountain Dew Extreme Continuous Deployment

Anyone can deploy at any time. Code review is required before checking in. An automated test suite prevents bad check-ins. New feature sets are dark deployed. Manual QA is done on production. Metric monitoring raises alarms when something goes wrong (we currently have something like this with Splunk).
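Dark deploying usually means the code ships to production behind a feature flag that is off (or on for only a slice of users). Below is a minimal sketch of a percentage-based flag check. The flag names, the in-memory `FLAGS` table, and the hashing scheme are all assumptions for illustration, not how Flickr, Etsy, or we actually do it.

```python
import hashlib

# Hypothetical flag table: feature name -> percentage of users who see it.
# 0 means fully dark: the code is deployed but no user sees it.
FLAGS = {"new_checkout": 0, "new_search": 25, "new_header": 100}


def is_enabled(feature, user_id, flags=None):
    """Return True if `feature` is turned on for `user_id`."""
    flags = FLAGS if flags is None else flags
    percent = flags.get(feature, 0)  # unknown features stay dark
    # Hash feature + user so each user lands in a stable bucket from 0 to 99.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because the bucket is derived from a hash rather than a random number, a given user gets the same answer on every request, so ramping a flag from 0 to 100 only ever adds users.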

Application type: consumer website with medium to large number of developers / multiple feature sets going on at same time

Companies: Flickr, Etsy

Worst case scenario: you break the site but have enough prevention that it shouldn’t matter (i.e., the change is rolled back or not immediately user facing)

Best case scenario: however long it takes to get a code review + length of automated test suite (on the order of a few minutes/hours)

Conclusion

I should note that all of these approaches have been used by successful businesses. However, depending on the business/application, some of these will cause the engineers to go crazy, hate their jobs, and start producing lower quality work. Then beat their pets at home (or bikes if they do not have a pet).

Last fall, we moved out of the “Old Married Couple Controlled Chaos” and started using the “Grey Beard Continuous Delivery” deploy process. It was simple, straightforward, and something we could move to with little extra work. It’s working well for us and has set a nice cadence for the rest of the company. We are looking into building the tools and experience necessary to get us closer to “Mountain Dew Extreme Continuous Deployment”.

One key tidbit I got from my old ski-lease buddy Paul of Flickr fame was that engineers have to have the proper mentality in order to get to true continuous deployment. If you don’t have the mindset that every check-in matters and breaking things is not ok, then process can’t save you. This means you have to be smart about who you hire and how you bring new hires up to speed.