How We Made Deploys Less Scary | Datadog

Published: April 16, 2019

All right.

Yes, so I wanted to talk to you today a little bit about our deploy process and how we use feature flags.

Introduction and an anecdote

But first a story.

Once upon a time, there was a programmer who wrote a million lines of code.

He worked for weeks, apparently rolling a boulder up a hill.

He stayed up all night, drank a lot of coffee, and endured brutal code reviews.

Until one day, he finally prevailed.

The code was merged.

He waited patiently for the release and then it was live.

I bet you can guess what’s coming next.

And then the whole site went down and it was a train wreck.

So my name is Adam Savitzky, and this is my story.

I am a software engineer and I have done that before.

And if you’re like me, you might have done it as well.

And it happens more commonly than we’d like to admit.

It’s not always my fault either.

What is Carta?

But first, I wanted to talk a little bit about where I work.

I work at a company called Carta.

Carta is a platform that allows employees, founders, and investors to manage equity.

Our mission is to create more owners in the world and help everybody get involved and be a part of wealth creation.

Many of you may be familiar with Carta. If you work at a venture-backed startup, there’s a good chance that you use Carta, or maybe not, and that’s fine too.

Some quick facts.

About half a trillion dollars of equity is managed through Carta’s platform.

There are over 10,000 companies and 700,000 shareholders on the platform.

We’re growing very quickly, and we’re hiring.

And I hear that SREs get paid very well.

So you can come find me if you’re interested.

What are feature flags?

So what’s a feature flag and why should you care?

Maybe you know what it is, but you don’t care, I don’t know, some combination.

So one way to think about it is as a circuit breaker for your code.

So a circuit breaker is basically there to prevent fires, right?

So if you have too much current going through a wire, you want a quick switch that’s gonna turn off and prevent everything from burning to the ground.

A feature flag is like that for your code.

It’s a way that you can go in and just instantly disable whole parts of your code base, or enable them if you want to, without having to do a hot fix.

When I joined Carta, we only shipped, well, my team anyway, only shipped once a week, and I wanted to be able to change that.

So we looked at ways that we could ship more frequently, and one of the ways that we figured we could do it was by introducing feature flags.

If every time we ship, there’s like a 25% chance that we have to do a hot fix, well, you’re not gonna wanna ship as often.

So this was intended to help us fix that problem.

Another way to think about feature flags is as a mechanism for dark launching.

So what’s a dark launch?

Basically, a dark launch is releasing your feature, but not having it actually be enabled.

So in this case, you might release the rocket ship feature, but the users don’t actually see it because it’s disabled.

Instead, they see the bicycle.
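To make the rocket ship and bicycle picture concrete, here is a minimal sketch of a dark launch. It is not Flipper's API: the `FLAGS` table and the `is_enabled` and `vehicle_for_user` helpers are hypothetical names made up for illustration, and a real setup would read flags from a shared store rather than a module-level dict.

```python
# Hypothetical in-memory flag table; a real setup would read from a shared store.
FLAGS = {"rocket_ship": False}


def is_enabled(name: str) -> bool:
    """Unknown flags count as disabled, so unreleased code stays dark."""
    return FLAGS.get(name, False)


def vehicle_for_user() -> str:
    if is_enabled("rocket_ship"):
        return "rocket ship"  # new code path, already deployed
    return "bicycle"  # existing behavior


before = vehicle_for_user()   # flag off: users get the bicycle
FLAGS["rocket_ship"] = True   # flip the flag; no deploy or hot fix involved
after = vehicle_for_user()    # flag on: users get the rocket ship
print(before, "->", after)
```

The rocket ship code is in production the whole time; only the flag value decides who sees it.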

How can feature flags help improve reliability?

So why should you be using feature flags?

Maybe you already are, but if you’re not, here are some reasons why.

So first of all, the release process, especially for an engineer, probably for SREs, too, is one of the most stressful things that we do.

You don’t wanna be worried about it every time you ship.

So by having a feature flag, you can instantly roll back broken code without having to do a hot fix.

Smaller diffs: this is something that might be a little bit less intuitive, but prior to having feature flags, we spent weeks, and weeks, and weeks developing a new feature.

And then when it was finally time to merge, there’d be thousands and thousands of lines of code. Nobody can review that.

And when you do merge it, it’s likely to be error prone.

So by centering our release process around feature flags, you can ship things that are very incremental.

So you start with just the core, and then you add little things bit by bit.

And because the feature is disabled, nobody ever knows it’s there.

But it stops you from having giant diffs when you actually end up merging.

Feature flags help maintain agile development cycles

And then there’s the whole agile process.

So you might not be doing agile wrong, but this is kind of a catchy title, so that’s why I put it there.

So one way to think about agile is you have different cycles.

So the smallest one, and the one that I’m most familiar with is the development cycle, which is where I write code, I test it, I push it to GitHub, it gets merged.

Then you have your release cycle, which might span multiple development cycles.

And that’s when you actually push code to production.

And then finally, you have your feature cycle, which is a longer term thing that takes multiple releases, perhaps, it could span months as you build out a complicated feature, like, synthetics, right?

So these are all important cycles, but you want them to be able to run independently from one another.

So my feature cycle shouldn’t be blocking my development cycle and my development cycle shouldn’t be blocking my release cycle.

By using feature flags, you can keep all of these things decoupled from one another and maintain agile.

How Carta built its own feature flag backend

So briefly, I wanted to talk about how we use feature flags at Carta.

We looked around at a lot of different libraries for doing feature flags, some open source, some commercial, and we didn’t find any that we liked.

We did find one called LaunchDarkly, which is very good, but it’s also somewhat expensive. And I’d probably rather walk over hot coals than go through the vendor approval process.

So we built our own.

So Carta’s primary web application is just a Django monolith.

So we have a client that we install as a Python package that lives in the same repository as the Django monolith.

And it basically just proxies to Redis, which we use as a backend.

We didn’t wanna have to run any special software or like a feature flag service or anything like that, so we designed it with portability in mind.

So if you have Redis, or a couple of the other backends that I’m gonna show here, you can use it right away, and it’s very easy to add backends too.

So if you wanted to add one for Postgres, for example, that would be pretty straightforward.

We’re actually using a combination of both Redis and S3 because we need a durable form of storage too.

So when we create a feature flag or modify it, we actually save it to both places. We use Redis in production because it has much better latency properties; it can be very slow to go out and read from S3 all the time. But we do wanna make sure that we have a backup.

Ideally, we’d be using something like Consul, because it’s going to actually push the changes to your host directly.

So you don’t have to go fetch them and do a round trip every time.

We just don’t run Consul on our infrastructure right now.

You could just as easily build a backend for ZooKeeper too if you prefer to use that.

Basically, anything can be used as long as you can store JSON there.
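As a rough illustration of the "anything that can store JSON" idea, here is a sketch of a pluggable backend with a dual-write wrapper. The class names `MemoryStore` and `DualWriteStore` are hypothetical, not Flipper's own, and the in-memory store merely stands in for Redis and S3. Writing to the durable store first is an assumption of this sketch, not something the talk specifies.

```python
import json
from typing import Dict, Optional


class MemoryStore:
    """Stand-in for Redis, S3, or anything else that can hold JSON text."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)

    def set(self, name: str, payload: str) -> None:
        self._data[name] = payload


class DualWriteStore:
    """Writes go to both stores; reads are served by the fast one."""

    def __init__(self, fast: MemoryStore, durable: MemoryStore) -> None:
        self.fast = fast
        self.durable = durable

    def set(self, name: str, payload: str) -> None:
        self.durable.set(name, payload)  # backup copy first (assumed ordering)
        self.fast.set(name, payload)     # low-latency copy used for reads

    def get(self, name: str) -> Optional[str]:
        return self.fast.get(name)


fast, durable = MemoryStore(), MemoryStore()
store = DualWriteStore(fast, durable)
store.set("new_auth_service", json.dumps({"is_enabled": True}))
print(json.loads(store.get("new_auth_service")))
```

Because every backend only needs `get` and `set` on JSON strings, adding one for Postgres or ZooKeeper would mean writing one more small class with the same two methods.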

Conditions and considerations for feature flags

So this is what a feature flag looks like, it stores the name, its status, and a bunch of metadata.

This client data is interesting too, because you can use it to store any custom stuff that you want.

So if there’s things that you wanna do, you could put it in there.

So the way it’s used is pretty simple.

You just pass in the name of the feature, and it will tell you whether or not it’s enabled, as a Boolean value.
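For readers without the slides, a flag record along these lines might look like the sketch below. The field names (`feature_name`, `is_enabled`, `meta`, `client_data`) and the flag name are assumptions for illustration and may not match what Flipper actually stores; the lookup helper is equally hypothetical.

```python
import json

# Hypothetical serialized flag: a name, a status, and free-form metadata.
flag_json = json.dumps({
    "feature_name": "new_stable_page",
    "is_enabled": True,
    "meta": {
        "client_data": {"owner": "equine-team"},  # any custom data you like
    },
})

STORE = {"new_stable_page": flag_json}


def is_enabled(name: str) -> bool:
    """Look the flag up by name; a missing flag is treated as disabled."""
    raw = STORE.get(name)
    if raw is None:
        return False
    return bool(json.loads(raw)["is_enabled"])


print(is_enabled("new_stable_page"))
print(is_enabled("does_not_exist"))
```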

So you could do more complicated things, so we have conditions for example, which allow you to pass in keyword arguments and those will be used to determine whether or not to enable the feature.

Yeah, and I made all my examples about horses, I thought cats might be a little too cheesy.
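In the spirit of the horse examples, here is one way conditions driven by keyword arguments could work. This is a sketch under assumptions: the `Condition` and `Flag` classes are hypothetical stand-ins, and simple equality matching is only one of the checks a real condition system might support.

```python
from typing import Any, Iterable


class Condition:
    """Matches when every expected keyword equals the value passed at check time."""

    def __init__(self, **expected: Any) -> None:
        self.expected = expected

    def matches(self, **context: Any) -> bool:
        return all(context.get(key) == value for key, value in self.expected.items())


class Flag:
    def __init__(self, enabled: bool, conditions: Iterable[Condition] = ()) -> None:
        self.enabled = enabled
        self.conditions = list(conditions)

    def is_enabled(self, **context: Any) -> bool:
        if not self.enabled:
            return False  # the master switch always wins
        return all(cond.matches(**context) for cond in self.conditions)


flag = Flag(enabled=True, conditions=[Condition(breed="clydesdale")])
print(flag.is_enabled(breed="clydesdale"))
print(flag.is_enabled(breed="shetland"))
```

A flag with no conditions behaves like a plain on/off switch, so the simple case stays simple.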

So we support a bunch of condition types. We can do bucketing, which is kind of basically A/B testing.

I didn’t wanna call it that because I don’t think of Flipper as being like an A/B testing or experimentation tool.

But you can either randomly assign or consistently assign people to buckets if you wanted the flag to be rolled out to only a small percentage of the audience and then ratchet it up.
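The difference between random and consistent assignment can be sketched as follows. Hashing the flag name together with a user key is a common way to get stable buckets, but it is an assumption here, not a description of how Flipper does it; all function names are hypothetical.

```python
import hashlib
import random


def consistent_bucket(flag_name: str, key: str) -> int:
    """Map a (flag, key) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout_consistent(flag_name: str, key: str, percentage: float) -> bool:
    """The same key always gets the same answer for a given percentage."""
    return consistent_bucket(flag_name, key) < percentage


def in_rollout_random(percentage: float, rng=random.random) -> bool:
    """Each call rolls the dice again, so a user may flip between buckets."""
    return rng() * 100 < percentage


users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if in_rollout_consistent("new_auth", u, 10)}
at_50 = {u for u in users if in_rollout_consistent("new_auth", u, 50)}
print(len(at_10), len(at_50))
```

With consistent bucketing, ratcheting the percentage up only ever adds users: everyone enabled at 10% is still enabled at 50%.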

Recently, we released a new microservice for authentication, and we’ve been directing slowly but surely more traffic to that separate service over time.

You can also automatically ramp up the percentage roll out too, so we have the concept of ramps.

You can configure it to start at any level and end at any level and go over the course of time.

So you can use this with Datadog, for example, and I’ll show you a brief demo in a second, to monitor error rates, latency, or whatever metrics you want to use to determine whether or not the feature is working.

And then you can instantly turn it off as it starts to go out, if you realize that there’s a problem.
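A ramp can be thought of as a function from time to rollout percentage. The sketch below assumes a straight linear ramp, which the talk does not confirm; `ramp_percentage` and its parameters are hypothetical names.

```python
def ramp_percentage(
    start_pct: float,
    end_pct: float,
    started_at: float,
    duration_s: float,
    now: float,
) -> float:
    """Linearly interpolate the rollout percentage, clamped to the ramp window."""
    if duration_s <= 0:
        return end_pct
    progress = (now - started_at) / duration_s
    progress = min(max(progress, 0.0), 1.0)  # before the start: 0, after the end: 1
    return start_pct + (end_pct - start_pct) * progress


# A ramp from 10% to 100% over two minutes, sampled at a few points in time.
for elapsed in (0, 60, 120, 500):
    print(elapsed, ramp_percentage(10, 100, started_at=0, duration_s=120, now=elapsed))
```

The result would then feed a percentage check, so the audience grows on its own while you watch your dashboards and keep a finger on the off switch.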

We also expire our flags automatically.

This doesn’t ship with Flipper itself, but we wanted people to be responsible for cleaning up their flags.

So they’ll expire after 30 days and if you don’t do anything about it, then it’ll throw an exception.

So that was unpopular, but I think it was actually a good decision in the end.
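Since expiry does not ship with Flipper, the following is only a guess at what such a check could look like. The `FlagExpiredError` exception, the `check_not_expired` helper, and the idea of measuring age from a creation timestamp are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_FLAG_AGE = timedelta(days=30)


class FlagExpiredError(Exception):
    """Raised when a flag has outlived its allowed lifetime."""


def check_not_expired(name: str, created_at: datetime, now: Optional[datetime] = None) -> None:
    """Raise if the flag is older than 30 days, nudging its owner to clean up."""
    now = now or datetime.now(timezone.utc)
    age = now - created_at
    if age > MAX_FLAG_AGE:
        raise FlagExpiredError(f"flag {name!r} is {age.days} days old; remove or renew it")


created = datetime(2019, 3, 1, tzinfo=timezone.utc)
check_not_expired("new_stable_page", created, now=created + timedelta(days=29))  # fine
try:
    check_not_expired("new_stable_page", created, now=created + timedelta(days=31))
except FlagExpiredError as exc:
    print(exc)
```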

Feature flags in action: A quick demo

So demo time, let me do a quick demo.

So I have a little service that I made.

It’s just a Flask web app and it’s running here in my browser.

And basically, it’s going to do an API request for each of these squares to determine which color to show.

So if I reload it, you’ll see it start to go.

So by default everything is blue.

Now, I can go in and this is what the code looks like. So I can create a flag, and I can tell it to enable it by default.

The default behavior if you don’t do that, by the way, is for the feature to be disabled.

But as soon as I create that, you’re going to notice that the squares start turning green, and I can go into Datadog and there’s a little bit of delay with the Datadog collector.

But you should start to see problems here if that’s the case, and looks like we’re okay.

If I wanted to do conditions, for example, then I can recreate this flag, and then basically say, only enable it for even numbers.

So every API request sends the index of the square, and it’ll take that mod two and determine whether or not to show it.

So I can go ahead and paste this in, and we should see that now only the even squares are turning green.

And then I wanted to show something a little bit more complicated.

So this is bucketing.

Basically, what I did is I created a feature inside my little Flask app that will throw errors 10% of the time to simulate something that was not quite working.

And we were worried about this, we know it’s risky.

So we’re gonna start it with a percentage of 10% and then ramp it up to 100% over the course of two minutes.

In production, in a real world application, you’d probably use a longer duration, but this is the demo.

So I’m going to go ahead and actually create this flag and you should start to see some exceptions.

There we go.

I’m seeing exceptions in the logs.

So it should turn the square pink, but it’s only starting out as 10% enabled, right?

So most of them are still blue.

And then occasionally, you see a red one.

That’s an error. So I could go in and eventually, this will start to show me a non-zero error count, and then I could go in and just instantly disable it and then go back to the way it was.

So I’ll let this run for a little bit longer and you can see it in action as it starts to ramp up.

But that’s the basic idea.

Let me go ahead and show you.

This is the code, by the way.

So basically, in the code, I’m just checking is the feature enabled and then if so, return green, or pink, or throw an exception, or by default, return blue.
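The shape of that handler can be approximated without Flask. This is not the demo's actual source: the flag names, the `FLAGS` table of predicates, and the `color_for_square` function are hypothetical, and the 10% failure rate simply mirrors the number mentioned for the risky feature.

```python
import random
from typing import Callable, Dict

# Hypothetical flag table: each flag maps to a predicate over the square's index.
FLAGS: Dict[str, Callable[[int], bool]] = {}


def is_enabled(name: str, index: int) -> bool:
    check = FLAGS.get(name)
    return bool(check and check(index))  # missing flag means disabled


def color_for_square(index: int, rng: Callable[[], float] = random.random) -> str:
    if is_enabled("risky_feature", index):
        if rng() < 0.10:  # simulate a feature that fails 10% of the time
            raise RuntimeError("simulated failure in risky feature")
        return "pink"
    if is_enabled("green_squares", index):
        return "green"
    return "blue"  # default when no flag applies


print([color_for_square(i) for i in range(4)])        # no flags yet
FLAGS["green_squares"] = lambda index: index % 2 == 0  # only even squares
print([color_for_square(i) for i in range(4)])
```

Deleting an entry from `FLAGS` plays the role of the instant off switch: the very next request falls through to blue.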

So the other interesting thing is that we see a bit of latency associated with talking to Redis.

So to solve that problem, what we did is we introduced a cache, so there’s a local cache on every server that has a TTL that you can set.

So if I were to go in here and change this, so instead of using this feature flag client with just a Redis store, what I can do is actually do this where I wrap my Redis feature flag store in a cached feature flag store.

And then every key will have a TTL of 10 seconds, you can configure it to be higher.

But if I go ahead and do that, it’s gonna reboot automatically.

And there we go.

By the way, there’s our error rate going up.

So it’s clear at this point that there’s a problem.

But yeah, now because I have that cached feature flag store, we should start to see our percentage time spent in Redis go down.
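The wrapping idea can be sketched with a small TTL cache in front of a slower store. The class names `CountingStore` and `CachedStore` are hypothetical and are not the ones Flipper ships; the injectable clock is there only so the example runs deterministically.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CountingStore:
    """Stand-in for a Redis-backed store; counts how often it is actually read."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data
        self.reads = 0

    def get(self, name: str) -> Optional[str]:
        self.reads += 1
        return self._data.get(name)


class CachedStore:
    """Serves reads from a local cache until the entry's TTL runs out."""

    def __init__(self, inner: CountingStore, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._inner = inner
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, name: str) -> Optional[str]:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no round trip
        value = self._inner.get(name)
        self._cache[name] = (now + self._ttl, value)
        return value


fake_now = [0.0]
redis_like = CountingStore({"green_squares": '{"is_enabled": true}'})
cached = CachedStore(redis_like, ttl_seconds=10, clock=lambda: fake_now[0])

cached.get("green_squares")
cached.get("green_squares")   # served from the cache
fake_now[0] = 11.0            # the TTL has passed
cached.get("green_squares")   # goes back to the underlying store
print(redis_like.reads)
```

The trade-off is staleness: with a 10 second TTL, turning a flag off can take up to 10 seconds to reach every server.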


Anyway, I should wrap it up.

We’re just about out of time here.

So that’s pretty much all I have for you.

But one more thing, Flipper is open source as of today, we’re just announcing it.

So we’d love it if you’d go check it out. Download it, let us know what you think.

Use it.


Right now it’s Python only, but we would love to accept contributions for other languages.

Like I said, if it does JSON, Flipper will support it.

So it’s been great.

Thank you very much.

Come find me later.