AWS outage? Datadog alerts you

Published: October 19, 2015

No service is foolproof. Even the most reliable ones, like Amazon Web Services, can experience outages. You might have heard about the DynamoDB service disruption that happened last month. Many websites and applications such as Netflix, Reddit, Medium, Pocket, Buffer, and Product Hunt were affected and became inaccessible. Maybe you were affected, too.

Even though you are not responsible for AWS outages, your company may lose revenue, and your users will probably blame you. So you want to be immediately alerted when AWS is down in order to make sure you limit the impact as much as possible. That’s why today we are releasing AWS Outage Alerts on Datadog. Thanks to this new feature, your team will be notified right away whenever any AWS service is having an issue.

Datadog AWS Outage alerts

Immediately alerted

Datadog constantly checks the AWS Service Health Dashboard. This status page is updated by AWS and shows whether each service is operating normally. So, whenever one of them is having an outage or any problem, Datadog knows immediately.

AWS Service Health Dashboard
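How Datadog compares successive readings of the status page is not described here, but the general idea of spotting a status change between two polls can be sketched as follows. This is only an illustration under assumptions: the snapshot format (a dict keyed by `(service, region)` pairs), the status strings, and the `status_changes` helper are all hypothetical and not part of Datadog or AWS.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: (service, region) -> status string.
Snapshot = Dict[Tuple[str, str], str]


def status_changes(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str, str, str]]:
    """Return (service, region, old_status, new_status) for every entry that changed."""
    changes = []
    for key, new_status in sorted(current.items()):
        # Assumption: entries missing from the previous snapshot were "ok".
        old_status = previous.get(key, "ok")
        if old_status != new_status:
            service, region = key
            changes.append((service, region, old_status, new_status))
    return changes


# Illustrative data only, not real dashboard output.
before = {("dynamodb", "us-east-1"): "ok", ("ec2", "us-east-1"): "ok"}
after = {("dynamodb", "us-east-1"): "critical", ("ec2", "us-east-1"): "ok"}
print(status_changes(before, after))
```

Comparing whole snapshots rather than single entries means one poll can surface every service and region that changed at once.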

Set it up in 1 minute

The only thing you have to do is create a new monitor and select the Integration type:

Set up integration monitor on Datadog

Finally, select Amazon Web Services, and you will be able to set up the conditions of the alert in the Integration Status page:

AWS outage Alert conditions

All the power of Datadog alerts

AWS outage alerts are full-featured Datadog alerts, so you can:

  • Choose a scope so you can trigger different alerts depending on the AWS services and the availability zones impacted by the outage
  • Set the alert conditions you want (we recommend you trigger and resolve the alert after one check reports a status change)
  • Customize the alert message that will be sent to your teams so you can specify what’s happening and suggest what can be done to limit the damage during an AWS outage
  • Select who should be notified (specific people, only engineers on call, etc.) and via which communication channels (PagerDuty, email, Slack, HipChat…)
AWS outage Alert configuration
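To make the recommended condition concrete (trigger and resolve after one check reports a status change), here is a small sketch of how consecutive-check thresholds behave. It is a simplified model, not Datadog's evaluation logic; the function name `alert_states` and the parameters `trigger_after` and `resolve_after` are made up for this example.

```python
from typing import Iterable, List


def alert_states(checks: Iterable[bool], trigger_after: int = 1, resolve_after: int = 1) -> List[bool]:
    """Return whether the alert is active after each check.

    checks: True means the check reported a problem, False means it reported OK.
    trigger_after / resolve_after: hypothetical thresholds, counted in
    consecutive checks that disagree with the current alert state.
    """
    active = False
    streak = 0  # consecutive checks that disagree with the current state
    states = []
    for failing in checks:
        if failing != active:
            streak += 1
        else:
            streak = 0
        threshold = resolve_after if active else trigger_after
        if streak >= threshold:
            active = not active
            streak = 0
        states.append(active)
    return states


# With thresholds of 1, the alert follows the check results exactly.
print(alert_states([False, True, True, False]))
# With trigger_after=2, a single failing check is not enough to alert.
print(alert_states([True, False, True, True], trigger_after=2))
```

In this model, a threshold of one gives the fastest notification, which matters for an outage whose status comes from a single authoritative source. Higher thresholds trade speed for fewer flapping alerts.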

Spot issues directly from your dashboards

You can also see AWS outages at a glance by adding a Check Status widget to your screenboards.

AWS Outage Check Status widget on screenboard

You can then select which service's status you want to monitor and which regions matter to you:

AWS outage Check Status widget configuration

A Single Check monitors only one service/region combination, while a Cluster of Checks allows you to monitor any service globally. For example, if one out of three regions is down for the selected service, you will see a red 1 and a green 2.
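The red and green counts shown for a cluster can be thought of as a simple tally over per-region statuses. The sketch below illustrates that tally only. The region names, the status strings, and the `cluster_summary` helper are assumptions for the example, not the widget's actual data model.

```python
from collections import Counter
from typing import Dict


def cluster_summary(region_status: Dict[str, str]) -> Dict[str, int]:
    """Count regions that are not OK ("red") and regions that are OK ("green")."""
    counts = Counter(
        "green" if status == "ok" else "red" for status in region_status.values()
    )
    return {"red": counts["red"], "green": counts["green"]}


# Illustrative statuses for one service in three regions.
statuses = {"us-east-1": "critical", "us-west-2": "ok", "eu-west-1": "ok"}
print(cluster_summary(statuses))  # {'red': 1, 'green': 2}
```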

Riding out the storm

You may be able to limit the impact on your applications while waiting for AWS to resolve the outage. Depending on the services you use, their configuration, and the volume of traffic you are receiving, you may be able to keep your applications up and responsive.

For example, your DynamoDB tables replicate data across multiple availability zones in order to remain accessible if the service goes down in a specific AZ. Your load balancers should be able to distribute incoming requests to viable availability zones thanks to the cross-zone load balancing feature. If DynamoDB went down in one AZ, you might want to consider adding more DynamoDB instances in the remaining viable AZs in order to avoid overloading your remaining instances and having their requests throttled.
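A quick back-of-the-envelope calculation shows why the remaining zones can become overloaded. The sketch below assumes traffic is spread evenly across healthy zones, which real deployments may not match. The traffic figures and the `load_per_zone` helper are illustrative only.

```python
def load_per_zone(total_requests_per_second: float, healthy_zones: int) -> float:
    """Requests per second each healthy zone must absorb, assuming an even spread."""
    if healthy_zones <= 0:
        raise ValueError("at least one healthy zone is required")
    return total_requests_per_second / healthy_zones


total = 3000.0  # illustrative traffic figure
normal = load_per_zone(total, healthy_zones=3)
degraded = load_per_zone(total, healthy_zones=2)
increase = (degraded - normal) / normal

print(normal, degraded, f"{increase:.0%}")  # 1000.0 1500.0 50%
```

Under these assumptions, losing one of three zones raises the load on each remaining zone by half, which is the headroom you would need to add to avoid throttling.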

Already a Datadog customer? You can now set up AWS Outage Alerts. Otherwise, try it out yourself by signing up for a free trial of Datadog.
