AWS Health is a service that provides continuous visibility into the status of your entire AWS environment. It delivers near-real-time alerts in response to changes in the health of AWS resources, including upcoming maintenance events and unexpected issues. This information is available in the AWS Personal Health Dashboard and through the AWS Health API.
Datadog’s new integration helps you monitor the health of your AWS environment by automatically creating rich, contextual events from the AWS Health API. And, because Datadog also integrates with more than 700 infrastructure technologies (including other AWS services), you can easily correlate these AWS Health status events with metrics and events from the rest of your infrastructure, all in one place.
When an AWS service change occurs, the Health API generates a detailed event object. First and foremost, this object has a human-readable eventDescription of what occurred. One such example can be seen below:
Elevated Latency and API Error Rate
04:49 PM PST We are investigating increased API error rates and latencies for the Amazon EC2 Container Service APIs in the US-WEST-2 Region. Container instance connectivity and running tasks are not affected.
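The text in that example is what the Health API returns as an event's latest description. The following is a minimal sketch that extracts these descriptions. It assumes a response dict shaped like the output of boto3's `describe_event_details` call on the `health` client. The function name `latest_descriptions` is hypothetical.

```python
from typing import Any, Dict


def latest_descriptions(details_response: Dict[str, Any]) -> Dict[str, str]:
    """Map each event ARN to its human-readable description.

    Assumes a dict shaped like the AWS Health DescribeEventDetails
    response: a "successfulSet" list whose items carry an "event"
    (with an "arn") and an "eventDescription" (with "latestDescription").
    """
    descriptions: Dict[str, str] = {}
    for item in details_response.get("successfulSet", []):
        arn = item["event"]["arn"]
        descriptions[arn] = item["eventDescription"]["latestDescription"]
    return descriptions
```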
The Health API also provides metadata for each event. There are a few important key:value pairs that make this event object particularly useful:
- eventTypeCategory: issue | accountNotification | scheduledChange
- region: The AWS region where the event occurred
- service: The AWS service that is affected by the event, e.g., EC2
- statusCode: open | closed | upcoming
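These fields can also be used as filters. The following minimal sketch asks the Health API for open events in the `issue` category and follows `nextToken` pagination. It assumes a client object that behaves like `boto3.client("health")`. `fetch_open_issues` is a hypothetical helper and is not part of boto3 or of the Datadog integration.

```python
from typing import Any, Dict, List, Optional


def fetch_open_issues(client: Any, services: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Return all open AWS Health events in the "issue" category.

    `client` is assumed to expose describe_events(filter=..., nextToken=...)
    like boto3's Health client; the filter keys are EventFilter fields.
    """
    event_filter: Dict[str, Any] = {
        "eventTypeCategories": ["issue"],
        "eventStatusCodes": ["open"],
    }
    if services:
        # Optionally narrow the query to specific services, e.g. ["EC2"].
        event_filter["services"] = services

    events: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"filter": event_filter}
    while True:
        page = client.describe_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
        # Request the next page of results.
        kwargs["nextToken"] = token
```

In practice you might pass `boto3.client("health", region_name="us-east-1")`, because the Health API is served from the US East (N. Virginia) endpoint. Accounts without a qualifying support plan will receive an error instead of results.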
Datadog’s AWS Health integration allows us to go one step further and capture the Affected Entities from the change. Entities can refer to specific resources, groups of resources, and even entire AWS accounts.
Several pieces of metadata about the affected entities are returned from the AWS Health API, including:
- awsAccountId: The 12-digit AWS account number that contains the affected entity
- entityARN: The unique identifier for the affected entity
- statusCode: IMPAIRED | UNIMPAIRED | UNKNOWN
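The following sketch shows one way to summarize the affected-entity metadata by status. It assumes a list of dicts shaped like the `entities` returned by the Health API's DescribeAffectedEntities operation. In that response, the ARN field is spelled `entityArn`. `summarize_entities` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_entities(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group affected entities' ARNs by their statusCode.

    Entities missing a statusCode are treated as UNKNOWN.
    """
    grouped: Dict[str, List[str]] = {}
    for entity in entities:
        status = entity.get("statusCode", "UNKNOWN")
        grouped.setdefault(status, []).append(entity["entityArn"])
    return grouped
```

For example, `summarize_entities(entities).get("IMPAIRED", [])` lists only the entities that are currently impaired.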
We capture all of this information to create context-rich events in Datadog. Each AWS Health status event will appear in Datadog as an event, with the AWS eventDescription as its description, and tagged with the relevant metadata in key:value form (e.g., service:ec2). Each event also includes a JSON-formatted Affected Entities section, for easy string matching.
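The following sketch shows one way such an event could be assembled. The description becomes the body, the metadata becomes key:value tags, and the affected entities are appended as JSON. The dict layout, the `build_datadog_event` name, and the `region:` and `status:` tag keys are assumptions for illustration. This is not the integration's actual implementation.

```python
import json
from typing import Any, Dict, List


def build_datadog_event(
    event: Dict[str, Any], description: str, entities: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Assemble an illustrative event dict from AWS Health data.

    Hypothetical layout: a title, a text body (description plus a
    JSON-formatted Affected Entities section), and key:value tags.
    """
    tags = [
        f"service:{event['service'].lower()}",
        f"region:{event['region']}",
        f"event_category:{event['eventTypeCategory']}",
        f"status:{event['statusCode']}",
    ]
    # Keep only the entity fields discussed above.
    affected = [
        {
            "awsAccountId": e.get("awsAccountId"),
            "entityArn": e.get("entityArn"),
            "statusCode": e.get("statusCode"),
        }
        for e in entities
    ]
    text = (
        description
        + "\n\nAffected Entities:\n"
        + json.dumps(affected, indent=2, sort_keys=True)
    )
    return {
        "title": f"AWS Health: {event['eventTypeCode']}",
        "text": text,
        "tags": tags,
    }
```

Because the entities are serialized as JSON, a plain substring check such as `'"statusCode": "IMPAIRED"' in event["text"]` is enough to spot impaired resources.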
With this information, users can find out about AWS service changes more quickly and be better prepared to troubleshoot issues that affect specific resources throughout their infrastructure and applications.
While having this information in Datadog is useful on its own, the full value of this integration comes from pairing it with event alerts. You can set up an event alert to get notified when a specific type of AWS Health issue occurs, using string matching, tags, and more to narrow down the scope of your alert. For example, using the Amazon Health source and a few tags associated with AWS Health events (e.g., event_category:issue), we can quickly create an alert that notifies us when AWS has an open issue.
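Conceptually, an event alert like this acts as a predicate over each incoming event's tags and text. The following is a minimal local sketch that assumes events are dicts with `tags` (a list of key:value strings) and `text`. `should_alert` is hypothetical and does not reproduce Datadog's monitor query syntax.

```python
from typing import Any, Dict, Iterable, Optional


def should_alert(
    event: Dict[str, Any],
    required_tags: Iterable[str],
    text_contains: Optional[str] = None,
) -> bool:
    """Return True if the event carries every required tag and,
    when given, contains the requested substring in its text."""
    tags = set(event.get("tags", []))
    if not set(required_tags) <= tags:
        return False
    if text_contains is not None and text_contains not in event.get("text", ""):
        return False
    return True
```

Requiring a tag set and then optionally adding a substring check mirrors the narrowing described above: tags select the category of event, and string matching picks out specific resources or regions.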
This alert prevents the all-too-common situation where you notice a problem with your infrastructure, and attempt to triage and remedy the situation, only to find out several minutes later that the issue extends beyond your environment to the underlying infrastructure. Now, you can receive a notification in near real time whenever AWS is having a problem, and implement a failover solution to avoid any unwanted downtime.
If you’re already using Datadog, follow the instructions in Datadog’s documentation to integrate AWS Health with Datadog. Note that the AWS Health API is only accessible to AWS Support customers who have a Business or Enterprise support plan, so this integration is only available to those customers.
If you don’t yet have a Datadog account, sign up for a 14-day free trial to get complete visibility into your AWS environment.