---
title: "How we use Datadog to get comprehensive, fine-grained visibility into our email delivery system"
description: "Learn how our email service integrations help us ensure a vital line of communication with our customers."
author: "Alexa Liaskovski, Aaron Kaplan"
date: 2025-10-07
tags: ["log management", "amazon ses", "mailgun", "sendgrid"]
blog_type_id: the-monitor
locale: en
---

[Visibility into email performance](https://www.datadoghq.com/blog/email-performance-integrations.md) is indispensable to any organization that counts on its ability to reach people through their inboxes, including Datadog. SREs, FinOps, and many other teams rely on email as a critical channel for communications from our platform, including monitor alerts, usage reports, and service account notifications. At Datadog, we depend on the visibility provided by our integrations for [Mailgun](https://docs.datadoghq.com/integrations/mailgun.md), [SendGrid](https://docs.datadoghq.com/integrations/sendgrid.md), and [Amazon SES](https://docs.datadoghq.com/integrations/amazon-ses.md) to optimize our email performance and ensure [deliverability](https://www.twilio.com/en-us/resource-center/email-deliverability).

In this post, we'll take a close look at how we use these integrations internally at Datadog to monitor the delivery of every email going through our app. In particular, we'll explore the custom metrics we use to augment the out-of-the-box (OOTB) visibility provided by these integrations in order to closely analyze email performance and maintain the health of one of our platform's vital lines of communication.

## Creating cross-transport visibility for comprehensive delivery tracking

Our integrations for SendGrid, Mailgun, and Amazon SES use webhooks from each transport to collect event data and verify successful email delivery. In the email delivery life cycle, these events begin when a message is added to a transport's sending queue and cover every outcome up to and including successful delivery. Beyond acceptance and delivery, the key events we track are:

- Bounces, in which delivery attempts are rejected by receiving servers
- Drops, in which transports forgo delivery attempts based on previous issues with receiving servers
- Deferrals, or soft bounces, in which delivery temporarily fails and emails are added back into queues
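To make the difference between these outcomes concrete, here is a minimal Python sketch that derives a message's current outcome from its chronological list of event names. It assumes, for illustration only, that `delivered`, `bounced`, and `dropped` end a message's life cycle, while `deferred` puts the message back in the queue and leaves its outcome open; the function name and the `pending` label are hypothetical.

```python
from typing import Iterable

# Assumption: these outcomes end a message's delivery life cycle.
TERMINAL_EVENTS = {"delivered", "bounced", "dropped"}


def delivery_outcome(events: Iterable[str]) -> str:
    """Return the first terminal event for a message, or "pending".

    `events` is the message's event history in chronological order,
    e.g. ["accepted", "deferred", "delivered"]. A "deferred" event means
    the message went back into the transport's queue, so it does not
    settle the outcome on its own.
    """
    for event in events:
        if event in TERMINAL_EVENTS:
            return event
    return "pending"
```

For example, `["accepted", "deferred"]` is still pending, while `["accepted", "deferred", "delivered"]` resolves to `delivered`.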

All of this seems simple enough in principle. In practice, however, each transport reports these events differently, and we rely heavily on Datadog [Log Management](https://www.datadoghq.com/blog/search.md?s=%22log%20management%22) to create consistency and remove hurdles to analytics and troubleshooting. For example, the transports use inconsistent terminology when logging delivery events: SendGrid labels bounce events `Bounced` with a type of `Blocked`, whereas Mailgun labels them `Failed`, with a `reason` of `Suppress-Bounce`. Our integrations for these transports include OOTB [logs pipelines](https://docs.datadoghq.com/logs/log_configuration/pipelines.md?tab=source) that normalize some of this log data at intake. We also [use Log Management to standardize and enrich these logs](https://www.datadoghq.com/blog/email-performance-integrations.md#get-enriched-visibility-into-your-organizations-email-performance), which gives us a cohesive picture of email delivery patterns across all of our transports. Here's how we standardize transport event names:

| SendGrid event names | Mailgun event names | Amazon SES event names | Standardized event names |
| --- | --- | --- | --- |
| @evt.name:processed | @evt.name:accepted | @evt.name:send | @evt.name:accepted |
| @evt.name:delivered | @evt.name:delivered | @evt.name:delivery | @evt.name:delivered |
| @evt.name:dropped | @evt.name:failed and @reason:suppress-bounce | @evt.name:deliverydelay | @evt.name:dropped |
| @evt.name:bounced | @evt.name:failed and @event-data.severity:permanent | @evt.name:bounce | @evt.name:bounced |
| @evt.name:deferred | @evt.name:failed and @event-data.severity:temporary | @evt.name:deliverydelay | @evt.name:deferred |
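The SendGrid and Mailgun columns of this table can be expressed as a small normalization function. The following Python sketch is illustrative only, since in our setup this standardization happens in Log Management rather than in application code. It assumes each event has been flattened into a dictionary with hypothetical `name`, `reason`, and `severity` keys (standing in for `@evt.name`, `@reason`, and `@event-data.severity`), and it leaves out Amazon SES for brevity.

```python
from typing import Optional

# SendGrid event names map one-to-one onto the standardized names.
SENDGRID_TO_STANDARD = {
    "processed": "accepted",
    "delivered": "delivered",
    "dropped": "dropped",
    "bounced": "bounced",
    "deferred": "deferred",
}


def normalize_sendgrid(event: dict) -> Optional[str]:
    """Map a flattened SendGrid event to a standardized event name."""
    return SENDGRID_TO_STANDARD.get(event.get("name", "").lower())


def normalize_mailgun(event: dict) -> Optional[str]:
    """Map a flattened Mailgun event to a standardized event name.

    Mailgun reports drops, bounces, and deferrals all as "failed", so the
    `reason` and `severity` fields decide the standardized name. The
    suppression check runs first, on the assumption that a suppressed
    message may also carry a permanent severity.
    """
    name = event.get("name", "").lower()
    if name in ("accepted", "delivered"):
        return name
    if name != "failed":
        return None  # Event types outside the table (e.g., opens) stay unmapped.
    if event.get("reason", "").lower() == "suppress-bounce":
        return "dropped"
    severity = event.get("severity", "").lower()
    if severity == "permanent":
        return "bounced"
    if severity == "temporary":
        return "deferred"
    return None
```

The Mailgun function shows why a lookup table alone is not enough: one transport-level name (`failed`) fans out to three standardized names depending on other attributes.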

We've also customized the OOTB logs pipelines for our email transports. For example:

- We add `warning` and `error` statuses to `deferred` and `dropped` events, respectively.
- We measure the delivery `lifetime` of each message by calculating the difference between email queuing and delivery times.
- We extract domains from recipient email addresses in order to tag and group metrics by domain.
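Taken together, these customizations amount to a per-log enrichment step. The following Python sketch mimics that step outside of Datadog, assuming each log is a dictionary with hypothetical `event`, `queued_at`, `event_at` (ISO 8601 timestamps), and `recipient` fields. It computes `lifetime` in seconds relative to whichever event the log records, not only deliveries.

```python
from datetime import datetime

# Statuses added to standardized events, per the list above.
STATUS_BY_EVENT = {"deferred": "warning", "dropped": "error"}


def enrich(log: dict) -> dict:
    """Return a copy of `log` with status, lifetime, and domain attributes."""
    enriched = dict(log)

    status = STATUS_BY_EVENT.get(log.get("event", ""))
    if status is not None:
        enriched["status"] = status

    # Lifetime: seconds between the message being queued and this event.
    if "queued_at" in log and "event_at" in log:
        queued = datetime.fromisoformat(log["queued_at"])
        occurred = datetime.fromisoformat(log["event_at"])
        enriched["lifetime"] = (occurred - queued).total_seconds()

    # Domain: everything after the last "@" in the recipient address.
    recipient = log.get("recipient", "")
    if "@" in recipient:
        enriched["domain"] = recipient.rsplit("@", 1)[1].lower()

    return enriched
```

Lowercasing the domain keeps `Example.com` and `example.com` from being tagged as two different groups.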

![An overview of our internal logs pipeline for SendGrid](https://web-assets.dd-static.net/42588/1776300235-internal-monitoring-email-delivery-internal-monitoring-email-delivery-sendgrid-logs-pipeline.png)
*An overview of our internal logs pipeline for SendGrid.*

We also use Grok Parsers to analyze the `reason` values that our transports log for bounces. Inconsistent `reason` values lead to inconsistent tagging, so we use Grok Parsers to extract data such as SMTP codes and to remap common error messages to consistent values. This gives us a clearer picture of delivery issues at scale.
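Grok Parser rules use their own matcher syntax, which isn't reproduced here. As a rough analog, the following Python sketch uses a regular expression to pull the basic SMTP reply code and, when present, the enhanced status code (such as `5.1.1`) out of a free-text bounce reason, then remaps a few common phrasings to a single category. The pattern, the phrase list, and the category names are hypothetical examples, not the rules in our pipeline.

```python
import re

# Basic SMTP reply code (e.g., 550), optionally followed by an enhanced
# status code (e.g., 5.1.1), as they typically appear in bounce reasons.
SMTP_REASON = re.compile(
    r"\b(?P<smtp_code>[245]\d{2})\b[\s-]*"
    r"(?:(?P<enhanced_code>[245]\.\d{1,3}\.\d{1,3})\b)?"
)

# Hypothetical remapping of common error phrasings to one tag value each.
REASON_CATEGORIES = [
    ("does not exist", "mailbox_not_found"),
    ("user unknown", "mailbox_not_found"),
    ("mailbox full", "mailbox_full"),
    ("over quota", "mailbox_full"),
]


def parse_bounce_reason(reason: str) -> dict:
    """Extract SMTP codes and a normalized category from a bounce reason."""
    parsed = {"smtp_code": None, "enhanced_code": None, "category": "other"}

    match = SMTP_REASON.search(reason)
    if match:
        parsed["smtp_code"] = match.group("smtp_code")
        parsed["enhanced_code"] = match.group("enhanced_code")

    lowered = reason.lower()
    for phrase, category in REASON_CATEGORIES:
        if phrase in lowered:
            parsed["category"] = category
            break

    return parsed
```

Because the category comes from a fixed phrase list, two receiving servers that word the same failure differently end up with the same tag value.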

![A sample Grok Parser from our logs pipeline for SendGrid](https://web-assets.dd-static.net/42588/1776300239-internal-monitoring-email-delivery-internal-monitoring-email-delivery-grok-parsing-rules.png)
*A sample Grok Parser from our logs pipeline for SendGrid.*

To facilitate troubleshooting of these issues, we use [Saved Views](https://docs.datadoghq.com/logs/explorer/saved_views.md) to enable our support team to quickly search logs for outgoing emails. This way, support engineers can jump straight into targeted troubleshooting in the event that a customer reports that an email they were expecting from Datadog is not in their inbox.

![Saved View for troubleshooting email delivery issues](https://web-assets.dd-static.net/42588/1776300244-internal-monitoring-email-delivery-internal-monitoring-email-delivery-saved-views.png)
*One of our Saved Views for troubleshooting email delivery issues.*

## Refining our visibility into email delivery via custom metrics

Enforcing consistency in our email transport logs has helped us create a strong foundation for targeted troubleshooting and analysis. It's also helped us effectively monitor patterns in email delivery with the right level of granularity. Each of our transports generates aggregate metrics that are collected by our integrations, but by [using Log Management to generate our own custom metrics](https://app.datadoghq.com/logs/pipelines/generate-metrics) that cover the delivery of all emails from our platform, we've achieved improved granularity and enriched our cross-transport visibility. We use our standardized transport logs to generate the following metrics:

| Metric Name | Type | Description |
| --- | --- | --- |
| email\_outgoing.event.accepted | Count | Email accepted by transport for delivery. |
| email\_outgoing.event.all | Count | Total count of all email events. |
| email\_outgoing.event.bounced | Count | Email bounced by recipient server. |
| email\_outgoing.event.clicked | Count | Link within email clicked. |
| email\_outgoing.event.deferred | Count | Email delivery failed, reattempt pending. |
| email\_outgoing.event.delivered | Count | Email successfully delivered. |
| email\_outgoing.event.dropped | Count | Email dropped based on transport's built-in suppression list. |
| email\_outgoing.event.opened | Count | Email opened by recipient. |
| email\_outgoing.lifetime.bounced | Distribution | Length of time between email queuing and bounce. |
| email\_outgoing.lifetime.delivered | Distribution | Length of time between email queuing and delivery. |
| email\_outgoing.lifetime.deferred | Distribution | Length of time an email has been deferred. |
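Conceptually, each of these metrics is a count or a distribution computed over the standardized logs and tagged by attributes such as recipient domain. The following Python sketch reproduces that aggregation locally for illustration; it is not how Log Management computes generated metrics. It assumes each log is a dictionary with a standardized `event` name, an optional `domain`, and, for bounced, delivered, and deferred events, a `lifetime` value in seconds.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple

# Events that have a matching email_outgoing.lifetime.* distribution.
LIFETIME_EVENTS = {"bounced", "delivered", "deferred"}


def aggregate(logs: Iterable[dict]) -> Tuple[Counter, dict]:
    """Build per-domain counts and lifetime samples from standardized logs.

    Keys are (metric_name, domain) pairs, mirroring a metric tagged by domain.
    """
    counts: Counter = Counter()
    lifetimes: defaultdict = defaultdict(list)

    for log in logs:
        event = log["event"]
        domain = log.get("domain", "unknown")
        counts[("email_outgoing.event.all", domain)] += 1
        counts[(f"email_outgoing.event.{event}", domain)] += 1
        if event in LIFETIME_EVENTS and "lifetime" in log:
            key = (f"email_outgoing.lifetime.{event}", domain)
            lifetimes[key].append(log["lifetime"])

    return counts, dict(lifetimes)
```

Keeping raw lifetime samples, rather than a single average, is what lets a distribution metric answer percentile questions such as "how long do the slowest deferrals take?"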

Here, you can see how these metrics are defined in our UI:

![Our custom metrics for tracking email delivery, seen within the Datadog Log Management UI](https://web-assets.dd-static.net/42588/1776300249-internal-monitoring-email-delivery-internal-monitoring-email-delivery-generate-metrics.png)

We track these metrics in a centralized dashboard that gives us clear, detailed visibility into delivery patterns, including:

- Our overall bounce and drop rates
- Rates and volumes of delivered and dropped emails, broken down by message type (e.g., monitor alerts, daily and weekly digests) and by recipient domain
- Latencies for deferred messages by message type and recipient domain
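As an illustration, a rate such as the bounce rate can be taken as bounced messages divided by accepted messages over the same time window. That definition is an assumption for this Python sketch, not necessarily the one behind our dashboard, and the function name and input shape are hypothetical.

```python
from typing import Dict


def rates_by_domain(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Compute bounce and drop rates per recipient domain.

    `counts` maps a domain to counts of standardized events, e.g.
    {"example.com": {"accepted": 200, "bounced": 4, "dropped": 1}}.
    Rates use accepted messages as the denominator (an assumption).
    """
    rates = {}
    for domain, events in counts.items():
        accepted = events.get("accepted", 0)
        if accepted == 0:
            continue  # Skip domains with no accepted messages to avoid dividing by zero.
        rates[domain] = {
            "bounce_rate": events.get("bounced", 0) / accepted,
            "drop_rate": events.get("dropped", 0) / accepted,
        }
    return rates
```

With 200 accepted messages, 4 bounces, and 1 drop for a domain, this yields a bounce rate of 0.02 and a drop rate of 0.005.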

![Our internal dashboard for monitoring email delivery](https://web-assets.dd-static.net/42588/1776300253-internal-monitoring-email-delivery-internal-monitoring-email-delivery-dashboard.png)

Tracking the number of emails bounced by [SMTP code](https://en.wikipedia.org/wiki/List_of_SMTP_server_return_codes) is particularly useful for helping us understand what's driving issues with delivery.
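SMTP reply codes are grouped by their first digit: under RFC 5321, 4yz replies indicate transient failures and 5yz replies indicate permanent failures. The following Python sketch tallies bounces both by individual code and by that class, assuming each bounce record already carries an extracted `smtp_code` string; the field and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def smtp_class(code: Optional[str]) -> str:
    """Classify a three-digit SMTP reply code by its first digit (RFC 5321)."""
    if not code:
        return "unknown"
    return {"4": "transient", "5": "permanent"}.get(code[0], "other")


def tally_bounces(bounces: Iterable[dict]) -> Tuple[Counter, Counter]:
    """Count bounces by SMTP code and by transient/permanent class."""
    by_code: Counter = Counter()
    by_class: Counter = Counter()
    for bounce in bounces:
        code = bounce.get("smtp_code")
        by_code[code or "unknown"] += 1
        by_class[smtp_class(code)] += 1
    return by_code, by_class
```

Splitting by class first helps separate problems that may clear on retry (4xx) from ones that need action, such as cleaning up invalid addresses (5xx).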

![Detail of our internal dashboard for monitoring email delivery showing bounce data](https://web-assets.dd-static.net/42588/1776300258-internal-monitoring-email-delivery-internal-monitoring-email-delivery-bounce-widgets.png)

## Ensuring a vital line of communication with our customers

At Datadog, our integrations for Mailgun, SendGrid, and Amazon SES have enabled us to create fine-grained, cross-vendor visibility into one of our platform's vital lines of communication, helping us ensure the timely delivery of everything from monitor alerts to usage reports. [Learn more about our email transport integrations](https://www.datadoghq.com/blog/email-performance-integrations.md). If you're new to Datadog, consider signing up for a 14-day free trial.