DASH 2023: Guide to Datadog's Newest Announcements | Datadog

DASH 2023: Guide to Datadog's newest announcements

Published: August 3, 2023

DASH 2023: Guide to Datadog's newest announcements

This year at DASH, we announced new products and features that enable your teams to get complete visibility into their AI ecosystem, utilize LLM for efficient troubleshooting, take full control of petabytes of observability data, optimize cloud costs, and more. With Datadog’s new AI integrations, you can easily monitor every layer of your AI stack. And Bits AI, our new DevOps copilot, helps speed up the detection and resolution of issues across your environment. We introduced new products that help you secure your cloud infrastructure, find vulnerabilities in your application code, and conduct historical security investigations. We also expanded our developer-focused features by adding static code analysis and end-to-end mobile monitoring. In this post, we recap these new offerings—as well as all the other key announcements from DASH 2023—and help you get started using them to gain deeper visibility into your environment.

LLM-powered observability and your AI ecosystem

Integration roundup: Monitoring your AI stack

As AI development accelerates across industries, Datadog is at the forefront of helping you gain visibility into every layer of your AI-optimized tech stack, from infrastructure and data storage to models and service chains. Integrating machine-learning models into your applications and workflows often means leveraging a number of specialized technologies, including vector databases like Pinecone and Weaviate, development platforms like Vertex AI and SageMaker, and discrete GPUs from providers like CoreWeave. With our 12 new AI integrations—including our NVIDIA DCGM Exporter integration—you can access out-of-the-box dashboards with visualizations and metrics tailored to each component of your stack, enabling you to ensure that your models effectively scale according to your business needs. Read our blog post to learn more.

Check out our 12 new AI integrations to monitor health and performance across your entire stack.

LLM Observability

The introduction of pre-trained large language models (LLMs) such as GPT and BERT has revolutionized the usage of generative AI technology. While implementation of LLM-powered applications has become easier, it is difficult to gain visibility into the underlying LLMs as application developers and machine learning engineers have limited control of or insight into how these pretrained models operate. This may lead to model underperformance or even inaccuracies such as model hallucinations that create business and reputational risks. Datadog LLM Observability enables users to observe prompts and responses in order to track model performance, identify opportunities for improvement, and optimize the end-user experience. LLM Observability provides an always-on solution that continuously monitors your LLMs to identify problematic clusters, model drift, and specific prompt and response characteristics that impact model performance. To easily monitor your LLMs in production, you can request access to LLM Observability in private beta.

Bits AI: Datadog’s generative AI interface

Bits AI is your new DevOps copilot, helping you investigate and respond to incidents more efficiently across the Datadog web app, mobile app, and Slack. You can converse with Bits AI in natural language to query data from Datadog products like APM, Log Management, Cloud Cost Management, Real User Monitoring, and more, and surface key insights like faulty deployments, log and trace anomalies from Watchdog, and Security Signals—all in one place. In addition to helping you investigate issues, Bits AI can also help you streamline incident management by paging on-call responders, summarizing the incident timeline to get your teams up to speed, and drafting a postmortem. With the power of generative AI, Datadog can help identify and suggest fixes for code-level issues, generate synthetic tests to proactively improve your user experience, and find and trigger workflows to automatically remediate critical issues. Bits AI is available in private beta. To sign up, fill out this form. To learn more about Bits AI, check out our blog post.

Get detailed insights into your triggered alerts with Bits AI

Log Management

Flex Logs

The volume of logs that organizations collect from all over their systems is growing exponentially. As a result, logs have become increasingly difficult to manage. Organizations must reconcile conflicting needs for long-term retention, rapid access, and cost-effective storage. In response to these challenges, we’re pleased to introduce Flex Logs. Building on the flexibility offered by Logging Without Limits™, which decouples log ingest from storage—enabling Datadog customers to enrich, parse, and archive 100% of their logs while storing only what they choose to—Flex Logs decouples the costs of log storage from the costs of querying. It provides both short- and long-term log retention for a nominal monthly fee without sacrificing visibility, eliminating the need for self-maintained databases and enabling seamless correlation between all of your logs, metrics, and traces. With Flex Logs, Datadog provides a solution to all of your logging use cases within one platform. Flex Logs is in Limited Availability. Datadog Log Management users can sign up with this form to get started or learn more in our dedicated blog post.

Flex Logs decouples the costs of log storage and querying, providing cost-effective long-term retention without impeding visibility

15-min unsampled Live Search for Log Management

With thousands of logs generated every minute from your infrastructure, applications, services, and devices, retaining this copious amount of data for active search and analysis can be cost-prohibitive. Stream-based monitoring solutions are gaining popularity because they can support real-time troubleshooting and eliminate the need to store data that only needs to be leveraged transiently.

To support your troubleshooting efforts, we offer a new stream-based feature, Live Search for Datadog Log Management, now available in private beta. Live Search for Log Management supports real-time troubleshooting investigations by providing search and analysis capabilities across all ingested logs for the past 15 minutes, completely unsampled. Live Search gives you full visibility into all of your logs post-processing—regardless of how you’ve configured your indexes, quotas, or exclusion filters. Live Search also conveniently correlates with APM Live Search, so you can view, search, and analyze all logs within the last 15 minutes that are associated with a specific trace. Live Search for Datadog Log Management is designed to handle data at petabyte scale, and enables you to view and query all ingested logs without any pressure to retain them. Learn more about Live Search for Log Management and request access to the private beta from our dedicated blog post.

Shows all ingested logs associated with a specific service

Add context to your logs at query time with Reference Tables

Logs are an invaluable resource for investigating application errors, security incidents, and performance issues, but they often lack contextual data that is critical for helping teams understand how to best tackle a problem and prioritize issues. For example, logs may include customer IDs but not other relevant data like customer names, generated revenue, sales region, or support tier.

Datadog’s Reference Tables solves this problem by enabling you to enrich your logs with your own data. You can use Reference Tables to add business-critical customer information to your logs at query-time, for example, so you can search for and prioritize issues that affect a particular segment of customers. During security incidents, Reference Tables that contain details like threat IP addresses can help you pinpoint the source of malicious activity. Reference Tables are also deeply integrated with the Log Explorer, so you can use them to filter your logs at query time and ensure that you have the up-to-date context you need for audits, investigations, and more.

Reference Tables and the ability to enrich and filter logs at query time are in public beta. Check out our dedicated blog post for more information.

Add context to your logs at query time with Reference Tables

APM and service governance

Set up APM in minutes with Single-Step Instrumentation

Datadog’s market-leading APM solution provides users deep, service-level insight with telemetry in context and AI assistance, so they can observe, troubleshoot, and improve their cloud-scale applications. However, large organizations can struggle to efficiently set up APM via the instrumentation of code owned by disparate infrastructure and application teams. To help customers roll out distributed tracing more quickly and easily across their entire organization, Datadog APM now offers Single-Step Instrumentation. This means that a single engineer can instrument all services in minutes at the same time they install the Datadog Agent, which automatically adds the relevant client library to the application’s code. In addition to Single-Step Instrumentation, we have built support for remote instrumentation and configuration into the Datadog app, as well as auto-instrumentation for Go. To try out any of these new features, fill out the beta access request form.

Set up Datadog APM in minutes with Single-Step Instrumentation

Understand the business impact of backend errors with Trace Queries

Datadog APM and Distributed Tracing help pinpoint the source of errors and latency anywhere in a request path—from the underlying infrastructure to a slow network and inefficient code. However, it can be grueling to conduct root-cause investigations with APM data and understand the business impact of issues when needed information is dispersed across different parts of a request. Trace Queries allows users to query multiple services, endpoints, and other attributes at once using boolean and other operators. This way, they can isolate a specific dependency to confidently find the source of an issue stemming from a downstream service, and quickly understand the impact on upstream services, endpoints, pages, and end users, which makes prioritization easier. Trace Queries is in private beta. Learn more in out documentation and request access via this form.

Add context to your logs at query time with Reference Tables

Reproduce exceptions with production variable snapshots in APM

Debugging errors in production environments can be a complex and lengthy process that often disrupts your development cycle. Without access to the inputs and associated states that caused the errors, reproducing them becomes difficult. To help you remediate bugs and discover their root causes faster, Datadog now automatically captures variable snapshots in your APM error traces. Production variable data enables you to quickly reproduce exceptions that have surfaced in your traces and Error Tracking Issues with real production state and inputs. Variable values are collected and annotate each frame in the exception stack trace. Get started debugging errors and reproducing exceptions with Python variable data today by requesting access to our private beta.

Query and aggregate request traces using span relationships with Trace Queries

Service Scorecards

The Service Catalog provides a centralized resource for your organization’s knowledge of your services. Now, it also includes Service Scorecards, which help you identify and remediate observability gaps in your services. Service Scorecards highlight opportunities for teams to address shortcomings that could limit the visibility of their services—such as services that are missing SLOs or on-call schedules. They also make it easy to spot services that don’t adhere to observability best practices, for example by not correlating APM data with logs. Stakeholders across the organization can see each service’s score in three key categories—Ownership and Documentation, Production Readiness, and Observability Best Practices—and across custom scorecards that evaluate services based on rules you define. See our blog post and documentation for more information about Service Scorecards, which are now available in public beta.

Service Catalog shows each service's score from the ownership and documentation, production readiness, and observability best practices categories.

Enhanced visibility into service-to-service connections and inferred services

The health and performance of any service is inextricably linked to that of its dependencies. We’re pleased to announce that Datadog APM now offers enhanced visibility into your interconnected services, both through the Service Map and through individual Service Pages. Service Pages now map each service’s dependencies and provide full, unsampled statistics on service-to-service communication. That means you can now use APM to directly measure the request counts, latency, and error rates between any two services or between a service and its external dependencies, enabling you to quickly identify slow or failing connections.

Datadog now also automatically identifies non-directly instrumented dependencies such as external APIs and cloud providers based on span tags on your instrumented services. These inferred services are indicated by dotted node borders and highlighted in blue. You can inspect edge trace metrics by selecting an inferred service from either the Service Map or a Service Page. Enhanced dependency analysis on Service Pages and inferred services are both available in private beta. Datadog APM users can sign up here. To learn more about inferred services, check out our documentation.

Datadog APM now provides metrics on interservice communication and visibility into non-directly instrumented dependencies

Resource Catalog Ownership tab

As your cloud environment grows in complexity, it can be difficult to understand which teams own specific resources and which services are running on them. Incident responders and security engineers often need to collaborate across teams to identify and troubleshoot upstream and downstream resources, but they may not know who to work with. And FinOps teams need to attribute the cost of resources to the teams that own them. The Resource Catalog now includes an Ownership tab that shows you which team owns a resource—including their contact information and on-call schedule—and what services that resource supports. By referencing this new tab, you can easily determine which resources are involved in an incident and quickly contact the relevant teams to resolve the issue, helping you minimize your MTTR. The Resource Catalog Ownership tab is currently in private beta; to request access, fill out this form.

The Resource Catalog's Ownership tab shows entries for Amazon EC2 instances and AWS Lambda functions, including the service the resource supports, the team that owns the resource, and the person on-call to support it.

Define teams using identity providers data and control their access to individual resources within Datadog

Earlier this year, we introduced Datadog Teams, which enables streamlined collaboration and precisely tailored visibility throughout your organization by enabling you to designate team-specific assets and views. We’re pleased to announce a three-part enhancement to Teams that can help organizations accelerate onboarding and strengthen their governance posture. First, by enabling you to define teams using data from Identity Providers, Teams now facilitates stronger authentication and inheritance of detailed profiles and contextual data.

Team memberships can be sourced from Azure AD or Okta SAML assertion files, either instead of or in combination with existing API-driven workflows and manual configuration. Second, our enhanced Attribute-Driven Access Control enables you to delegate team-based access to individual resources within Datadog. For example, you can assign edit permissions for any dashboard based on team membership. Finally, our enhanced Team Page makes it easier to grasp the membership and key resources of each team at a glance, enhancing visibility and transparency throughout your organization and enabling new hires to quickly get up to speed. These new capabilities are currently available in public beta—you can fill out this form to request access. To learn more about Datadog Teams, see our blog post or check out our documentation.

You can now use Teams to delegate team-based access to individual resources within Datadog

Ambassador program

Datadog is thrilled to announce the Datadog Ambassadors program, an initiative that aims to recognize and highlight individuals around the world who have significantly contributed to the Datadog community. Datadog Ambassadors are exceptional professionals who have shared their deep technical insights and creative solutions through blog posts, custom integrations, open-source software, and more. Selected based on their personal contributions and achievements, prospective Ambassadors can be nominated by Datadog employees or, eventually, by current Ambassadors. The annual membership comes with rich benefits, such as a trip to DASH, unique merchandise, and complimentary Datadog Certification exams. Datadog is excited to work with our inaugural group of eight Ambassadors as we continue to build a helpful and welcoming professional community focused on sharing knowledge.

To learn more about our first eight Datadog Ambassadors and the Datadog Ambassador program in general, you can read our blog post on this topic.

Ambassadors program

End-to-end security for all

Protect against IAM-based attacks with Datadog CIEM

Identity and access management (IAM) is a crucial part of securing rapidly growing cloud environments. However, when teams provision resources, identities, and workloads quickly, it can be difficult to keep track of numerous direct and indirect permissions. That’s why it’s critical to keep your IAM configurations up to date to safeguard against any IAM-based attacks.

Datadog Cloud Infrastructure Entitlement Management (CIEM) enables you to continuously discover and address identity risks—such as permission gaps and administrative privileges—and reduce their blast radius before a threat actor can exploit them. You can address one identity risk at a time by grouping all resources—roles, policies, and groups—that carry that risk. Or, you can methodically review your resources and all of their associated identity risks. And to help you remediate any issues, Datadog CIEM also provides the necessary context for investigations and enables you to pivot directly to your AWS console to update your IAM configurations. Check out our blog to learn more.

Protect your infrastructure from IAM-based attacks with Datadog Cloud Infrastructure Entitlement Management

Mitigate vulnerabilities with Datadog Infrastructure Vulnerability Management

The cloud infrastructure that supports applications can comprise hundreds of thousands of hosts, containers, and serverless functions that change constantly. With this type of scale and complexity, it’s often difficult to track critical vulnerabilities and gain a complete picture of which resources are vulnerable to specific threats. Datadog Infrastructure Vulnerability Management solves this problem by automatically identifying vulnerabilities in container images and hosts and offering context-rich insights into which ones need to be resolved first. With our security-focused insights, you can prioritize the most serious issues before they threaten your infrastructure. Check out our blog for more details.

Detect and address vulnerabilities across your container images and hosts

Datadog Code Security

As the velocity of deliveries and deployments increases, it becomes more and more difficult to ensure that all the vulnerabilities in your code are caught by the time it reaches production. While there are many application security testing tools designed to help you catch these issues, they are often inaccurate, prone to false positives, and slow to run, making them impractical for monitoring fast-paced systems. Datadog Code Security detects vulnerabilities in your application code in real time, continuously monitoring your active code from within as it executes. With the Vulnerability Explorer page, you can easily view a list of the vulnerabilities that pose the greatest threat to your system, alongside meaningful risk scores and impacted services. And to help you quickly remediate issues, you can even view file names, line numbers, and snippets for the source code where the vulnerabilities were found. Check out our blog post to learn more.

Easily view a list of vulnerabilities detected in your application code, sorted by severity.

Application Security Management - API Security

API-specific attack vectors are commonplace these days in large-scale, high-impact security breaches. Securing your APIs can’t be done well with web application firewalls (WAFs) alone—you need tooling that understands your apps’ authentication layers and business logic. Datadog Application Security Management (ASM) now extends its Threat Management and Protection capabilities to provide deep visibility into threats targeting your APIs. You can now use the API Catalog to pivot directly from viewing an API’s health and performance metrics to ASM, where you can see attacks targeting your APIs. This includes attack attempts, which show the attacker’s IP and authentication information, as well as request headers, which show details about how the attack was formed. By using ASM and API Management together, you can maintain a comprehensive view of your APIs’ attack surface and respond quickly to mitigate threats. This feature is now available in public beta—sign up via this form. For more information about the API Catalog, see our documentation.

View triggered security signals for your APIs in the API Catalog

Historical security investigations with Cloud SIEM Investigator

Security breaches often go undetected for months—typically long after log retention has expired—leaving investigators unable to answer critical questions about the scope and impact of the attack. Cloud SIEM Investigator lets you conduct historical security investigations that draw on a deep history of logs so that you can see how a security breach happened, even if it began long ago. By visualizing a complete history of the attack, you can identify malicious actors and their tactics. To quickly contain an active attack and mitigate ongoing risks, you can use Workflow Automation to trigger runbooks or remediation steps. Read our blog post to learn more about conducting historical security investigations with Cloud SIEM Investigator.

Cloud SIEM Investigator shows failed actions against the S3 service and a workflow executed to mitigate the attack.

Investigate important security issues with CSM Security Inbox

Security Inbox for Cloud Security Management (CSM) is a new capability that correlates and prioritizes your security issues into a single, actionable list. This helps you quickly respond to issues such as misconfigurations, identity risks, and runtime threats that require immediate attention. Security Inbox automatically groups risk factors—such as public accessibility, attached privileged roles, and critical vulnerabilities—into central root issues for you to address. By inspecting an issue, you can review the origin of different risk factors, explore various remediation actions, and gather more context on impacted resources.

To learn more about Security Inbox, check out our documentation.

Prioritize and respond to security issues using Security Inbox.

Shift-left and developer experience


Enforce code quality and security standards with Quality Gates

Datadog Quality Gates enables you to define flexible rules to ensure that deployments meet your organization’s standards for code quality, performance, and security. You can configure out-of-the-box rules to block commits that introduce new flaky tests or contain Static Analysis violations. Quality Gates gives you the flexibility to decide which repositories and branches a rule should be evaluated for. It also enables you to strictly block deployments that contain critical errors, while creating non-blocking rules for code style violations and other noncompliances. For example, the rule shown below blocks any commit across your CI pipelines and workflows that introduces a security vulnerability to a specified service.

Quality Gates is now available in private beta. You can request access via this form and learn more by reading our blog post.

Block commits that introduce security vulnerabilities to your service.

Static Analysis for Python

Datadog Static Analysis analyzes code prior to runtime to detect violations of best practices earlier in the software development lifecycle. This ensures that as your organization grows, new additions to your code base will stay compliant with existing quality standards. Static Analysis provides more than 80 out-of-the-box rules that enable you to detect issues that can break your code at runtime (e.g., an invalid number of parameters), violations of best practices (e.g., too many nested blocks), and security vulnerabilities (e.g., code that can lead to SQL injections, as shown below). To help you quickly resolve a violation, Datadog can automatically suggest a fix based on your code templates. If a template isn’t available, you can also apply an AI-generated fix (shown below) to quickly improve your code.

Static Analysis for Python is now available in private beta. You can learn more in our documentation or request access by contacting our support team.

Quickly improve your code quality with AI-generated suggested fixes.

Digital experience monitoring

Mobile Session Replay

Engineers, designers, and product managers face specific challenges from the mobile community, such as expectations of fast load times, continuous availability, and a frictionless in-app experience across a wide range of devices. That’s why we’re proud to announce that Datadog’s Mobile Session Replay is now in public beta. Mobile Session Replay provides a visual reproduction of real user journeys through your application on a mobile device, enabling developers to recreate bugs and understand, from the user’s point of view, where things went wrong. This recreation pairs existing troubleshooting data with a visual aid without slowing down your application, enabling you to get a clear and accurate picture of your end user’s experience while ensuring that your application is performant. Mobile Session Replay can also help mobile teams understand how users interact with their features, thus improving developers’ ability to create apps that serve customer needs and stand out in the crowded marketplace. Learn more in our full blog post.

Mobile Session Replay provides a visual reproduction of real user journeys through your application helping you understand where an issue occurred.

Mobile App Testing

Effective mobile application testing requires teams to create tests that cover a range of different device types, operating system versions, and user interactions—including swipes, gestures, touches, and more—while also maintaining the infrastructure and device fleets necessary to run these tests. Datadog Mobile Application Testing delivers fast, no-code, and reliable mobile app testing on real devices in the cloud that lets any member of your team—whether they’re technical or non-technical—create and maintain automated tests that seamlessly integrate into your CI/CD pipelines, so you can ship mobile apps with confidence and increase your release velocity.

With Datadog Mobile Application Testing, you can create and run end-to-end tests for your applications on both iOS and Android without writing code, by simply clicking and navigating through the most important flows in your app, just like your users do. Datadog runs these tests on real phones and tablets (as opposed to emulators) to provide a realistic, step-by-step representation of key application workflows, screenshots of each step, and detailed pass/fail results, so that your team can quickly visualize what went wrong and fix it. Additionally, these tests automatically adapt to minor changes in your app, simplifying the task of test maintenance as your app evolves. Learn more in our full blog post.

Mobile Application Testing enables you to create and run end-to-end tests for your applications on both iOS and Android without writing code.

Understand and optimize your cloud resources and costs

Cloud Cost Recommendations

It’s essential to keep your cloud costs in check, but it can be difficult to identify where you can safely optimize to avoid unnecessary costs and still provide the best user experience. We’re thrilled to introduce Cloud Cost Recommendations, which remove the guesswork from cost optimization. Datadog analyzes your observability data to discover savings opportunities, such as containers that you can safely downsize to save money on compute resources. If you’ve enabled Continuous Profiler, you’ll see how implementing downsizing recommendations will impact the latency on your services’ endpoints. You’ll also get recommendations on unused cloud resources that can be safely deleted—and you can even leverage Workflow Automation to continually and automatically delete unused resources. Cloud Cost Recommendations are available in private beta. For more information and to request access, go here.

Cloud Cost Recommendations show a breakdown of potential savings and projections of how reducing a container's compute resources will affect the latency of endpoints in the service.

Kubernetes Resource Utilization

Cloud architects, application developers, and platform engineers increasingly need to ensure that they are using container infrastructure assets efficiently. The first step toward implementing smarter capacity allotment and planning is having real-time and historical insights into the resource usage of your Kubernetes objects. Datadog’s new Kubernetes Resource Utilization view visualizes CPU and memory allocation and utilization across your clusters, pods, and containers so that you can easily identify workloads that are not using resources efficiently or that are poorly provisioned and adjust accordingly. By visualizing the performance of your containerized services alongside their resource utilization in a single pane of glass, you can better understand the actual footprint of your applications. This allows you to better configure your deployments with the right balance of cost, performance, and reliability. Kubernetes Resource Utilization is available now in public beta. For more information, please refer to the documentation.

The Kubernetes Resource Utilization view provides a unified view of resource allocation and usage across your Kubernetes objects.

Visualize container metrics from OpenTelemetry Collector

Datadog’s Container Overview Dashboard provides key insights into the health of containerized environments. Now, it also provides out-of-the-box visibility into containers running OTel-instrumented applications. Customers that are collecting metrics using the OTel Collector and Datadog Exporter can use this dashboard to keep tabs on CPU, network, memory, and IO metrics from their containerized environment. Once you configure the OTel Collector, Datadog will automatically convert (or map) metrics and metadata from the OTLP data model to compatible telemetry schema. This enables you to use tags, such as container ID, service, and environment, to filter metrics when troubleshooting issues. For example, you can use the service tag to determine if a particular service has any CPU-intensive containers that are nearing their configured CPU limits, which can lead to throttling and performance degradation. If your environment relies on both the Datadog Agent and the OTel Collector, you can use the dashboard to monitor containers from both sources within a unified view.

We’ve been working closely with the open source community to introduce this feature along with enhanced support for OTel throughout the Datadog platform. Check out our documentation to begin visualizing telemetry from your OTel-instrumented applications.

Gain a high-level overview into your containerized environment with our OOTB dashboard.

Cloud Service Insights for CDNs

Modern applications rely on a variety of external and third-party dependencies and services. Having the ability to pinpoint whether application problems that affect your users are occurring in your own services or are due to issues with those dependencies is key for efficient troubleshooting. With Cloud Services Insights for CDNs, Datadog Synthetic Monitoring can now automatically detect and notify you of problems with the CDNs you rely on to deliver content. It will also summarize which of your Synthetic tests are likely affected by this issue so you can quickly see the scope of the impact. This makes it easy to quickly identify whether, for example, the root cause of increased latency for users in a specific region lies with an external service. Gaining this type of visibility into your cloud service dependencies enables you to confidently escalate issues with your providers and eliminate unnecessary troubleshooting cycles. Cloud Services Insights is currently available to all US1 customers of Synthetic Monitoring and will soon be generally available. To get started with Synthetic Monitoring, see our documentation.

Datadog Cloud Service Insights for CDNs


Serverless Monitoring for AWS Step Functions

AWS Step Functions is a service that enables developers to create automated, multi-step workflows. With AWS Step Functions, you can build complex workflows that orchestrate Lambda function executions into larger serverless applications. Datadog already enables you to monitor your Step Functions’ metrics and get deep visibility into their associated Lambda functions with the existing integration. And now, we’ve released native support for Step Functions monitoring in public beta. You can view metrics, logs, and traces in the Serverless view to understand the performance of state machine functions as they execute across your workflows. Traces can help you ensure that your Step Functions workflows remain performant by enabling you to quickly spot Lambda cold starts, long-running steps, and execution errors that need troubleshooting. Check out our blog to learn more.

The AWS Step Functions dashboard visualizes key metrics you need to understand the health and performance of your state machines.

Azure serverless monitoring

Datadog now provides deep visibility into Azure Container Apps, a fully managed serverless platform for deploying containerized applications. We also now enable you to monitor Azure Functions running on a Consumption plan, Microsoft’s fully serverless hosting option. This means that you can use our Serverless view to gain visibility into your dynamic workloads and quickly spot cold starts, bottlenecks, and other performance problems.

Support for Azure Container Apps is now generally available. To start collecting and visualizing traces, custom metrics, and logs directly from your Azure Container Apps, see our documentation. Support for tracing Consumption plan—hosted Azure Functions is now in private beta; to request access, fill out this form.

Trace your Consumption plan-hosted Azure Functions to quickly spot cold starts, bottlenecks, and other issues that require attention.

Google Cloud serverless monitoring

Google Cloud Functions is a serverless compute solution that enables you to run single, event-driven functions that handle specific tasks such as payment processing and user authentication. Google Cloud Run is another popular service that offers a fully managed platform for deploying fully containerized applications in serverless environments. If your business-critical applications run on these services, getting deep visibility into their health and performance is necessary for the success of your organization. We are pleased to announce that the Datadog now collects traces, custom metrics, and logs directly from your Cloud Run services, and enables you to get unified insights into this telemetry in the Serverless view. To learn more, check out our documentation.

The Serverless view now also includes support for visualizing end-to-end traces from Cloud Functions in private beta. This means that in addition to the Cloud Functions metrics and logs you may already be monitoring, you can view function executions as they propagate across your infrastructure, so you can quickly spot and remediate cold starts, execution errors, and other performance issues. To sign up for the private beta, fill out this form.

You can now view Cloud Function traces alongside any of their associated metrics and logs you may already be monitoring.

Network monitoring

Jumpstart network investigations with an updated story-centric UX for NPM

When an organization scales, its network inevitably begins to generate massive amounts of traffic from its expanding array of hosts, VPCs, containers, regions, and zones. This volume of data can be very difficult to monitor, especially for anyone new to networking observability who may be unsure about where to begin an investigation in the event a networking issue occurs.

To overcome this challenge and help developers jumpstart network investigations with ease, we have released a story-centric UX for the NPM Overview page. This new UX automatically surfaces key network data and organizes it into distinct sections to assist you in specific problem-solving use cases, such as identifying top traffic costs or understanding service dependencies.

The updated NPM overview page automatically surfaces and organizes key networking data

We have also updated the UX of the NPM Analytics and DNS pages to include recommended queries. These queries help guide investigations by quickly surfacing critical information about your network, such as the volume of DNS timeouts and TCP retransmits occurring within it. To learn more about how our updated UX helps network investigations, check out our blog.

NPM now includes preloaded recommended queries to jumpstart network investigations

NetFlow Monitoring

Network administrators rely on NetFlow data to understand how traffic flows across their network devices. With NetFlow Monitoring, you can now identify the top contributors to your network traffic (i.e., top talkers and top listeners) to understand which services are using the available bandwidth and causing network slowdowns. Our out-of-the-box dashboard visualizes your devices’ NetFlow data—and other NetFlow variants such as IPFIX, sFlow, and JFlow—so that you can investigate traffic flows alongside SNMP metrics, Traps, and Syslogs all on a single, unified platform. NetFlow Monitoring is now generally available and simple to enable with the Datadog Agent. For more details, check out our blog and see our documentation.

Monitor Netflow data to identify the top talkers in your network with Datadog

Network Topology Map for port-to-port device connectivity

When a network engineer is alerted of high latency, it is often difficult to find the best starting point for troubleshooting. With potentially thousands of devices comprising a modern enterprise network, engineers must understand their complex interconnections in order to trace the sources and consequences of poor network performance. Datadog Network Device Monitoring (NDM) now offers a Network Topology Map that provides a detailed bird’s-eye view of your network devices and their relationships.

Rather than stepping across each device one-by-one or relying on institutional knowledge to understand the network environment, you can use the map to inspect any device, view all of its interface connections, and quickly spot which of its dependencies are experiencing errors. Click on any device to open its side panel, where you can view key performance metrics, details about all the interfaces the device is connected to, and more. It is easy to pivot from there to the main NDM page for further context about the device, or to NetFlow Monitoring for more information about the device’s network traffic. The NDM Network Topology Map is now available in public beta. For more information about Datadog Network Device Monitoring, see our documentation.

View details about all the devices in your network and their connections in one place

Database monitoring

Database Monitoring Oracle Support

Datadog Database Monitoring (DBM), which provides host-level and query performance metrics for Postgres, MySQL, and SQL Server, is now available in public beta for Oracle. The databases view shows you vital health metrics at the host-level, such as throughput and latency. For each host, we capture active connections—the queries that are running on your host—both historically and in near-real time, helping you understand the load that your databases experience and which queries contribute the most to that load. At the query-level, DBM provides visibility into query metrics, samples, and explain plans. These explain plans—the steps the database engine takes to execute a query—can help you see which steps of an expensive query might benefit from optimization. In addition, DBM for Oracle comes with an out-of-the-box dashboard, which provides an overview of host health metrics, like CPU utilization, memory usage, and IO. To get started gaining deeper visibility into your Oracle databases, set up DBM for Oracle on your hosts today. If you have any questions, reach out to your CSM.

Monitor queries and health metrics for your Oracle hosts with DBM

Database Monitoring Watchdog Insights

Datadog Database Monitoring (DBM) offers deep visibility into the health and performance of your database hosts and queries. As your environment scales, you may need help identifying where to look first to understand what’s going on across your systems and troubleshoot database issues. That’s why we’re proud to announce Watchdog for DBM—which uses Anomaly Detection to provide insights around interesting patterns in host activity and queries—is now in public beta. Watchdog for DBM scans your hosts and provides health summaries at the top of the databases page.

Health summaries highlight atypical trends in your hosts, organized by wait groups to indicate what queries are waiting on to be able to execute, such as CPU, IO, Lock, or Network. When you click on a wait group, the UI will filter down the list to show you which hosts are experiencing this anomalous activity. From there, you can click on a host to view a breakout panel (as in the screenshot below) containing additional performance metrics, metadata, and a blurb telling you what has been detected. You can also filter the top queries and active connections to see what was running at the time that this anomaly was detected. And at the query-level, Watchdog points out long-running queries, as well as those that are blocking other queries from running. Get started with DBM today and take advantage of Watchdog’s insights into the health of your database performance.

Watchdog for DBM provides insights into query patterns and host activity.

Automatically correlate database query metrics and request traces

Database-level problems can have an outsize effect on application performance. Datadog Database Monitoring (DBM) provides comprehensive insights into your database queries, wait events, explain plans, and more so you can identify issues and areas for optimization. But understanding exactly which services are actually running these queries and so are affected by poor database performance is vital for quick troubleshooting.

We’re excited to announce that you can now seamlessly pivot from Datadog APM’s service and trace views directly to relevant database metrics and back with DBM’s integration with APM. This enriches your database troubleshooting by surfacing the relevant APM traces and calling services, enabling you to understand how upstream application dependencies relate to your database. For example, starting in APM, you can identify problematic query patterns in Error Tracking and investigate the trace to identify slow-performing queries, isolating the database hosts that are blocked. By connecting DBM and APM telemetry, you can solve database issues faster as important application context is never lost. See our blog post for more information, and learn how to get started today in our documentation.

Automatically pivot between DBM query data and APM request traces.

Alert on Database Monitoring query samples and explain plans

Datadog Database Monitoring (DBM) enables you to get deep visibility into all of your databases and troubleshoot issues by analyzing query performance metrics, explain plans, and host-level metrics. We’re pleased to announce that you can now configure DBM monitors to alert on query samples and explain plans. Datadog provides out-of-the-box monitors for waiting queries (shown below) and long-running queries, so you can identify and quickly respond to lock contention and other performance issues. You can also use DBM to identify your most frequently executed queries and configure monitors to track their cost over time. For example, you can set up a monitor to notify you if a query’s explain plan doubles in cost, so you can investigate and adjust your query to maximize cost efficiency. Get started by enabling our out-of-the-box DBM monitors, or learn more about configuring DBM monitors in our documentation.

Alert on waiting and long running queries using DBM monitors.