Integrate Istio service mesh with Datadog

Author: David M. Lentz

Last updated: May 21, 2019

As application architecture moves from monoliths to microservices, observability has become a growing challenge. The services that make up a distributed application, and the many dependencies and communication pathways between them, are difficult to govern and observe. You can get more control over, and visibility into, your application by adding a service mesh, a layer of infrastructure that manages traffic among microservices.

Istio is an open source service mesh that makes it easy to secure, configure, and monitor the services that make up an application. With Istio as part of your application, you can collect metrics that help you understand how your service mesh is performing. To help you visualize performance data from your service architecture, we’re pleased to announce Datadog’s new Istio integration.

A visualization based on the Istio integration shows cluster, network, and host metrics.

Istio’s internals

Istio routes traffic among microservices, and it allows operators to dynamically apply rules that govern the network’s behavior. Istio consists of a data plane, which is the infrastructure that manages the traffic to and from the services, and a control plane, which provides the means of configuring and monitoring the data plane.

Planes and clouds

Istio’s data plane is Envoy, an open source proxy created by Lyft. Istio deploys an Envoy proxy alongside each service instance in the mesh. This is known as the sidecar pattern: each service talks only to its paired Envoy proxy, which routes messages to and from other services in the mesh, subject to rules applied via the control plane.

Using Istio’s control plane, an application’s operator can dynamically configure sidecars, coordinating their behavior to provide capabilities like load balancing, authentication, and service discovery. Because Envoy handles these concerns, application developers don’t need to build network awareness into the services, which reduces the complexity of the service code.
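
To make the division of labor between the two planes concrete, here is a toy model in Python. It is only a sketch of the sidecar pattern, not Istio's or Envoy's actual API: the ControlPlane and Sidecar classes, the routing table, and the service names (productpage, reviews-v1, reviews-v2) are all hypothetical. It shows a service sending traffic only through its own proxy, and an operator changing a routing rule without touching service code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Handler = Callable[[str], str]


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the control plane: holds routing rules
    that an operator can change at runtime."""
    routes: Dict[str, str] = field(default_factory=dict)  # logical name -> concrete service

    def resolve(self, destination: str) -> str:
        # Fall back to the requested name when no rule applies.
        return self.routes.get(destination, destination)


@dataclass
class Sidecar:
    """Hypothetical stand-in for a proxy paired with one service."""
    service: str
    handler: Handler                  # the service's own business logic
    control_plane: ControlPlane
    mesh: Dict[str, "Sidecar"]        # registry of all sidecars in the mesh
    log: List[str] = field(default_factory=list)

    def send(self, destination: str, payload: str) -> str:
        # Outbound traffic: consult the current rules, then hand off to the
        # destination's sidecar (never directly to the destination service).
        target = self.control_plane.resolve(destination)
        self.log.append(f"{self.service} -> {target}")
        return self.mesh[target].receive(self.service, payload)

    def receive(self, source: str, payload: str) -> str:
        # Inbound traffic: record it, then pass it to the local service.
        self.log.append(f"{self.service} <- {source}")
        return self.handler(payload)


control_plane = ControlPlane()
mesh: Dict[str, Sidecar] = {}
for name in ("productpage", "reviews-v1", "reviews-v2"):
    mesh[name] = Sidecar(name, lambda p, n=name: f"{n} handled {p}", control_plane, mesh)

control_plane.routes["reviews"] = "reviews-v1"
first = mesh["productpage"].send("reviews", "GET /reviews")

# The operator changes a rule; no service code changes.
control_plane.routes["reviews"] = "reviews-v2"
second = mesh["productpage"].send("reviews", "GET /reviews")
```

In this sketch the calling service always addresses the logical name "reviews". Which version answers depends entirely on the rule held by the control plane at the time of the call.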

Istio is platform independent. Istio 1.1—the current version—supports applications on Kubernetes, or on Nomad (for service scheduling) with Consul (for service discovery). Because it works with Kubernetes, you can use Istio with managed Kubernetes services offered by major cloud providers, including Google (GKE), Amazon Web Services (EKS), and Azure (AKS).

Collecting metrics

Istio uses adapters to connect to infrastructure backends—external services that extend your application with functionality like authentication, logging, and monitoring. A Prometheus adapter is enabled by default, and once you’ve configured Datadog’s Istio integration, the Datadog Agent automatically begins collecting metrics from Istio’s Prometheus endpoint. To visualize custom metrics from your service mesh in Datadog, you can also configure Istio’s Datadog adapter.

Envoy proxies all traffic to and from the services within the application. Each time an Envoy sidecar processes a request, it sends metadata to Istio, including the request’s size, time, source, and destination. Istio then sends these metadata values—known as attributes—to the Prometheus and Datadog adapters so the Agent can submit metrics to your Datadog account.
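
To give a feel for the data behind those metrics, the following Python sketch parses a few lines in the Prometheus text exposition format and totals requests by destination and response code. The sample payload is illustrative only and was not captured from a real mesh. The metric name istio_requests_total and its destination_service and response_code labels are assumptions about what your Istio version exposes, so check them against your own endpoint. The Datadog Agent does this collection for you; the code exists only to show the shape of the data.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# Illustrative sample only; not captured from a real mesh.
SAMPLE_METRICS = '''\
# TYPE istio_requests_total counter
istio_requests_total{destination_service="reviews.default.svc.cluster.local",response_code="200"} 120
istio_requests_total{destination_service="reviews.default.svc.cluster.local",response_code="500"} 3
istio_requests_total{destination_service="ratings.default.svc.cluster.local",response_code="200"} 45
'''

# name{label="value",...} value
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_by_destination(
    text: str, metric: str = "istio_requests_total"
) -> Dict[Tuple[str, str], float]:
    """Sum samples of `metric`, keyed by (destination_service, response_code)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        key = (
            labels.get("destination_service", "unknown"),
            labels.get("response_code", "unknown"),
        )
        totals[key] += float(match.group("value"))
    return dict(totals)


totals = requests_by_destination(SAMPLE_METRICS)
```

Each label on a sample corresponds to one of the attributes described above, which is why you can slice the resulting metrics by source, destination, or response code in a dashboard.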

Visualizing Istio’s performance

Once you’re collecting Istio metrics, you can create a dashboard to visualize data about the requests made by your services, your network activity, and details of Istio’s resource consumption.

An Istio dashboard shows request count and latency, and the resource usage of the hosts in the mesh.
An Istio dashboard shows throughput and resource usage. You can customize this dashboard to show other metrics and APM data to monitor your entire application in a single view.

You can easily collect metrics from Istio’s related technologies, including Kubernetes and Envoy, and logs from your services. You can combine all this data in a custom Datadog dashboard to monitor your entire application. You can even include APM and distributed tracing data from your service mesh.

Tracing requests

As of version 1.1.3, Istio includes support for Datadog APM and distributed tracing so you can visualize the path and latency of requests as they travel across your service mesh. Datadog APM includes support for Python, Go, Node.js, and many other languages. If you’ve written your microservices in any of these languages, you can begin viewing traces as soon as you’ve configured Datadog APM and the Istio integration.

APM provides deep visibility into your distributed applications so you can identify the source of any latency or errors that may affect your users' experience. The flame graph is a visualization that displays the service calls that were executed to fulfill a request. The duration of each service call is represented by the width of the span, and in the sidebar, you can see the services called and the percent of time spent on each. You can click any span to see further information, such as metadata and error messages.

A view of a trace submitted from an Istio service mesh shows several spans. A table on the right lists the services called and the percent of time spent on each.

Note that in several spans, envoy.proxy precedes the name of the resource (which is the specific endpoint to which the call is addressed, e.g., main-app.apm-demo.svc.cluster.local:80). This is because Envoy proxies all requests within an Istio mesh. This architecture also explains why envoy.proxy spans are generated in pairs: the first span is created by the sidecar proxying the outgoing request, and the matching second span is from the sidecar that receives it.
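
One way to think about the per-service percentages alongside a flame graph is as each service's share of "self time", meaning a span's duration minus the time spent in its child spans. The Python sketch below illustrates that idea with a hypothetical trace containing a pair of sidecar spans. The Span type, the service names, and the durations are invented for illustration, the calculation assumes child spans run sequentially within their parent, and Datadog's own calculation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical, minimal span record (not Datadog's data model)."""
    span_id: int
    parent_id: Optional[int]
    service: str
    duration_us: int


def percent_time_by_service(spans: List[Span]) -> Dict[str, float]:
    """Return each service's share of total self time, as a percentage."""
    # Total time spent in the direct children of each span.
    child_time: Dict[int, int] = defaultdict(int)
    for span in spans:
        if span.parent_id is not None:
            child_time[span.parent_id] += span.duration_us

    # Self time = own duration minus children's durations (floored at zero).
    self_time: Dict[str, int] = defaultdict(int)
    for span in spans:
        self_time[span.service] += max(span.duration_us - child_time[span.span_id], 0)

    total = sum(self_time.values())
    if total == 0:
        return {}
    return {service: 100.0 * t / total for service, t in self_time.items()}


# Hypothetical trace: the caller, the outbound sidecar span, the matching
# inbound sidecar span, and finally the called service.
trace = [
    Span(1, None, "main-app", 1000),
    Span(2, 1, "main-app-sidecar", 600),
    Span(3, 2, "reviews-sidecar", 500),
    Span(4, 3, "reviews", 400),
]
breakdown = percent_time_by_service(trace)
```

With these made-up numbers, the two sidecar spans account for a small share of the request, and most of the time is split between the calling service and the service it calls.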

Along with other APM features like App Analytics and the Service Map, flame graphs can help you troubleshoot and investigate errors in your Istio mesh. In the next screenshot, we see that the reviews.default service has executed in 387 microseconds and returned the error code 500.

A flame graph shows a single span. The bottom of the page shows graphs displaying host metrics, including CPU usage and load averages.

With Datadog APM you can see exactly where an error originates, and use the tabs below the flame graph—Span Metadata, Host, Logs, and Error—to see related information that can help you better understand the span you’re inspecting.

For more information about monitoring your distributed services with APM, see our documentation.

Monitor your service mesh with Datadog

Datadog’s new Istio integration—one of over 400 integrations available in Datadog—can help you increase visibility into your distributed application by monitoring all the components of your service mesh. If you’re running Istio in a Kubernetes cluster, you can add the Agent as a DaemonSet to ensure it’s collecting metrics from all your nodes. If you configure the Agent’s Autodiscovery capabilities, you’ll continue to collect metrics even as your containers come and go.
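
As a rough illustration of what an Autodiscovery configuration can look like, the Python sketch below builds the three pod annotations that Autodiscovery reads for a container: check names, init configs, and instances, each stored as a JSON string. The container name (mixer), the istio_mesh_endpoint option, and port 42422 are assumptions made for this example. Confirm the correct values for your Istio and Agent versions in the integration documentation before using them.

```python
import json
from typing import Dict, List


def autodiscovery_annotations(
    container: str, check: str, instances: List[dict]
) -> Dict[str, str]:
    """Build Autodiscovery pod annotations for one container.

    Each annotation value is a JSON-encoded list, stored as a string.
    """
    prefix = f"ad.datadoghq.com/{container}"
    return {
        f"{prefix}.check_names": json.dumps([check]),
        f"{prefix}.init_configs": json.dumps([{}]),
        f"{prefix}.instances": json.dumps(instances),
    }


# Assumed values for illustration; %%host%% is a template variable that the
# Agent replaces with the container's IP address at runtime.
annotations = autodiscovery_annotations(
    container="mixer",
    check="istio",
    instances=[{"istio_mesh_endpoint": "http://%%host%%:42422/metrics"}],
)

for key, value in annotations.items():
    print(f"{key}: '{value}'")
```

Because the configuration travels with the pod rather than with a fixed host, the Agent can pick it up again whenever the pod is rescheduled onto another node.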

For more information, see our 3-part series on monitoring Istio.