Monitor your Istio service mesh with Datadog
As application architecture moves from monoliths to microservices, observability has become a growing challenge. The services that make up a distributed application, and the many dependencies and communication pathways between them, are difficult to govern and observe. You can get more control and visibility of your application by including a service mesh—a layer of infrastructure that manages traffic among microservices.
Istio is an open source service mesh that makes it easy to secure, configure, and monitor the services that make up an application. With Istio as part of your application, you can collect metrics that help you understand how your service mesh is performing. To help you visualize performance data from your service architecture, we’re pleased to announce Datadog’s new Istio integration.
Istio routes traffic among microservices, and it allows operators to dynamically apply rules that govern the network’s behavior. Istio consists of a data plane, which is the infrastructure that manages the traffic to and from the services, and a control plane, which provides the means of configuring and monitoring the data plane.
Planes and clouds
Istio’s data plane is Envoy, an open source proxy created by Lyft. Istio deploys an Envoy proxy alongside each instance of every service in the mesh. This is known as the sidecar pattern: each service instance talks only to its paired Envoy proxy, which routes messages to and from other services in the mesh, subject to rules applied via the control plane.
Using Istio’s control plane, an application’s operator can dynamically configure sidecars, coordinating their behavior to provide capabilities like load balancing, authentication, and service discovery. Because Envoy handles these concerns, application developers don’t need to build network awareness into the services, which reduces the complexity of the service code.
Istio is platform independent. Istio 1.1—the current version—supports applications on Kubernetes, or on Nomad (for service scheduling) with Consul (for service discovery). Because it works with Kubernetes, you can use Istio with managed Kubernetes services offered by major cloud providers, including Google (GKE), Amazon Web Services (EKS), and Azure (AKS).
Istio uses adapters to connect to infrastructure backends—external services that extend your application with functionality like authentication, logging, and monitoring. A Prometheus adapter is enabled by default, and once you’ve configured Datadog’s Istio integration, the Datadog Agent automatically begins collecting metrics from Istio’s Prometheus endpoint. To visualize custom metrics from your service mesh in Datadog, you can also configure Istio’s Datadog adapter.
Envoy proxies all traffic to and from the services within the application. Each time an Envoy sidecar processes a request, it sends metadata to Istio, including the request’s size, time, source, and destination. Istio then sends these metadata values—known as attributes—to the Prometheus and Datadog adapters so the Agent can submit metrics to your Datadog account.
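To make the flow from attributes to metrics more concrete, here is a minimal Python sketch of what reading a Prometheus-style endpoint involves. The Agent does this for you; the sample payload, the metric name, and the label names below are illustrative assumptions modeled on the Prometheus text exposition format, not output captured from a real mesh.

```python
import re
from collections import defaultdict

# Illustrative sample only: the metric and label names are assumptions,
# not output captured from a real Istio deployment.
SAMPLE = """\
# HELP istio_requests_total Total requests handled by the mesh (sample)
# TYPE istio_requests_total counter
istio_requests_total{source_app="productpage",destination_service="reviews.default.svc.cluster.local",response_code="200"} 42
istio_requests_total{source_app="productpage",destination_service="reviews.default.svc.cluster.local",response_code="500"} 3
istio_requests_total{source_app="reviews",destination_service="ratings.default.svc.cluster.local",response_code="200"} 40
"""

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def requests_by_destination(samples):
    """Sum request counts per destination service."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == "istio_requests_total":
            totals[labels.get("destination_service", "unknown")] += value
    return dict(totals)
```

Each label on a sample corresponds to one attribute of the request (source, destination, response code), which is why you can slice the resulting metrics by those dimensions in a dashboard.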
Visualizing Istio’s performance
Once you’re collecting Istio metrics, you can create a dashboard to visualize data about the requests made by your services, your network activity, and details of Istio’s resource consumption.
You can easily collect metrics from Istio’s related technologies, including Kubernetes and Envoy, and logs from your services. You can combine all this data in a custom Datadog dashboard to monitor your entire application. You can even include APM and distributed tracing data from your service mesh.
As of version 1.1.3, Istio includes support for Datadog APM and distributed tracing so you can visualize the path and latency of requests as they travel across your service mesh. Datadog APM includes support for Python, Go, Node.js, and many other languages. If you’ve written your microservices in any of these languages, you can begin viewing traces as soon as you’ve configured Datadog APM and the Istio integration.
APM provides deep visibility into your distributed applications so you can identify the source of any latency or errors that may affect your users’ experience. The flame graph is a visualization that displays the service calls that were executed to fulfill a request. The duration of each service call is represented by the width of the span, and in the sidebar, you can see the services called and the percent of time spent on each. You can click any span to see further information, such as metadata and error messages.
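As a rough illustration of how a per-service breakdown like the one in the sidebar could be derived from span data, the Python sketch below computes each service's share of a trace's total duration. This is not Datadog's implementation: the Span fields and service names are hypothetical, and the sketch assumes that a span's child calls run one after another and do not overlap.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span shape used only for this sketch.
    span_id: str
    parent_id: Optional[str]
    service: str
    start_us: int
    duration_us: int


def self_time_percent(spans):
    """Return {service: percent of the root span's duration spent in that service}.

    A span's self time is its duration minus the duration of its direct
    children, assuming the children do not overlap.
    """
    child_time = defaultdict(int)
    for span in spans:
        if span.parent_id is not None:
            child_time[span.parent_id] += span.duration_us

    root = next(span for span in spans if span.parent_id is None)

    per_service = defaultdict(int)
    for span in spans:
        per_service[span.service] += max(span.duration_us - child_time[span.span_id], 0)

    return {svc: 100.0 * t / root.duration_us for svc, t in per_service.items()}


# A made-up three-span trace: productpage calls reviews, which calls ratings.
TRACE = [
    Span("a", None, "productpage", 0, 1000),
    Span("b", "a", "reviews", 100, 600),
    Span("c", "b", "ratings", 200, 200),
]
```

For the sample trace, productpage and reviews each account for 400 of the 1,000 microseconds and ratings for the remaining 200.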
Note that in several spans, envoy.proxy precedes the name of the resource (which is the specific endpoint to which the call is addressed, e.g., main-app.apm-demo.svc.cluster.local:80). This is because Envoy proxies all requests within an Istio mesh. This architecture also explains why envoy.proxy spans are generated in pairs: the first span is created by the sidecar proxying the outgoing request, and the matching second span is from the sidecar that receives it.
Along with other APM features like Trace Search & Analytics and the Service Map, flame graphs can help you troubleshoot and investigate errors in your Istio mesh. In the next screenshot, we see that the reviews.default service has executed in 301 microseconds and returned the error code 500.
This simplified flame graph shows only that a 500 Internal Server Error occurred, but in the case of a service call that involves multiple resources, you can easily spot errors that can cause cascading failures. When you’re inspecting an error span, you can use the tabs at the bottom of the page—Span Metadata, Host, Logs, and Error—to pivot to view related data to help you understand the error and resolve the issue.
For more information about monitoring your distributed services with APM, see our documentation.
Monitor your service mesh with Datadog
Datadog’s new Istio integration—one of over 250 integrations available in Datadog—can help you increase visibility into your distributed application by monitoring all the components of your service mesh. If you’re running Istio in a Kubernetes cluster, you can add the Agent as a DaemonSet to ensure it’s collecting metrics from all your nodes. If you configure the Agent’s Autodiscovery capabilities, you’ll continue to collect metrics even as your containers come and go.
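Autodiscovery on Kubernetes is typically driven by pod annotations of the form ad.datadoghq.com/<container>.check_names, .init_configs, and .instances, each holding a JSON-encoded list. The Python sketch below shows one way to generate those annotations for the Istio check. The helper name, the container name, and the instance settings passed to it (including the istio_mesh_endpoint key and port) are assumptions for illustration; confirm the exact options against the integration's example configuration before using them.

```python
import json


def istio_autodiscovery_annotations(container, instance):
    """Build Autodiscovery pod annotations for the Istio check.

    `container` is the name of the container the check should target, and
    `instance` is the check's instance configuration as a dict. Both are
    supplied by the caller; nothing here is validated against the Agent.
    """
    prefix = f"ad.datadoghq.com/{container}"
    return {
        f"{prefix}.check_names": json.dumps(["istio"]),
        f"{prefix}.init_configs": json.dumps([{}]),
        f"{prefix}.instances": json.dumps([instance]),
    }


# Hypothetical values: the container name and endpoint are assumptions.
EXAMPLE = istio_autodiscovery_annotations(
    "mixer",
    {"istio_mesh_endpoint": "http://%%host%%:42422/metrics"},
)
```

The %%host%% placeholder is an Autodiscovery template variable that the Agent resolves to the container's address at runtime, which is what lets the check follow containers as they are rescheduled.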
If you’re not already using Datadog, you can start today with a full-featured, free two-week trial.