OpenTelemetry Readiness Checklist | Datadog

Are your OpenTelemetry signals ready for ingestion?

A practitioner's readiness checklist covering host attribution, metric correctness, trace quality, and pipeline auditability — before data reaches your observability platform.

  • Host Attribution

    • Different language runtimes detect and report host identity differently. A Java application may resolve host.id while a Node.js application resolves host.name, meaning the same physical host can appear as multiple distinct hosts downstream. Note that the OTel semantic conventions define both host.name and host.id as optional resource attributes, so there is no guarantee that every SDK populates both. Audit for consistency across your entire service fleet before signals reach any ingestion endpoint.

    • Container IDs and pod names are unique per instance and ephemeral. If used as host identifiers, each new deployment can register as a new host in your observability platform. Instrument an internal audit metric, such as a count of distinct host.name values per cluster, to surface attribution anomalies early.
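
One way to emit that audit signal from the pipeline itself is a sketch like the following, using the contrib count connector plus a transform that copies the resource-level host.name onto each span so the connector can group by it. The metric name, backend exporter, and endpoint are placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp/backend:                        # placeholder backend endpoint
    endpoint: backend.example.com:4317

processors:
  transform/copy_host:
    trace_statements:
      - context: span
        statements:
          # Expose the resource-level host.name on each span so the
          # count connector can group by it.
          - set(attributes["host.name"], resource.attributes["host.name"])

connectors:
  count:
    spans:
      otel.audit.spans_by_host:        # illustrative metric name
        description: Span count grouped by host.name for attribution audits
        attributes:
          - key: host.name

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/copy_host]
      exporters: [count, otlp/backend]
    metrics/audit:
      receivers: [count]
      exporters: [otlp/backend]
```

The resulting metric has one series per observed host.name, so a distinct-series count on the backend surfaces new or unexpected hosts as they appear.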

    • The OTel Collector's resource detection processor queries cloud provider metadata endpoints (AWS IMDS, GCP metadata server, Azure IMDS) and Kubernetes APIs at startup to reliably identify the underlying host. Detection runs once at startup rather than continuously, so it adds no ongoing overhead. It is also typically more accurate than relying on the OTel SDK alone, since language runtimes have no privileged access to infrastructure-level identity.
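
A minimal sketch of the processor's configuration, assuming an AWS/EKS footprint; swap the detectors list (gcp, azure, aks, and others exist) for wherever you run:

```yaml
processors:
  resourcedetection:
    # Detectors run in order; earlier detectors win on conflicting attributes.
    detectors: [eks, ec2, system]
    timeout: 2s
    # Keep any resource attributes the SDK already set.
    override: false
```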

    • In mixed-cloud Kubernetes environments, different cloud providers report host identity in different formats and with different levels of completeness. A recommended convention is <k8s.node.name>-<k8s.cluster.name>, enforced via transform processors in your DaemonSet collectors. Apply it conditionally: only set this convention when both k8s.node.name and k8s.cluster.name are present; otherwise fall back to cloud provider identity attributes (host.id, host.name) so your platform's own resolution logic can take over.
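
A sketch of that conditional rewrite in the transform processor's OTTL, shown for traces only; metric and log pipelines would need the same statement under metric_statements and log_statements:

```yaml
processors:
  transform/host_identity:
    trace_statements:
      - context: resource
        statements:
          # Rewrite host.name only when both Kubernetes attributes are present;
          # otherwise cloud-provider identity passes through untouched.
          - set(attributes["host.name"], Concat([attributes["k8s.node.name"], attributes["k8s.cluster.name"]], "-")) where attributes["k8s.node.name"] != nil and attributes["k8s.cluster.name"] != nil
```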

    • Some observability platforms support host alias attributes that allow a single host to be recognized under multiple known names simultaneously — for example, a node name assigned by your OTel pipeline (host_aaa.my_cluster) and a hostname returned by a cloud API scrape (host_aaa.internal). Collector support for these alias attributes arrived in v0.128.0. Without aliasing, OTel signals and cloud integration signals for the same host can appear as separate, unrelated entities, inflating monitored host counts and breaking infrastructure correlation. Check whether your target platform supports alias reconciliation and configure it accordingly.

  • Metric Correctness

    • The OTel specification supports both cumulative and delta aggregation temporality, and the OTLP exporter defaults to cumulative for all instrument kinds. Different backends have different requirements. Some platforms work best with delta temporality for monotonic sums, histograms, and exponential histograms, while others require cumulative temporality. Configure your SDK or exporter to match your backend's expectations. If you need to support multiple backends with different preferences, the OTel Collector's cumulative-to-delta processor can convert in the pipeline.
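
If your backend prefers delta, one option is converting inside the pipeline with the contrib cumulativetodelta processor. A sketch, with illustrative metric names:

```yaml
processors:
  cumulativetodelta:
    include:
      metrics:                   # illustrative names; omit include to convert all
        - requests.total
        - http.server.duration
      match_type: strict
```

The other option is emitting delta at the source by setting OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta on the SDK.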

    • Observability platforms that enforce a last-write-wins model will silently discard earlier submissions when two collectors or processes submit the same metric with identical tags within the same flush window. This is especially risky during migrations where dual-shipping is in effect. Audit your pipeline for duplicate writers before any cutover, and enforce a clear single-writer boundary per metric namespace.

    • OTel metrics do not carry a schema field declaring their collection interval. Without this context, a downstream platform cannot correctly interpolate graphs when zoomed in, and count metrics may be rendered as rates, producing misleading visual artifacts. This contrasts with collection agents that default to a fixed flush interval (commonly 10 seconds), giving the platform an implicit interpolation baseline. Set collection intervals explicitly for each custom metric in your pipeline configuration to ensure consistent graph rendering at any zoom level.
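
What "explicit" looks like depends on the component; a sketch pinning two common receivers to a uniform 10-second interval (the job name and target are placeholders):

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s     # explicit, uniform interval
  prometheus:
    config:
      scrape_configs:
        - job_name: app-metrics  # placeholder job and target
          scrape_interval: 10s
          static_configs:
            - targets: ["localhost:9090"]
```

For SDK-exported custom metrics, the analogous knob is the periodic reader's export interval (OTEL_METRIC_EXPORT_INTERVAL, in milliseconds).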

    • High-cardinality tags create a unique time series for each distinct value. A tag like container_id with thousands of values per hour creates thousands of distinct metric series, and the series count multiplies across every additional unbounded tag, inflating your storage footprint and cost. Tags should represent stable, bounded dimensions: service name, environment, region, and similar categorical properties. Ephemeral identifiers belong in traces, not metrics.
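
In the Collector, a sketch of stripping unbounded dimensions before export; the two keys shown are common offenders, so adjust for your own tag set:

```yaml
processors:
  transform/drop_unbounded_tags:
    metric_statements:
      - context: datapoint
        statements:
          # Ephemeral identifiers belong in traces, not metric dimensions.
          - delete_key(attributes, "container.id")
          - delete_key(attributes, "k8s.pod.uid")
```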

    • Most platforms apply opinionated transformations to incoming OTel metric names. If your existing dashboards, monitors, or alerts reference metric names from a previous instrumentation approach, they may break after migration unless a cross-query compatibility layer is configured. Review the platform's OTel metric mapping documentation before cutting over and validate critical monitors against both naming schemes during a parallel-run period.

  • Trace and Span Quality

    • Most observability platforms and Collector components that compute RED metrics use span.kind to determine which spans qualify as service-entry spans. Typically, spans with span.kind of server, producer, consumer, or client generate trace metrics, while internal spans do not. The first span of each service's participation in a trace is also generally treated as a service-entry span regardless of kind. Validate that your framework instrumentation sets span.kind correctly on service-entry spans, or RED metrics will have coverage gaps.

    • OTel SDKs drop unsampled spans entirely before they reach the Collector, which means downstream components cannot include them in trace metric calculations. Any sampling decision made at the SDK level directly reduces the span population available for RED metric computation. Moving sampling decisions downstream to the Collector level ensures trace metrics are computed from 100% of traffic, not just the sampled portion. If SDK-level sampling cannot be fully eliminated, quantify the expected undercount and factor it into your SLO baselines.

    • Any component that calculates RED metrics from spans must process all spans before sampling reduces the volume. Placing a sampling processor upstream of your trace metric component is one of the most common and consequential OTel misconfigurations: it silently causes trace metrics to reflect only the sampled fraction of traffic. Review your pipeline ordering and enforce it as a configuration linting rule in your deployment process.
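
One layout that enforces this is fanning the full span stream into a metrics connector on its own pipeline, so sampling only touches the traces you export. A sketch using the spanmetrics connector and tail-based sampling, with a placeholder backend exporter:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp/backend:                        # placeholder backend endpoint
    endpoint: backend.example.com:4317

connectors:
  spanmetrics: {}

processors:
  tail_sampling:
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]

service:
  pipelines:
    traces/metrics:                    # sees every span, before any sampling
      receivers: [otlp]
      exporters: [spanmetrics]
    traces/sampled:                    # sampling only affects exported traces
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/backend]
    metrics:
      receivers: [spanmetrics]
      exporters: [otlp/backend]
```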

  • Signal Auditability

    • Monitor the internal telemetry emitted by your OTel Collector to track the volume and cardinality of resource attributes flowing through your pipeline. Most Collector distributions emit internal metrics (commonly prefixed with otelcol_) about processed spans, metrics, and logs. Build dashboards that track the distinct values of key resource attributes, especially host.name and host.id, to detect attribution anomalies early. An unexpected spike in detected hosts is a leading indicator of misconfiguration, often caused by container IDs, load test hosts, or dynamic values leaking into host identity attributes.
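
A sketch of exposing those internal metrics for scraping; the exact telemetry schema varies by Collector version (older releases take a metrics.address field instead of readers):

```yaml
service:
  telemetry:
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: "0.0.0.0"
                port: 8888
```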

    • Attribution errors, such as spans being tagged with dynamically generated hostnames from load tests or auto-scaling events, can cause sudden spikes in billable host counts with no corresponding change in real infrastructure. A threshold or anomaly-based alert on your host count metric provides an early warning system before the impact appears in your bill. This alert should be part of your standard observability platform onboarding checklist, not an afterthought.

This checklist covers signal quality at the pipeline level. Ready to see how your OTel signals look in practice? Start a 14-day free trial of Datadog and send your first OTel traces and metrics in minutes. If you want hands-on guidance for your migration, contact us to connect with our team.