Scaling Reliability Across a Global Connected Platform
Signify’s Hue Connected platform is built to make home lighting more enjoyable, personal, and inspiring, while increasingly supporting security-focused use cases. The platform operates at massive scale, supporting millions of connected devices and users worldwide.
The environment spans cloud-based services, embedded software running on Hue Bridges, and mobile applications. It processes approximately 45,000 incoming requests per second, 10,000 outgoing events per second, and 3,000 internal bridge-to-cloud requests per second, operating continuously across 21 clusters in 11 regions worldwide. “At this scale, reliability is a business requirement, not just a technical one,” says Leon Bouwmeester, Head of the Hue Platform Cluster at Signify. “We need to understand how the platform behaves globally, all the time.”
As the platform expanded, observability was handled independently by teams using different tools across AWS, GCP, and on-prem environments. While this worked for individual services, it made it difficult to understand end-to-end request flows across microservices, embedded systems, and user interactions. “When incidents happened, teams often saw only their piece of the puzzle,” explains Dmitry Korolev, Engineering Manager at Signify. “That slowed diagnosis and increased the time it took to restore services.”
This fragmentation lengthened Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR), consumed valuable engineering time, and put both customer experience and delivery velocity at greater risk.
From reactive to proactive operations
After a formal RFP and proof-of-value process, Signify selected Datadog for its total cost of ownership, strong documentation and support, and ability to scale adoption over time.
Datadog provides a single operational layer across infrastructure, applications, logs, user experience, delivery pipelines, AI systems, and security signals. Teams can navigate seamlessly between metrics, traces, logs, CI pipelines, and runtime behavior, enabling faster understanding and decision-making. “For us, the value comes from everything being connected,” says Korolev. “That context dramatically shortens the path from detection to understanding.”
With Datadog, Signify continuously monitors connectivity and performance across Hue Bridges and cloud services using Infrastructure Monitoring and APM. During a major power outage affecting parts of Portugal and Spain in early October 2025, Datadog alerted teams before the event was widely reported. “That early signal gave us time to coordinate internally and inform customer support proactively,” says Bouwmeester. “We were not reacting to tickets. We were ahead of the situation.”
Service Level Indicators and Service Level Objectives are tracked in Datadog using Dashboards and Monitors and integrated into internal status pages, providing real-time visibility into platform health. Datadog On-Call enables teams to collaborate quickly during incidents by sharing dashboards, traces, and logs in a single context.
Signify uses Datadog CI Visibility and Datadog CI Tests to monitor complex IoT pipelines involving cross-compilation, emulators, real hardware testing, and multi-layer security scans. By visualizing pipeline performance and test execution, teams identified bottlenecks and reduced waiting time. “For developers, waiting in CI directly affects productivity,” says Korolev. “Having clear visibility helped us remove friction and improve throughput.”
Unifying observability, AI, and security
As part of its customer experience, Signify deployed a live AI agent within its applications that answers customer questions and lets consumers interact with their lighting system. Because the agent directly affects customer experience, understanding how the model behaves in production is essential.
Signify uses Datadog LLM Observability to monitor the agent’s behavior by tracking prompts, responses, latency, and errors alongside application, infrastructure, and user experience signals. “AI is now part of our production platform,” says Bouwmeester. “Datadog LLM Observability helps us understand how the model behaves in real usage, not just in testing.”
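To make the four signals named above concrete (prompts, responses, latency, and errors), here is a minimal sketch of what capturing them around a single model call could look like. It is not Datadog's API and not Signify's implementation: `LLMCallRecord`, `record_llm_call`, and `fake_model` are hypothetical names introduced only for illustration, and a plain in-memory list stands in for whatever backend would actually receive the records.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LLMCallRecord:
    """One observed model call (hypothetical structure for illustration)."""
    prompt: str
    response: Optional[str]
    latency_ms: float
    error: Optional[str]


def record_llm_call(
    model_fn: Callable[[str], str],
    prompt: str,
    sink: List[LLMCallRecord],
) -> Optional[str]:
    """Call model_fn(prompt) and append prompt, response, latency, and error to sink."""
    start = time.perf_counter()
    response: Optional[str] = None
    error: Optional[str] = None
    try:
        response = model_fn(prompt)
    except Exception as exc:
        # Keep the failure as data so that failed calls are observable too.
        error = f"{type(exc).__name__}: {exc}"
    latency_ms = (time.perf_counter() - start) * 1000.0
    sink.append(LLMCallRecord(prompt, response, latency_ms, error))
    return response


def fake_model(prompt: str) -> str:
    """Stand-in for a real model client (assumption, not a real API)."""
    if not prompt:
        raise ValueError("empty prompt")
    return f"echo: {prompt}"


if __name__ == "__main__":
    records: List[LLMCallRecord] = []
    record_llm_call(fake_model, "Dim the living room lights", records)
    record_llm_call(fake_model, "", records)  # recorded as an error
    for rec in records:
        print(rec)
```

Recording failures as data rather than letting them propagate is a design choice for this sketch only; a production wrapper would typically re-raise after recording.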
With Datadog already widely adopted across engineering, adding security capabilities was a natural next step. Without a dedicated security team, Signify needed tools that engineers could adopt easily, without a flood of false positives or operational disruption. Signify adopted Datadog Code Security, Application Security, and Cloud Security Posture Management, integrating security directly into CI pipelines and runtime monitoring. “We wanted security to fit naturally into how teams already work,” explains Korolev. “Datadog made that possible because adoption was straightforward and familiar.”
Consolidating observability, delivery, AI, and security into a single platform also allowed Signify to retire multiple point solutions, simplifying procurement and reducing operational overhead.
As a result, Signify has significantly improved reliability and operational efficiency across its global platform. Availability has increased to 99.97%, while teams are able to detect and resolve issues faster, reducing MTTR across services. At the same time, the platform has scaled rapidly, handling more than 10x growth in traffic while maintaining consistent performance. Improved visibility into system behavior and usage has also enabled more efficient resource utilization, contributing to a reduction in cloud costs of over 60%.
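For a sense of what a 99.97% availability figure means in practice, the short sketch below converts an availability percentage into the downtime it permits over a given window. The measurement window behind the published figure is not stated, so the 30-day and 365-day windows here are assumptions chosen for illustration, and `allowed_downtime_minutes` is a hypothetical helper name.

```python
# Illustrative arithmetic only: converts an availability percentage
# into the downtime it allows over a window of a given length.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, days: int) -> float:
    """Return the downtime budget, in minutes, for a window of `days` days."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * days * MINUTES_PER_DAY


if __name__ == "__main__":
    # Window lengths are assumptions; the source does not state one.
    for label, days in (("30-day window", 30), ("365-day window", 365)):
        minutes = allowed_downtime_minutes(99.97, days)
        print(f"{label}: {minutes:.1f} minutes of downtime")
```

Under these assumptions, 99.97% corresponds to roughly 13 minutes of downtime in a 30-day window, or about 2.6 hours over a year.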
As Signify continues to scale its connected lighting and security platform, Datadog gives it the foundation to operate with confidence. By unifying observability, delivery metrics, AI monitoring, and security, Signify reduces operational risk while maintaining the speed required to innovate. “Having a single, connected view across the platform gives us confidence as complexity grows,” says Bouwmeester. “Datadog allows us to scale safely while continuing to deliver great experiences for our customers.”