Axis Communications unifies telemetry data and reduces troubleshooting time with Datadog Serverless monitoring | Datadog

case study


About Axis

Axis Communications is an industry leader in video surveillance. The company develops and supplies innovative, mission-critical managed surveillance solutions that improve security, safety, operational efficiency, and business intelligence for customers across the world.

Industry: Video Surveillance
Employees: 4,000+
Headquarters: Lund, Sweden
“Datadog is a super important tool that enables us to make something that is reliable and that our customers can trust.”
Jon Lindeheim, Engineering Manager, Axis Communications

Why Datadog?

  • Unified platform eliminates the need for multiple monitoring tools
  • Ability to aggregate telemetry data across multiple cloud providers
  • Built-in incident management features streamline operations
  • More inclusive and collaborative tool across departments

Challenge

Axis Communications faced challenges in efficiently monitoring its hybrid cloud platform due to fragmented observability tools, leading to security risks and operational inefficiencies when troubleshooting across numerous microservices.

Key Results

Hours → minutes

Reduces troubleshooting time and MTTR

Reactive → proactive

Enables fast identification of potential issues

Single source of truth

Ability to correlate data and insights into key business services and devices

Enables data-driven decisions

Based on real-time data about latency and call duration

Complex cloud infrastructure makes visibility challenging

Axis Communications is an industry leader in video surveillance. The company’s millions of cameras are used by organizations of all kinds for a wide range of applications, including protecting critical infrastructure, data centers, and educational facilities worldwide. These mission-critical, media-rich applications have very low tolerance for delay or downtime.

To support its vast ecosystem of customers and partners, Axis recently launched Axis Cloud Connect—a hybrid cloud platform designed to enable its partners to build innovative solutions using Axis devices. As Axis Cloud Connect grew, it became more difficult to observe how the entire platform performed. “Since we were a pure DevOps shop, each team was responsible for monitoring their own services using the tools available at the time,” says Jon Lindeheim, engineering manager at Axis Communications.

The team previously used a couple of different open-source observability solutions. According to Lindeheim, there were two major challenges with this setup. “First, when triaging a problem that involved many microservices, a person needed to investigate by logging in to many different AWS accounts or Azure subscriptions and manually reconcile to find dashboards or logs that could point them in the right direction,” he says. 

That approach was slow and created a security risk, because Axis had to grant individual users access to production environments. Second, maintaining separate toolchains for visualizing observability data took time and effort. “That was time that we would rather spend developing value for our customers,” Lindeheim says.

Lindeheim and his team sought a solution that could provide broad observability for the entire platform.


Gaining end-to-end visibility into the health of serverless applications

Axis was familiar with Datadog, as several teams across the company had already been using the platform successfully. When Lindeheim’s team began evaluating potential solutions, Datadog emerged as a natural choice for their observability needs. 

“We soon realized that we got a lot more valuable data by instrumenting with the Datadog Agent directly,” says Lindeheim. “It was easy to start collecting telemetry.”

After a few proofs of concept focused on tuning, Lindeheim’s team gained control over their retention filters and log pipelines. As a result, they could keep costs on par with their previous solutions while getting much more value. “The obvious value was that all telemetry was now aggregated in the same place,” says Lindeheim.

Lindeheim’s team initially used the aggregated data to build a notebook for one of their most mission-critical use cases. “By attaching a URL to this notebook in our Opsgenie incident, the on-call staff could easily find where the problem was for this specific use case among the services involved,” says Lindeheim.

Axis is also using Datadog Serverless Monitoring to provide end-to-end visibility into the health of its serverless applications, reducing MTTR. Datadog Serverless Monitoring enables teams to stay agile and focus on building revenue-generating applications while reducing operational overhead. Visibility into metrics, traces, and logs for every invocation of serverless applications allows Axis teams to deploy new code with confidence. “Another interesting insight we gained was that Lambda deployment rollouts took much longer than we expected. This was one of many insights that we discovered about our serverless infrastructure,” says Lindeheim. “It was like opening the hood of a car for the first time.”

Reducing troubleshooting from hours to minutes

Today, Axis Communications uses Datadog daily to unify telemetry data across teams and cloud providers and reduce troubleshooting time and MTTR from hours to minutes. “Ever since we started using Datadog, the MTTR decreased significantly,” says Lindeheim.

The Datadog platform also gives Axis a more inclusive, collaborative tool that it can use across departments. “Product owners, QA, managers, and product specialists can use this data to verify test results, make decisions, or assist customers in support cases,” adds Lindeheim.

They also get fast feedback on new functionality. “We can see potential problems from traffic patterns before they become real problems,” says Lindeheim. “That means we can be proactive instead of reactive. With Datadog, we know that everything functions correctly after each deployment and that the number of errors doesn’t increase. Datadog helps us see when problems start, before they hit the client.”

Most importantly, troubleshooting a failing chain of requests now takes an average of five minutes instead of several hours. “That speed is appreciated by the developers and engineers. Any one of our engineers can view the full traces using Datadog and doesn’t have to wait for another team to investigate their service telemetry,” says Lindeheim.

Lindeheim says he expects that Datadog will also help Axis as they begin some larger migration work in the near future, moving workloads to different infrastructure and re-architecting some larger systems. “Datadog will help us validate that these changes don’t affect our customers negatively,” he says. “It feels safe now that Datadog is there with a single pane of glass.”

Ultimately, Axis can now focus on continuing its rollout of Axis Cloud Connect. “We believe that millions more devices will be connected to the cloud in the future, so Axis Cloud Connect is a high-priority area within the company,” says Lindeheim. “Datadog is a super important tool that enables us to make something that is reliable and that our customers can trust. It has given us the kind of observability we didn’t have before in any other product that we used. We couldn’t live without it right now.”
