SRE team looks to enhance observability and automation capabilities
ArisGlobal is an AI-first life sciences company that specializes in solutions that drive drug development, safety monitoring, and regulatory compliance. The company works with hundreds of global life sciences organizations, Contract Research Organizations (CROs), and government health authorities to enhance workflow efficiencies, reduce costs, and improve patient safety.
The pharmaceutical industry faces major challenges today, including growing case volumes and regulatory complexities, supply chain disruptions, budgetary and resource constraints, and soaring R&D costs. ArisGlobal built LifeSphere—an interoperable platform across Safety, Regulatory, Quality, and Medical Affairs—to help its customers overcome these challenges. LifeSphere NavaX is the company’s next-generation cognitive computing engine that uses the latest technology, including AI and GenAI, to automate core functions, extract data, and generate narratives.
Within ArisGlobal, the Site Reliability Engineering (SRE) team is a small but critical group responsible for availability, monitoring, and automation. Its primary measure of success is meeting Service Level Objectives (SLOs) for the LifeSphere platform. However, the team recognized that its existing observability tool lacked the capabilities needed to support that mission effectively. “Observability was always a challenge,” says Rajkamal Madhaiyan, Site Reliability Engineer at ArisGlobal. “We knew that by improving our observability, we would be better able to right-size our environments to match increased demand for our products.”
Enabling comprehensive app visibility
ArisGlobal selected Datadog to address its observability and operational challenges and support its life sciences application. “The solutions around Application Performance Monitoring (APM) that Datadog offered were exactly what we were looking for to give us more insight into the state of our applications,” explains Madhaiyan.
The company implemented Datadog across its infrastructure, with primary deployment across key environments and selective use in its performance testing and quality control environments. When needed for specific troubleshooting scenarios, the team also deploys Datadog in development environments.
Today, ArisGlobal uses several key Datadog capabilities to help maintain its high availability standards. For example, through codespan tagging, engineers use APM to diagnose and troubleshoot performance issues on a user-by-user basis, providing granular visibility into application behavior. “APM is a great product,” notes Madhaiyan. “Being able to see the time taken for individual requests and being able to give that information back to the development team has been invaluable.”
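To make the idea of user-by-user diagnosis concrete, here is a minimal sketch of how tagged spans can be grouped to find each user's slowest request. It does not use the Datadog APM client and is not ArisGlobal's implementation: the `Span` class, the `usr.id` tag key, and the sample span names are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Simplified stand-in for an APM span: one timed unit of work (hypothetical)."""
    name: str
    duration_ms: float
    tags: dict = field(default_factory=dict)


def slowest_by_user(spans, tag_key="usr.id"):
    """Return the slowest span duration per user, ignoring untagged spans."""
    slowest = defaultdict(float)
    for span in spans:
        user = span.tags.get(tag_key)
        if user is not None:
            slowest[user] = max(slowest[user], span.duration_ms)
    return dict(slowest)


# Sample data: the span names and user IDs are invented for this example.
spans = [
    Span("case.submit", 120.0, {"usr.id": "alice"}),
    Span("case.submit", 950.0, {"usr.id": "bob"}),
    Span("case.search", 310.0, {"usr.id": "alice"}),
    Span("health.check", 5.0),  # no user tag, so it is skipped
]

print(slowest_by_user(spans))
```

In a real tracing setup, the tag would be attached to the span when the request is handled, and the grouping would happen in the APM tool's query interface rather than in application code.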
Ensuring operational efficiency and streamlining remediation
ArisGlobal also uses Datadog’s Automation suite, which includes Workflow Automation, App Builder, and Datastore, to optimize operations and simplify remediation.
To optimize operations, the company uses App Builder to create self-service applications that allow controlled access to specific product and infrastructure features. “Datadog is a critical tool for our organization for troubleshooting, performance analysis, uptime tracking, and issue remediation through the use of automation,” says Madhaiyan.
When incidents occur, ArisGlobal’s team is notified through Datadog On-Call, and the event is automatically recorded in Incident Management. From there, they rely on Datadog’s automation capabilities to simplify remediation and reduce manual effort. A Workflow Automation button embedded directly into their dashboards allows SREs to quickly restart or remediate services during an incident, triggering standardized runbooks without the need to switch tools.
Building on that, the team uses App Builder to create self-service interfaces that let them stop and start services during deployments, perform rolling or full restarts, and run maintenance activities. All of this happens under strict access controls, so users are granted only the permissions required for specific actions. Datastore adds another layer of reliability and visibility by persisting service states for clean restarts, keeping a centralized log of all maintenance windows, and powering dashboards that provide leadership with up-to-date SLO reporting.
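The pattern of persisting service states for a clean restart can be sketched as follows. This is an illustration under stated assumptions, not the Datadog Datastore API or ArisGlobal's workflow: `StateStore` is a hypothetical in-memory key-value store, and the service names and the "running"/"stopped" states are invented for the example.

```python
class StateStore:
    """In-memory stand-in for a key-value datastore (hypothetical)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)


def snapshot_and_stop(services, store):
    """Record each service's current state, then mark every service stopped."""
    for name, state in services.items():
        store.put(name, state)
        services[name] = "stopped"


def restore(services, store):
    """Start only the services that were running before the snapshot."""
    for name in services:
        if store.get(name) == "running":
            services[name] = "running"


# Hypothetical services: "reporting" was already stopped before maintenance.
services = {"api": "running", "worker": "running", "reporting": "stopped"}
store = StateStore()

snapshot_and_stop(services, store)
restore(services, store)
print(services)
```

Recording the state before stopping anything means the restore step does not accidentally start a service that was deliberately offline before maintenance began.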
ArisGlobal has also extended these capabilities with custom apps, including a server inventory tool that queries VM details across its cloud environments and a scheduled maintenance app that tracks and displays active or upcoming maintenance across product dashboards. Together, these tools enable the SRE team to remediate incidents faster, reduce operational overhead, and maintain clearer visibility into the health and performance of their services.
Automation has delivered particularly impressive results. The ability to prototype automations has enabled the SRE team to experiment with scripts that previously required manual runbooks to execute. “Automation continues to be a great addition that we are always finding new ways to use. Being able to create a button that remediates an issue that you can see on a dashboard and placing that button right next to the problem widget has decreased our time to remediation significantly,” notes Madhaiyan. “Workflow Automation enables us to implement remediation actions directly from the respective dashboards or automatically from monitors.”
Enhanced monitoring and automation improve app performance
ArisGlobal now has a new primary tool for monitoring application performance and reliability across the organization. “The availability and performance of our products is key to our customers’ success,” explains Madhaiyan. “With Datadog, we track observability metrics and understand customer impact.”
The shift to Datadog Workflow Automation also delivered significant cost savings and adoption benefits. Previously, ArisGlobal had invested in a separate automation platform that was costly and time-consuming to implement, taking nearly a year to set up. Despite the investment, the team struggled to get widespread adoption because it required learning and maintaining an entirely separate system. By consolidating automation within Datadog (a platform the team was already using daily), ArisGlobal eliminated the standalone licensing costs while dramatically improving usage. Because Datadog's automation is integrated into the platform and priced on a pay-as-you-go model, the team could embed automation buttons directly into existing dashboards, which made automations far easier to create and use.
Looking ahead, ArisGlobal plans to continue expanding its Datadog usage. “As Datadog continues to grow their offerings, we will continue to evaluate them on a regular basis,” notes Madhaiyan. “The work being done around the database performance monitoring and AI monitoring tools is compelling.”