
Case Study

ROLLER scales global attractions platform by monitoring and securing AWS with Datadog 

About ROLLER

ROLLER develops and provides software for attractions and leisure businesses. Its platform helps operators manage high-volume, real-time transactions including online bookings, point-of-sale, digital waivers, memberships, and guest check-ins, all from a single interface.

Software Development
~200 Employees
Melbourne, Australia
“Datadog is a system of intelligence. It gives us intelligent decision-making that enables our small team to continue to support thousands of attractions businesses worldwide.”
Sean Fernandez, CIO, ROLLER

Why Datadog?

  • Deep visibility into performance of 100+ AWS services in one view
  • Quick time-to-value with out-of-the-box dashboards and prebuilt alerts
  • Faster root-cause analysis with one-click telemetry correlation and consistent tags
  • Intuitive interface for technical and non-technical team members
  • Database monitoring enables quick remediation without dedicated database administrators
  • Security tools help prioritize application vulnerabilities and improve firewall rules

Challenge

ROLLER needed unified observability and security for its most critical AWS workloads to improve the availability of its mission-critical platform, reduce resolution times, and collaborate better across operations, engineering, and security teams.

Key Results

100% SLA compliance

Meeting or beating SLAs while supporting more than 2,200 attractions

↓ 99% MTTR

Time saved resolving issues, from 3 days to 15 minutes

↓ 60% EC2 costs

By using Datadog to monitor the infrastructure migration to Amazon ECS and EC2 Spot Instances

↑ 4x

Increase in scale with same team size

Maintaining critical systems to support a global platform

ROLLER provides software to over 2,200 attractions businesses across more than 35 countries. Its cloud-based platform includes point-of-sale systems, digital waivers, online stores, marketing tools, and search engine optimization features designed to help attractions grow and deliver better guest experiences. Each year, ROLLER processes over $100 million in transactions and supports millions of guest experiences. “We provide a mission-critical system for operators,” says Sean Fernandez, CIO at ROLLER. “It’s imperative that our systems remain highly available so our customers can continue to conduct transactions.” 

The company previously used multiple monitoring tools to manage its critical systems. However, as ROLLER grew, this approach created challenges. The SRE team struggled to quickly understand performance across cloud regions, accounts, environments, and subsystems. This lack of full visibility and telemetry correlation in one view made it difficult to maintain service level agreements (SLAs) and respond quickly to problems. They also struggled with aggregating logs from different systems, which slowed troubleshooting. Additionally, as infrastructure compute costs rose, they lacked the full visibility needed to modernize their AWS infrastructure and CI/CD without compromising availability and growth goals.

Simplifying cloud monitoring with observability and security

After evaluating several monitoring options, ROLLER chose Datadog as its observability platform. Although some competitors initially offered lower prices, ROLLER found that Datadog better matched its needs as the company matured. Datadog’s prebuilt dashboards for AWS and other key technologies helped teams get visibility quickly. Setup was also simple, and alerts worked correctly from day one.

Log aggregation and correlation, previously a major pain point, quickly became more manageable after implementing Datadog. “Aggregating logs and getting them into a state where developers can actually see and correlate them across the board is always tricky,” says Shane Burham, site reliability engineer at ROLLER, who now leads the security team. “Having logs present on the traces now gives us much better visibility.”

Today, ROLLER uses Datadog in several critical ways. Datadog enables non-technical teams to monitor AWS-hosted apps and infrastructure by using intuitive dashboards. When issues arise, these dashboards provide enough information for support staff to quickly determine whether an issue is platform-wide or customer-specific. From there, they can use Datadog to easily identify the relevant team based on where the issue originated.

Having access to database monitoring has been another significant benefit. Without dedicated database administrators, ROLLER needed a way for application developers to identify and fix SQL Server database issues themselves. Datadog gives them visibility into whether problems stem from queries, disk performance, or application code, enabling any team member to find and address the root cause. 

Meanwhile, security monitoring with Datadog’s Application and API Protection has helped ROLLER detect, prioritize, and address application vulnerabilities. The solution has given them visibility into security issues that bypass their AWS Web Application Firewall (WAF), enabling them to write better security rules and protect their applications more effectively.

Unified observability derisks changes to infrastructure and CI/CD processes

Perhaps most importantly, Datadog has supported ROLLER’s architectural evolution on AWS. To deliver long-term growth while meeting SLAs and improving operational efficiency, the ROLLER team used Datadog to monitor modernization initiatives across infrastructure and CI/CD. The ROLLER team was able to evolve its infrastructure from Windows EC2 instances to a blend of EC2 Spot Instances and containers using AWS Fargate and Amazon ECS. These changes reduced the time to spin up new compute by 77 percent and reduced compute costs by 60 percent. With Datadog, ROLLER had the visibility needed to execute these changes while fully mitigating risks to availability. “Datadog derisked taking modernization approaches on AWS,” says Fernandez. “Observability is about reducing risk. You can take a much bigger risk building infrastructure if you have a really good observability platform over the top. Datadog helped us understand whether our customers were still being serviced properly as we made changes to our infrastructure and CI/CD processes.”

Today, ROLLER has comprehensive visibility into its AWS-hosted applications and infrastructure in one tool that can be used effectively by operations, engineering, security, and support teams. This consolidated visibility has enabled ROLLER teams to meet and exceed SLAs, speak one common language, and cost-effectively support more attractions businesses over the long term.

The ability to correlate events across the entire technology stack has also improved incident response. “When we have an issue, the most useful thing is not just to look at what’s broken—it’s to look at the entire stack, because you’ll always end up missing the forest for the trees with the number of events that can occur from one single issue,” says Fernandez. “With Datadog, you can observe it in a platform that correlates everything in 30 seconds versus clicking around a bunch of windows, which would take hours.”

Datadog has also enabled ROLLER to take a shift-left approach, empowering their SRE, engineering, and security teams to take more ownership of observability. “With Datadog, we have everything in a single platform,” adds Burham. “Anyone can read it. And it has the context that’s relevant to that particular specialty. We’re all talking the same language.”

Finally, Datadog has enabled ROLLER’s small team to accomplish more. With just a four-person SRE team, they’ve been able to quadruple their scale while simultaneously derisking operations and supporting more complex systems. 

“Datadog is a system of intelligence,” says Fernandez. “It gives us intelligent decision-making that enables our small team to continue to support thousands of attractions businesses worldwide.”
