Gaining Infrastructure-Wide Visibility in a Private OpenStack Cloud | Datadog
CASE STUDY

Learn how Revinate leverages Datadog to monitor resource utilization, assess performance, and plan capacity

About Revinate

Revinate is a global, venture-funded company whose software services help over 24,000 hotels around the world enhance their guests' experiences.


Key Results

Improved planning

Revinate is now able to monitor the performance of 400 hosts in real time in order to accurately plan and allocate capacity.

Reduced MTTD

Datadog’s custom dashboards allow Revinate to correlate metrics and events across their system in order to identify issues faster.


Challenge

Revinate was in need of a monitoring tool that could provide complete, uninterrupted visibility across their entire tech stack. Their existing tool only gathered crude metrics once a week, which forced them to manually track trends and plan resources accordingly.


Why Datadog?

Datadog’s intuitive platform and turn-key integrations enabled Revinate to get full visibility into their entire environment, without any arduous setup process. They were also able to create custom dashboards to monitor their hosts and accurately plan capacity, while correlating metrics and events to streamline their troubleshooting process.


Revinate provides software services that help hotels improve the guest experience before, during, and after each stay. The company’s Guest Feedback Suite enables hotels to capture, measure, and optimize the guest experience by bringing together all reviews, survey data, and social media mentions into a single, integrated system. Revinate’s newest platform, inGuest, utilizes innovative mobile technology and rich guest profiles to support targeted marketing campaigns that drive revenue and loyalty. Both systems are built on user-friendly platforms that are custom-designed for the unique needs of the hospitality industry. Revinate is a venture-funded company headquartered in San Francisco with offices worldwide to serve its growing customer base, currently at over 24,000 hotels.

Revinate’s Software-as-a-Service (SaaS) offering is hosted in a Rackspace Private Cloud (RPC) environment that utilizes the OpenStack architecture. Due to the growth of the customer base, the company’s configuration had grown quickly to 25 physical servers supporting 400 virtual instances. Chris Snell, Revinate’s Manager of Technical Operations, says, “With a new platform on the way and lots to manage, my biggest concern in a fast-growing environment was running out of capacity. The point had come that I needed some help.”

The Need: Gain Visibility in the RPC Stack from Top-to-Bottom

OpenStack affords numerous advantages for Revinate, but the RPC offering lacked the robust management capabilities Snell felt he needed. “I was only able to get fairly crude metrics on resource utilization from Rackspace, and that made capacity planning, at best, a guess,” he says. In addition to being insufficient, the available data came infrequently in the form of a weekly email, forcing Snell to track any trends manually.

To address the shortcoming, Snell began evaluating available monitoring and management solutions. “I wanted a tool that would enable me to instrument the entire stack from top-to-bottom, including the applications, all underlying services, the virtual instances and hypervisor, and the physical server resources,” he notes. After being disappointed during several trials, Snell found exactly what he wanted when he tried Datadog.

Datadog Selected for Its Powerful Monitoring Capabilities

“I was immediately blown away by the raw power of Datadog,” recalls Snell. “It’s also quite intuitive and easy to use.” Snell especially appreciates Datadog’s ability to collect and correlate metrics and events across the entire RPC infrastructure, and then use this data to create customized, interactive dashboards that can provide actionable insight to accommodate Revinate’s particular needs. According to Snell, “No other management tool I evaluated came even close to providing such powerful capabilities at such an affordable price point.”

Accurate Measurement of Resource Utilization to Assess Performance and Plan Capacity

Snell’s top priority after installing Datadog was to measure resource utilization thoroughly and accurately in order to assess performance and plan capacity. To do so, he built a dashboard using a combination of standard and custom integrations. The dashboard lets Snell view the entire stack at a glance, and includes custom thresholds to identify potential issues. For example, Snell set the memory utilization threshold at 80 percent; crossing it signals the need to either reallocate existing memory or add more to maintain good performance. This single, simple dashboard fulfilled Snell’s need to monitor operations and plan capacity easily, accurately, and with complete confidence. “The power provided by Datadog goes well beyond saving time and improving productivity,” says Snell. “These are fundamental operational responsibilities, and the best I could do before installing Datadog was guess.” To Snell’s credit, the dashboard confirmed that his guess had been pretty good: it revealed that average memory utilization was already at 70 percent, uncomfortably close to the 80 percent threshold.
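The threshold logic described above can be illustrated with a minimal sketch. It is not Datadog's API or Revinate's actual dashboard configuration. The host names and sample readings are made up for illustration, and only the 80 percent threshold and the 70 percent average come from the case study.

```python
from statistics import mean

THRESHOLD_PCT = 80.0  # action threshold cited in the case study


def capacity_report(mem_used_pct, threshold=THRESHOLD_PCT):
    """Summarize memory utilization for a set of hosts.

    mem_used_pct maps a (hypothetical) host name to memory used, in percent.
    """
    average = mean(mem_used_pct.values())
    # Hosts at or above the threshold need memory reallocated or added.
    over = sorted(h for h, pct in mem_used_pct.items() if pct >= threshold)
    return {
        "average_pct": average,
        "headroom_pct": threshold - average,
        "hosts_over_threshold": over,
    }


# Made-up sample readings, chosen so that the average is 70 percent.
sample = {"hv-01": 62.0, "hv-02": 85.0, "hv-03": 71.0, "hv-04": 62.0}
report = capacity_report(sample)
print(report)
```

With these sample values the average leaves only 10 percentage points of headroom below the threshold, which is the kind of margin the dashboard made visible.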

Fast Troubleshooting of Problems Caused by Complex Interactions

After making accurate capacity planning routine, his highest-priority requirement, Snell set out to identify other ways Datadog might help improve operations and productivity. As is typical in operations, however, priorities are often dictated by circumstances beyond one’s control. Such was the case when Snell noticed the firewall suddenly experiencing a high error rate. The errors were symptomatic of traffic overload, but Snell had no idea what might have caused traffic patterns to change. Even more puzzling, the spikes were occurring at a time that had not previously been a peak period.

Snell turned immediately to Datadog to see what he could learn. His instinct was to check the hypervisors first, where he found four that were experiencing spikes. He then looked at the virtual machines on those hypervisors, and found the culprit: PostgreSQL databases that were being backed up to an off-site facility. A developer had changed the way the databases were being backed up, and a simple tweak to stagger the backup schedule solved the problem. Both the developer and Snell were quite pleased with the results. “Without the insight provided by Datadog, I could well have spent days troubleshooting this problem only to end up adding more firewall capacity as a work-around,” says Snell.
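Staggering a backup schedule amounts to giving each database its own start time so that the transfers do not all reach the firewall at once. The sketch below shows the idea in Python. The case study does not describe how Revinate actually implemented the change, so the database names, the start time, and the 30-minute gap are all hypothetical.

```python
from datetime import datetime, timedelta


def stagger_backups(databases, first_start, gap_minutes):
    """Assign each database its own start time, gap_minutes apart."""
    return {
        db: first_start + timedelta(minutes=i * gap_minutes)
        for i, db in enumerate(databases)
    }


# Hypothetical database names and timing, for illustration only.
dbs = ["pg-a", "pg-b", "pg-c", "pg-d"]
schedule = stagger_backups(dbs, datetime(2024, 1, 1, 1, 0), gap_minutes=30)
for db, start in schedule.items():
    print(db, start.strftime("%H:%M"))
```

In practice the gap would be chosen from the observed duration of each backup, so that one transfer finishes before the next begins.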

Custom Dashboards Make Datadog’s Powerful Capabilities Easy to Use

Snell is now endeavoring to instrument virtually everything in Revinate’s virtual Rackspace Private Cloud environment. In a relatively short period of time, he has become quite adept at creating custom dashboards, and has built them for each of his internal services. Although not characterized as a DevOps initiative, the dashboards are accommodating both operational and development needs, and Snell expects the effort to result in higher productivity for the staff and better performance for the applications, together yielding substantial ongoing cost savings as the company continues to grow.

“Datadog is the right tool for monitoring and managing the RPC OpenStack infrastructure,” says Snell. “I’m pleased with the product. I’m pleased with the results we’re getting. I’m pleased with the support I get whenever I need it. And I am confident that Datadog will continue to meet our needs in the face of whatever the future might hold.”

