Case Study: Neto

E-Commerce Platform Increases Resilience at Scale with Datadog and AWS

Neto is a complete retail management solution that allows retailers to run their web store, point of sale, inventory, and fulfilment operations through a central platform. With Neto, retailers can future-proof their businesses in an increasingly competitive industry by delivering exceptional customer experiences via any channel—be it in-store, online, or through a marketplace.

Lack of Elasticity and Resilience in Legacy Environment Hampered Growth

Neto’s customers rely heavily on the health of Neto’s infrastructure, which must scale often to support customers’ web stores and business management tools. Failure to meet customers’ capacity needs could degrade service and leave retailers unable to capture sales or properly manage inventory during their most lucrative, high-traffic times. Yet before the move to Amazon Web Services (AWS), maintaining and scaling Neto’s legacy infrastructure (a fleet of virtual machines on a platform with limited capacity for automation) was slow, reactive, and prone to technical difficulties. Neto’s infrastructure environments often drifted out of sync, making it hard to increase capacity or deploy changes to production without manual, time-consuming processes. “We could spend up to a week preparing for a high-traffic event,” such as a customer’s television appearance or a holiday sale, says Justin Hennessy, VP of Engineering at Neto. And those were only the events they knew about ahead of time.

“Doing anything in the old environment was very time-consuming.”

Cloud Infrastructure Allows for Automatic Scaling and Provisioning

To confidently support their customers’ growth and allow for agile innovation internally, Neto needed to move to the cloud and introduce automation throughout their platform. Neto selected AWS because of its extensive and robust APIs, “making automation on pretty much every front possible,” explains Hennessy. “With Amazon, it’s a change of a number and a few minutes later, the environment’s pre-scaled for a particular event.” To bolster the efficiency and resiliency of their cloud environment, Neto adopted infrastructure-as-code practices that allow them to automatically provision, configure, and scale their infrastructure through APIs.

“One of the driving forces was improving the resilience of the platform.”

Poor Visibility Threatens Digital Transformation

In order to migrate confidently and truly thrive in the cloud, Neto would need end-to-end visibility into their infrastructure before, during, and after their move to AWS. But as Neto prepared to migrate their applications and customer assets to the cloud, they found that their existing open source monitoring tools were unable to provide platform-wide visibility across a highly automated cloud environment. Neto’s legacy monitoring setup consisted of manually configured health checks for individual host machines, meaning that their monitoring coverage would not scale dynamically with their cloud environment, nor track services across ephemeral infrastructure components. Neto’s engineering team needed reliable, real-time insights into the state of their legacy infrastructure and their new AWS environment in order to track the progress of their migration and ensure success on the cloud.

“When you move to a highly dynamic environment, you want to move away from monitoring individual servers, and towards monitoring groups of services.”

Migrating from Legacy Environment with Monitoring that Scales in the Cloud

Neto enlisted Datadog to ensure that their application and assets were transferred with minimal customer impact, and that Neto’s newly automated platform remained reliable and performant once in production on the cloud. Datadog’s ability to collect metrics from both of Neto’s environments and then display the health of every host and service in a single interface—regardless of where they were running—meant that Neto never experienced a lapse in visibility or platform reliability during their migration. In Neto’s new cloud infrastructure, Datadog helps support Neto’s overall automation efforts by monitoring new hosts as soon as they come online, allowing Neto to track the health and performance of any service, as it scales, at a glance.

“At the end of the day, Datadog is our central portal to the platform. It’s the first place we go.”

Maintaining the Customer Experience in All Phases of Migration

During Neto’s 18-month migration project, the visibility provided by Datadog was critical to maintaining platform reliability and ensuring business as usual for Neto’s customers. For six of those months, Neto’s legacy and cloud infrastructures were running simultaneously as customer assets were transferred from MySQL to hosted Amazon Aurora databases. Datadog helped ensure the accurate, on-time migration of these customer assets by collecting, aggregating, and displaying metrics from databases in both environments on a single platform. This made it easy for Neto to visually correlate metrics and troubleshoot across environments, reducing mean time to detection (MTTD) and allowing Neto to resolve issues before they were felt by customers.

By monitoring traffic, latency, and resource usage as workloads moved to the cloud, Neto was able to track performance in real time and make any needed adjustments to ensure that their re-architected application would function properly in the new environment. For instance, Neto kept a close watch on database performance using Datadog’s built-in integrations with AWS services as well as with the underlying database engine itself. “We’re using the Amazon integration and native MySQL metrics to build a comprehensive Aurora dashboard that allows us to look at all of our clusters together,” Hennessy says. “It’s pretty obvious when a cluster is misbehaving, and then we can drill down into that cluster in isolation and address wherever the issue or congestion is.”

Mastering Automation for Improved Resilience on the Cloud

Now that Neto is on the cloud, Datadog increases the efficiency and reliability of Neto’s platform by ensuring that infrastructure and monitoring coverage scale seamlessly in parallel. “We build our infrastructure off a single golden image, so we just baked Datadog in and then it was pushed out to all of our environments,” Hennessy says. Neto deploys the Datadog Agent through Terraform, which they use to automatically provision and configure their dynamic infrastructure. By reducing the manual overhead of scaling and monitoring their environment, Neto has enabled product features and fixes to move nimbly between development phases and has taken key components of their platform “from adequate to highly available and resilient,” Hennessy says.

With Datadog and AWS, Neto was able to restructure their platform for optimized performance and wide-scale automation, significantly improving platform resilience and priming them for their next stage of growth.

“Now we have a new level of resilience. And on top of that, we now have platform-wide visibility, which we didn’t have before.”