To keep up with their customers’ evolving needs, Zendesk’s developers need the freedom to build new features quickly. Historically, Zendesk had used a monolithic, on-premises architecture for its production workloads, while its non-production workloads ran on Amazon Web Services (AWS). This setup created a lot of friction for their developers and made it difficult to scale.
In this old setup, a developer who wanted to scale something out or create a new feature had to wait for the IT team to order and configure hardware. As a result, developers couldn’t solve problems the way they wanted to, or at the speed they wanted to. Transitioning production workloads to a container-based environment that ran on AWS would give developers the modularity they needed to work on new features and accelerate development.
“Our breadth of functionality can be tied back to the velocity. Zendesk can offer more features and do more for our end-users based on how quickly our engineers can innovate.”
Principal Engineer, Zendesk
In this new environment, Zendesk also needed to take a new approach to monitoring. Empowering their diverse user base to deliver great customer service meant that Zendesk needed to maintain an always-on, high-performing platform. To meet this demand, they needed the help of a robust monitoring solution to solve issues before they impacted the customer. Previously, Zendesk had used a few solutions, including Datadog, to monitor their workloads and infrastructure.
“Having multiple solutions was difficult,” says Jon Moter, Senior Principal Engineer at Zendesk, “as it created a lot of silos between teams.” It was also difficult to stitch together the different systems and make sense of everything that was going on. Furthermore, having multiple solutions for monitoring meant increased costs. Zendesk decided that having one monitoring solution would reduce both confusion between teams and overall costs.
Monolith to Microservices
When deciding which container orchestration tool to use, Zendesk weighed a number of options but decided that Kubernetes was best for their environment. One of the reasons for this decision was that Kubernetes would work well in their new cloud-based production environment on AWS.
By leveraging AWS and Kubernetes, Zendesk was able to reduce the friction that developers originally faced. If a developer had a new idea, they could simply increase the size of their Kubernetes cluster and begin experimenting. This allowed Zendesk engineers to build out new features faster.
As Zendesk shifted its production workloads to AWS, they also shifted to using Datadog as their sole monitoring solution. By switching to Datadog, Zendesk was able to view their environment as a whole, instead of piecing together information from multiple monitoring solutions. While Kubernetes gives developers great speed and agility, it also adds complexity when it comes to monitoring what is running, and where, in a highly dynamic environment. Datadog easily integrates with Kubernetes clusters and gives teams the visibility they need.
Understanding what was going on in their new microservices and production workloads was critical, especially during the migration process. Zendesk was able to install the Datadog Agent throughout their Kubernetes clusters and have it automatically pick up metrics and information about the containers. The IT team was also able to tag these clusters by location, component, and other granular labels to help standardize and keep track of the clusters. The tagging and metrics made it easier for Zendesk’s dev teams to understand what was going on in their environment. For example, if there was a spike in resource usage, Datadog enabled Zendesk to almost instantly pinpoint the cause.
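To illustrate why consistent tags make a spike easy to localize, here is a minimal Python sketch. It does not use Datadog's API; the sample data, the tag values, and the function names `usage_by_tag` and `top_contributor` are all hypothetical. It only shows the general idea of grouping tagged metric samples by one tag key to see which value accounts for most of the usage.

```python
from collections import defaultdict

# Hypothetical metric samples: (tags, CPU cores in use).
# The tag keys mirror the ones named in the text (location, component);
# the values and numbers are invented for illustration.
samples = [
    ({"location": "us-east-1", "component": "api"}, 4.0),
    ({"location": "us-east-1", "component": "search"}, 18.5),
    ({"location": "eu-west-1", "component": "api"}, 3.5),
    ({"location": "eu-west-1", "component": "search"}, 3.0),
]


def usage_by_tag(samples, tag_key):
    """Sum the metric value for each distinct value of one tag key."""
    totals = defaultdict(float)
    for tags, value in samples:
        totals[tags.get(tag_key, "untagged")] += value
    return dict(totals)


def top_contributor(samples, tag_key):
    """Return the (tag value, total) pair with the highest total usage."""
    totals = usage_by_tag(samples, tag_key)
    return max(totals.items(), key=lambda item: item[1])


by_component = top_contributor(samples, "component")
by_location = top_contributor(samples, "location")
```

Slicing the same samples by `component` and then by `location` narrows the spike to one component in one location, which is the kind of drill-down that standardized tags make possible.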
By moving to Kubernetes, Zendesk has been able to offer more features for their customers. “[With Kubernetes] we have transformed ourselves from a customer service ticket platform to a real-time communication tool that can mediate the end-to-end experience between our customers and their customers,” says Moter.
With Kubernetes, developers have gained greater modularity and can focus more of their efforts on building applications and solving ticketing issues. Currently, Zendesk is running 25 Kubernetes clusters split between data centers and 6 to 7 AWS Regions. Datadog has continued to support Zendesk’s monitoring needs throughout this growth.
Datadog has helped Zendesk deliver the always-on, high-performing platform their customers demand. Datadog’s Application Performance Monitoring (APM) tool in particular helps ensure that customers continue to have a great experience on the Zendesk platform. APM makes it easy to correlate poor application performance with underlying issues revealed in logs and infrastructure metrics. Within APM, the Trace Search and Analytics feature allows Zendesk developers to isolate one or more traces that match a specific customer, user, error code, endpoint, service, or custom tag. As a result, Zendesk can investigate and respond to performance issues before they impact customers. Furthermore, Moter noted, “having all of our monitors and dashboards in one place makes it way easier to train people and have them look for things and know where to go.” Datadog is easy to use, can usually be installed with a single command, and has over 600 integrations.
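The idea of isolating traces by facet can be sketched in a few lines of Python. This is an illustration only, not Datadog's Trace Search API: the `Trace` class, the `search_traces` function, and the sample records are hypothetical, and real traces carry far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical, simplified trace record for illustration."""
    trace_id: str
    service: str
    endpoint: str
    duration_ms: float
    tags: dict = field(default_factory=dict)


def facets(trace):
    """Flatten a trace into one searchable dict of facet -> value."""
    # Built-in fields are written last so a custom tag cannot shadow them.
    return {**trace.tags, "service": trace.service, "endpoint": trace.endpoint}


def search_traces(traces, **filters):
    """Return the traces whose facets match every given filter exactly."""
    return [
        t for t in traces
        if all(facets(t).get(key) == wanted for key, wanted in filters.items())
    ]


traces = [
    Trace("t1", "tickets", "/api/tickets", 120.0,
          {"customer": "acme", "error_code": "200"}),
    Trace("t2", "tickets", "/api/tickets", 950.0,
          {"customer": "acme", "error_code": "500"}),
    Trace("t3", "chat", "/api/messages", 80.0,
          {"customer": "globex", "error_code": "500"}),
]

# Narrow to one customer's failing requests.
failing = search_traces(traces, customer="acme", error_code="500")
```

Combining filters this way is what lets an engineer go from "a customer reported slowness" to the specific failing request without scanning unrelated traffic.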
“The interactive UI was huge for us to monitor health, because it can send us application specific metrics for each container,” says Moter. This feature allows Zendesk’s team to collect and drill down on more data, and continuously monitor their clusters. Datadog’s unified platform also makes it easier for Zendesk’s teams to leverage existing dashboards and alerts instead of creating their own every time. This, coupled with the reliability that AWS and Datadog provide, means Zendesk engineers can spend more time focusing on their customers.