Building blocks: Introducing observability through monitoring
Observability is the ability to see how well system components are functioning internally by measuring their outputs. Establishing a culture of observability goes beyond simply taking measurements; it means developing a shared understanding of how metrics and monitoring can help each team make informed, data-driven decisions to maintain, improve, or replace existing systems. The first step in building this culture is ensuring that you have the right systems in place to obtain the right data, and that those systems are accessible and useful to people in your organization.
Drawing on his own experience, Cory Watson outlined how he introduced Datadog to the existing suite of monitoring tools at Stripe in an effort to grow the company's culture of observability. The roadmap he laid out for the successful introduction and adoption of a monitoring platform is simple but powerful: understand each team’s needs, start small, communicate, and hold yourself accountable.
Understand each team’s needs
When Cory first started at Stripe, increasing the level of insight into Stripe’s infrastructure and operations became one of his top priorities. Before suggesting any changes, Cory took the time to understand how Stripe had approached monitoring prior to his arrival. Understanding the status quo is important because it allows you to get a feel for what may need to change, what can be improved within existing tools, and where these tools simply fall short.
Once you’ve familiarized yourself with the tools and processes in use, seek out the people or teams that use those tools every day and ask them what they like and what they would like to see improved. The value of a new monitoring system lies in its ability to address the shortcomings of the legacy stack without regressing on functionality. To demonstrate the value of adding a new layer to your observability stack, you need to be able to answer these two questions:
- What is it that you are improving?
- How can you measure its success?
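One way to make the second question concrete is to pick a single lower-is-better metric and compare it before and after a pilot. The sketch below is only an illustration, not something described in Cory's talk: the metric (minutes to detect an incident), the sample values, and the function name `relative_improvement` are all hypothetical.

```python
from statistics import median


def relative_improvement(before, after):
    """Return the fractional reduction in the median of a lower-is-better metric."""
    baseline = median(before)
    if baseline == 0:
        raise ValueError("baseline median is zero; relative change is undefined")
    return (baseline - median(after)) / baseline


# Hypothetical sample data: minutes taken to detect each incident,
# collected before and after a pilot of the new monitoring platform.
minutes_to_detect_before = [42, 30, 55, 38, 47]
minutes_to_detect_after = [18, 25, 12, 30, 21]

improvement = relative_improvement(minutes_to_detect_before, minutes_to_detect_after)
print(f"Median time to detect changed by {improvement:.0%}")
```

The median is used here so that a single unusually long incident does not dominate the comparison. Whichever metric you choose, agreeing on it with the teams involved before the rollout keeps the measurement tied to what they actually want improved.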
Start small
Working with a small team of users at the outset exposes the points of friction introduced by a new monitoring approach. Addressing these concerns at a smaller scale is more manageable and allows you to test a system’s limits without the fear of impacting day-to-day operations.
Communicate
Change is hard. Changing processes and culture is harder. Communicating with people throughout the organization as you introduce a new monitoring platform is the best way to ensure that your work meets the needs of each user and is received warmly. Keeping teams up to date as changes are rolled out makes the process much less jarring and shows respect for your coworkers and their work.
Find power users within your organization to champion your approach by introducing them to the work you are doing. Power users don’t have to be people on your team, but they should be interested in the solution you are introducing and excited by the potential benefits. Giving these power users access to the platform creates an invested group of people who want to help improve it and see its adoption through. Cory suggests you be proactive and ask power users questions like, “What can I do or give you to make this easier?”
By taking their input into consideration, you improve the usefulness of your systems while your champions help build confidence in your team’s work.
Be accountable and make adjustments
Don’t be afraid to stop periodically and ask yourself whether you are using the best approach for the problems you are trying to address. As part of rethinking your strategy over time, it’s important to listen to people who aren’t fully on board with new systems or new ideas. As Cory mentions in his talk, these are the people who “provide you with what you need to address as your projects expand to the whole organization.” Being open to the concerns of both champions and skeptics gives you a deeper understanding of what works and where you need to make adjustments.
Cory’s observability team focuses on making sure they understand and accommodate the needs of others, to ensure that their work is as beneficial as possible to their users within the organization. “Every single day when we’re working with the other engineers at Stripe,” he says, “we have to really, really care about making them better, making them quicker and making them more effective at their jobs.”
Continue to grow a culture of observability
By placing an emphasis on how their work can directly benefit other engineers, Cory’s team made it that much easier for their new monitoring platform to be integrated as an important part of Stripe’s observability stack. The increased insight into Stripe’s infrastructure allows teams to make informed decisions to improve existing systems.
By following the roadmap outlined above, you can not only improve your monitoring but also promote an understanding of how data-driven decision-making can benefit your organization and the individual teams within it.