It’s 3AM and you have just been woken up by the nagging ringtone of your phone. PagerDuty is contacting you because your name is on today's on-call list. At this hour, it can’t be good. Sure enough, through bleary eyes and with no idea where to start, you read a cryptic message: “service templeton is lagging”. You had heard that this new service was being rolled out earlier this week, but you never had a chance to ask anyone how it was supposed to run. You quickly run through your options:
(a) go back to bed and hope that it magically goes away
(b) get up, find your laptop, get some coffee started, dig up some runbooks and hope they are up to date.
You go for option (b) and spend the next few hours looking for clues about what the alert means in the first place and how to fix the problem. With some luck, you may even get back to bed before your morning alarm rings.
Sound familiar? Are you tired of spending precious time figuring out why you were alerted? Do you wish your alerts were more than a whodunit in 140 characters or fewer?
You’re not alone: designing better alerts and turning them into useful messages was a central focus of the community at this year’s Monitorama. The writing was on the wall, and we decided to do something about it.
At Datadog, just like you, we sometimes get alerts in the middle of the night, and we got tired of sifting through enigmatic text, stats and process names trying to work out what exactly was wrong. So we sat down to redesign alerting so that every alert makes immediate sense to whoever receives it. We did this by packing as much useful context as possible into each alert: graphs, up-to-date runbooks and routing.
For instance, each alert comes with a graph that shows at a glance what the data really looks like, so you can rule out temporary blips and go back to sleep right away.
Each alert also comes with an integrated runbook: the person who writes the alert can easily add simple diagnostic and remediation steps right in the alert itself. No more endless searches in the corporate wiki, no more out-of-date runbooks.
Finally, each alert can be routed precisely to the right person, group or service, so you aren’t bombarded with alerts that are not relevant to you.
Adding context to your alerts is quick and easy, and you can try it for free. Signing up for Datadog takes just a few minutes, and these context-adding features for your DevOps alerts are available immediately.