SRE Fundamentals - From Basics to Automated Remediation
About This Session
Hands-on workshop introducing key Site Reliability Engineering (SRE) practices using Datadog. Participants will gain practical experience with reliability concepts by monitoring a microservice stack, defining SLIs and SLOs, and analyzing service performance. The workshop then explores how automation and remediation workflows can be applied to reduce toil, improve reliability, and accelerate incident response.
Learning Objectives
- Review logs, APM traces, and metrics in Datadog to identify reliability issues
- Define Critical User Journey (CUJ) SLIs and configure SLOs to measure service reliability
- Detect and investigate SLO breaches using burn rate analysis and service context
- Implement toil and MTTR reduction mechanisms via automated remediation workflows and incident
Language
English
Availability
1 available
Length
60 minutes