Ops Happens: Improving Incident Response Using DevOps and SRE Practices
Damon Edwards (Co-Founder, Rundeck, Inc.)
Location: Grand Ballroom G
Date: Thursday, May 3
Time: 9:00am - 9:45am
Pass Type: All Access, Conference
Format: Conference Session
Vault Recording: TBD
Audience Level: Intermediate
Deployment is fun to talk about, but it is mostly a solved problem. Yes, there is work to be done, but the operations community has repeatedly proven that we can scale application/infrastructure automation and distribute the capability to execute deployments.
Now, we have to turn our attention to the next critical constraint: What happens after deployment? We all know that failure is inevitable and is coming our way at any moment. How do we respond quickly and effectively to those failures?
What worked when there was just a small number of teams or an isolated system to manage quickly breaks down when the organization grows in size and complexity. At the same time, traditional operations practices in large-scale enterprises have proven to be too cumbersome, too silo-dependent, and too slow for today's business needs.
In this talk, we will first dissect a real example of an enterprise incident. Then we will examine the trial-and-error lessons learned by forward-thinking enterprises that are currently streamlining how they:
- Resolve incidents
- Reduce friction between teams
- Divide up operational responsibilities
- Improve the quality and cost of their ongoing operations