This four-part tutorial describes the failures that can happen when humans interact with complex systems. Complete with case studies, it offers valuable lessons for developers, project managers and DevOps engineers. Learn from the mistakes of others: you won’t live long enough to make them all yourself, and this tutorial gives you that opportunity!
Part 1/4: Complexity, Coupling and Systems Failures A gentle introduction to “modern accident theory”. We examine the essential characteristics of complex systems and the operators who control them. A life hack is included, along with two case studies of software disasters.
Part 2/4: A Concise History of Civil Aviation Civil aviation has moved from very risky to extraordinarily safe. How this was achieved holds valuable lessons for other industries. We look at the challenges civil aviation faced, how they were overcome and what we can learn from this.
Part 3/4: Blame and the Fallacy of Root Cause Analysis So disaster has finally happened. Now, how do you go about preventing future disasters? The obvious ways are wrong. So how do you investigate failure, and how do you apply those lessons?
Part 4/4: Skill, Luck and Sheer Professionalism Sometimes the human element, the operators, are not hapless perpetrators complicit in the disaster but the people who actively prevent catastrophe. This talk is full of case studies where this happened. An analysis of these case studies will help you improve your own resilience.