Defining mistakes, failures & disasters
Complex safety features may contribute to dire consequences
In her new book, “The UP Side of DOWN — Why Failing Well Is the Key to Success,” Megan McArdle examines failures and how individuals react to them, both their own and other people’s.1 She presents the differences, as she sees them, among accidents, mistakes, failures and disasters.2 Here are safety analogies:
Accidents are coincidences that could not have been plausibly expected or planned for. A failure is not an accident. More often than not, your accident is the result of someone else’s failure.
“A mistake,” according to McArdle, “is the opposite of an accident: it is where you could and maybe should have done something differently, but nothing really bad happens as a result.” How many safety mistakes are made during a day wherein nothing really bad happens — dozens, hundreds, thousands? Is a safety mistake a “near miss”?
A failure, says McArdle, “is a mistake performing without a safety net.” When fail-safe systems suddenly fail, something horribly wrong often occurs. “If someone had only done something differently — better — it could have been prevented.”
A disaster is a cascade of failures. Some unfold over a short duration; others are the cumulative result of numerous failures over a longer period of time.
McArdle draws upon Charles Perrow’s work in his book “Normal Accidents,” about the 1979 partial meltdown at Three Mile Island.3 Perrow notes that “Accidents aren’t aberrations that can be avoided simply by designing more safety features into the systems; indeed, since safety features themselves add complexity, they may raise, rather than lower, the chance that something will go wrong.”4
A disastrous example
Perrow uses a thought experiment to make his point.5 I will use a safety analogy. You are the operations manager faced with declining revenues, a very tight budget, significant maintenance issues that are not being funded or addressed, and deteriorating operational infrastructure. Injury reporting is going underground. Senior management empathizes with your plight, but offers little financial support.
You focus on ways to maximize your employees’ time on the tools by cutting break times, eliminating vacation leave, limiting safety toolbox meetings to twice per week, etc. After several months productivity improves, but morale is declining and employees are talking up a union campaign.
After a hurricane, you and your maintenance supervisor discover that the large concrete secondary containment unit, which has been leaking for years, has collapsed, releasing its contents into the nearby waterway. Recalling that an operator technician was dispatched to check equipment in this area just after the storm made landfall, you try to confirm the operator has been accounted for. After contacting the operator’s wife, it is apparent he is missing. His body is later found crushed under the collapsed containment structure.
Perrow uses a quiz to help answer the question, “What caused this disaster?”
1. Human error (management’s failure to fund maintenance activities; an operations manager under pressure making imprudent decisions)? Yes __ No __ Unsure __
2. Mechanical failure (collapse of the leaking concrete secondary containment device)? Yes __ No __ Unsure __
3. The environment (Hurricane Gaston causing havoc)? Yes __ No __ Unsure __
4. Design of the system (flawed construction design of the secondary containment device)? Yes __ No __ Unsure __
5. Procedures used (cutting toolbox meetings, dispatching operators to the field during a hurricane, failing to maintain head counts)? Yes __ No __ Unsure __
Interaction in unpredictable ways
How would you answer these questions?
McArdle notes that Perrow points out that error does not lie in some specific decision, person or procedure. Failure, Perrow explains, is “inherent in complex systems,” particularly in what he calls “tightly coupled” systems. From a systems-thinking perspective, failures in complex systems involve the “interaction” of mistakes in unpredictable ways that can lead to unintended consequences (e.g., disasters). Because these mistakes interact in a non-linear fashion, determining the causes becomes all the more difficult, and correcting them harder still.
McArdle presents the crisis of “throwing good money after bad” using the federal government’s investment in Solyndra, a photovoltaic cell manufacturer with a new idea for less expensive manufacturing. Solyndra ultimately filed for bankruptcy after burning through almost a billion dollars, including $535 million in U.S. loan guarantees that the American taxpayer will never see repaid. Even though everyone knew, including Solyndra’s auditor, that the company could not make a competitive product, more money was thrown at it.6 McArdle explains that the government kept throwing good money after bad because of what economists call “sunk costs.” People cannot let go of money already invested, because they so badly want their investment to pay off. Yet economists advise that the right way to deal with sunk costs — labor, raw materials, etc. — is to ignore them.7
In his piece “Safety Professionals: Focused on the Wrong Things?” Phil La Duke identifies three big obsessions of safety professionals: preventing injuries, behaviors, and finding root causes.8 How much in “sunk costs” do you think the public and private sectors have invested in these three obsessions? Considering full costs, I estimate far in excess of a billion dollars — and what quantified, empirical improvements in safety have actually been made?
As safety professionals, we need to stop doubling down on our “sunk costs.” We cannot engineer failure out of the system. We need to adopt Megan McArdle’s premise that “Failing well means learning to identify mistakes early by understanding those mistakes so they can be corrected. Most of all, it means overcoming the natural instinct to blame someone whenever something goes wrong.”9
1 McArdle, M. 2014. The UP Side of DOWN — Why Failing Well Is the Key to Success. Viking Penguin, New York, NY.
2 Ibid. pp. 79-82.
3 Ibid. p. 88.
4 Perrow, C. 1999. Normal Accidents: Living with High-Risk Technologies. Princeton University Press. Kindle Edition.
5 Op. cit., pp. 88-89.
6 Ibid. pp. 116-120.
7 Ibid. pp. 120-122.
8 La Duke, P. 28 July 2014. Safety Professionals: Focused on the Wrong Things? EHS Journal, online at http://ehsjournal.org/phil-la-duke/safety-professionals-focused-on-the-wrong-things/2014/
9 McArdle, M. Spring 2014. Why Failing Well Is the Key to Success. CATO’s Letter, 12(2), 1-5.