Two kinds of system risk

When you set up a system, it helps to keep in mind what will happen if it doesn’t work. The higher the cost of ‘not working’, the more resilience it’s worth building into the system.

In most cases, ‘not working’ isn’t catastrophic. If your toaster doesn’t work, it’s not that big a deal. You can replace it in a few days and live with limp bread in the meantime. On the other hand, if you’re on a mission to Mars, you’ll probably be glad you packed a few extra oxygen tanks, even if the cost of bringing them is quite high.

We make two mistakes when we organize a system:

  1. We get overly optimistic about the reliability of the system, and combine that with a narrative that minimizes the cost of living without it. I’d put the current state of our internet infrastructure in that camp.
  2. We get overly pessimistic about the likelihood and cost of failure. This leads us to over-engineer things, or to pay far more for redundancy than we should. Putting life jackets on airplanes is a great example of this. So is hunting down the last typo. It’s also one reason our medical costs are so high… the last 0.01% is the most expensive part.

A useful skill in executive decision making is the ability to describe resiliency and the cost of failure in non-emotional ways. Especially when it’s difficult to do precisely that.
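
One non-emotional way to frame it is simple expected-cost arithmetic: redundancy is worth paying for when the probability of failure times the cost of failing exceeds what the backup costs up front. Here’s a minimal sketch in Python; the numbers for the toaster and the oxygen tanks are made up purely for illustration:

```python
def worth_the_redundancy(p_failure, cost_of_failure, cost_of_redundancy):
    """Crude expected-value test for buying a backup.

    Redundancy pays off when the expected loss from failure
    (probability times cost) exceeds the up-front cost of the backup.
    """
    expected_loss = p_failure * cost_of_failure
    return expected_loss > cost_of_redundancy

# Hypothetical numbers, for illustration only:
# a spare toaster vs. spare oxygen tanks on a Mars mission.
print(worth_the_redundancy(0.05, 30, 25))        # False -- live with limp bread
print(worth_the_redundancy(0.01, 10**9, 10**6))  # True -- pack the extra tanks
```

Seen this way, the two mistakes above are just bad inputs: optimism understates the probability or cost of failure, and pessimism overstates them.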