What do electric cars, steel-capped boots, and balloons bursting in crowded lecture theatres have in common? Not much, except that they all feature on this episode of DisasterCast. When it comes to achieving safety, one of the key questions is “How Much is Enough?” There will always come a point where the amount of risk you are facing doesn’t justify taking further measures to reduce it. Beyond this point, we get a better return on our safety investment by spending our effort and money elsewhere. We may even destroy the benefits we get by trying too hard to be safe.
When we’re designing systems, certain aspects of safety can be expressed in numbers. This is particularly the case when we are concerned about random failures. Random failures are what we usually think about when we consider a car, train or aircraft breaking down or doing something unsafe. One minute a component is working, then it fails, after which it is no longer working. We can express the chance of a random failure as a probability. We can reduce the likelihood of random failures by using better components, and we can reduce the impact of random failures by building redundancy into our systems.
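To make the redundancy arithmetic concrete, here is a minimal sketch. It assumes each redundant component fails independently with the same probability `p` over the period we care about (one journey, one demand), and that the system only fails if every component fails. Under those assumptions the probability of losing all `n` components is simply `p ** n`. The function names are hypothetical, chosen for illustration.

```python
def parallel_failure_probability(p: float, n: int) -> float:
    """Probability that all n redundant components fail.

    Assumes independent random failures, each with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("need at least one component")
    return p ** n


def channels_needed(p: float, target: float) -> int:
    """Smallest number of redundant components that meets a target.

    Adds components one at a time until p ** n is at or below target.
    Only meaningful for independent random failures.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    n = 1
    while p ** n > target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: a 1-in-100 component and a 1-in-100,000 target.
    p, target = 0.01, 1e-5
    n = channels_needed(p, target)
    print(n, parallel_failure_probability(p, n))
```

Each extra component multiplies the failure probability by `p`, which is why redundancy is so effective against random failures, provided the independence assumption actually holds.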
Random failures aren’t the only type of failure, though. We call the other sorts of failure “systematic”. Redundancy doesn’t help here, because no matter how many widgets we have, if they all share the same design flaw then under the wrong conditions they will all fail at once.
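A rough way to see why redundancy stops helping is to bolt a shared design flaw onto the previous calculation. This is a deliberately crude, hypothetical model rather than a standard method: each channel still fails randomly and independently with probability `p_random`, but with probability `p_trigger` the conditions that expose the common flaw occur, and then every channel fails together.

```python
def system_failure_probability(p_random: float, n: int, p_trigger: float) -> float:
    """Failure probability of n identical redundant channels sharing one flaw.

    Hypothetical model for illustration only:
    - with probability p_trigger the flaw is exposed and all channels fail;
    - otherwise the system fails only if all n channels fail randomly,
      each independently with probability p_random.
    """
    all_random_fail = p_random ** n
    return p_trigger + (1.0 - p_trigger) * all_random_fail


if __name__ == "__main__":
    # Hypothetical numbers: adding channels drives the random term down,
    # but the total can never drop below p_trigger.
    for n in range(1, 6):
        print(n, system_failure_probability(0.01, n, 1e-4))
```

However many channels are added, the result never falls below `p_trigger`. In practice we rarely know `p_trigger` for a design flaw at all, which is exactly the difficulty with systematic failures.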
We can work out mathematically how much redundancy we need. Working out how much protection we need against systematic failures is more nebulous. Software is a good example of this. We never know how many errors there are in a piece of software, because any time we find an error we fix it. We can reduce the number of errors by putting a lot of effort into finding and fixing them, but this still doesn’t help us count them.
The question “How safe is safe enough?” turns into “How hard do I need to keep looking for systematic failures?” This is where the concept of safety integrity levels comes in.
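As one concrete example of how such levels are defined, IEC 61508 sets numeric bands for safety functions operating in low-demand mode, expressed as the average probability of dangerous failure on demand. The sketch below maps a target figure onto those bands. The table values are the low-demand bands from IEC 61508-1; the function and constant names are hypothetical, and other standards (for example ISO 26262 for cars) use different schemes. The level chosen from a number like this then dictates how rigorous the techniques for avoiding systematic failures must be.

```python
from typing import Optional

# Low-demand-mode bands from IEC 61508-1: average probability of
# dangerous failure on demand. Each entry is (SIL, lower bound inclusive,
# upper bound exclusive).
LOW_DEMAND_PFD_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd_avg(pfd_avg: float) -> Optional[int]:
    """Return the SIL band containing pfd_avg, or None if it is outside all bands."""
    for sil, lower, upper in LOW_DEMAND_PFD_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None


if __name__ == "__main__":
    # Hypothetical target: one dangerous failure in every 2,000 demands.
    print(sil_for_pfd_avg(5e-4))
```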
Partial transcript is available here.