This episode discusses the Therac-25 accidents, and includes an interview with software safety researcher Richard Hawkins.
Despite the widespread use of software in critical applications such as aircraft, rail systems, automobiles, weapons and medical devices, it is actually very rare to find examples where fatalities can be directly linked to a software error. Many of the examples we cite when talking about software safety are not actually accidents in the strict sense of the word. They involve extensive property damage, but no unintended harm to humans.
Therac-25 stands out as a clear-cut case of a software bug leading directly to death. Like all accidents, the causes are not simple. As we talk about Therac-25 we will discuss problems with hazard analysis, hardware design, human performance, through-life safety management, and incident reporting. All of these are enablers – systematic faults in a system that allowed a simple software bug known as a race condition to shorten the lives of five people.
The safety of medical devices highlights a fundamental conflict between the way different types of evidence are used in different fields of human endeavour.
The field of evidence-based medicine places heavy emphasis on data from randomized controlled trials, aggregated through systematic reviews which compare the data from multiple large and well-designed studies. This approach is not perfect, particularly when not all data from all trials is available, but it generally works very well for drugs, where randomizing and controlling are straightforward. Continued monitoring is also statistical, collecting large group data on efficacy and side effects.
The field of product safety engineering places heavy emphasis on data from the processes used to produce a product, and the test and analysis of that product. This approach is not perfect either, particularly when human interaction is a key variable in the safety of the product, but it generally works very well for physical devices, where test and analysis are straightforward. Continued monitoring is somewhat statistical, but also incorporates detailed investigation and analysis of single incidents and anomalies.
The field of safety management places heavy emphasis on accumulated experience and understanding of the way organisations work, and the way they become dysfunctional. It draws on methods such as case studies and action research from the social sciences. It works well for situations where problems and solutions cannot be differentiated from the environment in which they take place, but lacks authority on strictly empirical questions, particularly where numbers are involved.
Medical devices introduce an engineered product into a hospital management system, for the purpose of treating patients. Arguments about the right type of safety assurance are inevitable. We don’t really know the right answer, but there is a wrong answer, which is to ignore one of the three fields altogether.
Therac-25, produced by a company called AECL, was a medical linear accelerator. Linear accelerators are one way of providing radiation therapy for cancer. Electrons are accelerated to produce a high-energy beam which destroys tumors while, ideally, sparing the healthy tissue around them.
The machine had three operating modes, depending on which accessory was placed in front of the electron beam. Field light mode, with no accessory, was used to line up the machine and the patient. Electron mode used magnets to spread a raw electron beam to the right therapeutic concentration. X-ray mode used a metal target to convert electrons into x-rays, and a flattening filter to spread the x-rays.
The existence of three operating modes created an inherent hazard. X-ray mode required a much stronger electron beam than electron mode, and field light mode required no beam at all. If the beam was set for one mode while the accessory for another was in place, for example x-ray power without the target and flattening filter, the patient would be zapped by a beam that was much too powerful.
The logical solution to this hazard is to put in place hardware interlocks which physically limit the amount of electron beam power based on the position of the accessories. For example, the highest power beam should be available only if the x-ray target and flattening filter are locked in position. This is indeed the way Therac-25's predecessors worked. Therac-6 and Therac-20 both used hardware interlocks. They were both computer controlled, but the automation was added to a physically safe hardware design.
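The interlock rule can be written down as a few lines of logic. The sketch below is illustrative only: a real hardware interlock lives in switches and circuitry, not in code, and the names (Accessory, MAX_BEAM, permitted_beam) and the relative beam levels are assumptions made for this example, not values from any Therac machine.

```python
from enum import Enum


class Accessory(Enum):
    NONE = "field light"      # nothing in the beam path
    MAGNETS = "electron"      # magnets spread the raw electron beam
    XRAY_TARGET = "x-ray"     # metal target plus flattening filter


# Hypothetical relative beam levels. Only the ordering matters:
# field light needs no beam, x-ray needs far more than electron mode.
MAX_BEAM = {
    Accessory.NONE: 0,
    Accessory.MAGNETS: 1,
    Accessory.XRAY_TARGET: 100,
}


def permitted_beam(accessory, locked, requested):
    """Return the beam level the interlock allows for a request."""
    if not locked:
        # Accessory not locked in position: no beam at all.
        return 0
    # Never exceed the ceiling for the accessory that is physically present.
    return min(requested, MAX_BEAM[accessory])


print(permitted_beam(Accessory.XRAY_TARGET, True, 100))  # 100
print(permitted_beam(Accessory.MAGNETS, True, 100))      # 1
print(permitted_beam(Accessory.XRAY_TARGET, False, 100)) # 0
```

The point of the design is that the ceiling depends on where the accessory actually is, not on what the controlling computer believes. A wrong request from the software is clipped rather than delivered.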
Therac-25, on the other hand, was designed from the ground up to be computer controlled. Safe operation required correct operation of the software. Unfortunately, the software was not safe. Two different but related software bugs, known as race conditions, were involved in six separate overdose accidents.
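For listeners who have not met the term, a race condition is a bug where the result depends on the timing of two activities that share data. The toy example below is a generic illustration, not a reconstruction of the Therac-25 software: the function run, the state fields and the beam levels are all hypothetical. The interleaving is written out by hand, rather than using real threads, so that the outcome is repeatable.

```python
def run(edit_during_setup):
    """Simulate a setup task, optionally interrupted by an operator edit."""
    # Shared state: the selected mode, the accessory in place, the beam level.
    state = {"mode": "xray", "accessory": "xray_target", "beam": 0}

    # Setup task, step 1: read the shared mode into a local copy.
    mode_seen = state["mode"]

    if edit_during_setup:
        # A second activity changes the shared state between the two steps.
        state["mode"] = "electron"
        state["accessory"] = "magnets"

    # Setup task, step 2: act on the local copy, which may now be stale.
    state["beam"] = 100 if mode_seen == "xray" else 1
    return state


print(run(False))  # x-ray target in place, beam 100: consistent
print(run(True))   # magnets in place, beam 100: mismatched
```

With no edit, the beam level matches the accessory. When the edit lands between the read and the write, the setup task acts on stale information and the machine ends up with x-ray power behind the electron-mode accessory. With only software standing between that state and the patient, there is nothing left to catch it.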