When Software Goes Wrong - A Quick Recovery

This podcast looks at a complete shutdown of a working air traffic control system, and how it was quickly brought back into service without incident.

In this sixth podcast about software failures in safety-critical systems, I talk with Dr Emma Taylor about an incident in 2014 that happened during normal working of the National Air Traffic System. We look at what went wrong, and how good recording and documentation at each stage in the V-model allowed the air traffic control system for southern England to be quickly reinstated after a complete shutdown, without any harm to the thousands of passengers in the air.

One of the aims of this podcast is to share learning across the industry, supporting a wide range of stakeholders, including train manufacturers and owners.

Listen to the podcast

Topics in this episode include:

  • Emma describes the incident and its impact on passengers. She compares air traffic control with railway control and considers what the railway can learn from this incident [2:05]
  • Emma sets the scene for the railway as it introduces more and more digital parts. She describes some differences between the city metro incident in the previous episode and the NATS incident [4:20]
  • Emma talks about the system definition step in the V-model, and her opinion of some assumptions made about the dependability of the core software [4:43]
  • Emma explains why the latent software fault wasn't found in verification, the failure that happened, and the categorisation of safety hazards [7:15]
  • Emma talks about a recommendation from the incident inquiry and how good documentation and work logs were able to narrow down the search for the faulty line of code [9:20]
  • Emma talks about the importance of specifying the ability of a complex software-based system to log changes to software and faults as they arise [10:51]
  • Emma talks about the recommendations from the NATS report that will help find the 'needle in the haystack' – those that will help the rail industry avoid similar problems in future, including the need for a 'complete, continuing evidence base' [11:39]
  • Emma talks about the supply chain and the need to manage software quality in the supply chain [14:04]
  • Emma emphasises the need to ask suppliers only for what they are able to deliver, and discusses which processes can be mandated and the potential economic benefits of using newer testing technologies [15:12]
  • Emma talks about the practicalities of retaining development information, the options for auditing the evidence and the verification processes, and the introduction of a formal error management system [16:44]
  • Close [18:16]

RSSB podcasts cover a range of topics to keep you informed about things that will lead us all toward a better, safer railway. All our podcasts can be accessed from our podcast page.
