Challenges and Opportunities of Incorporating Machine Learning into Rail Safety Analysis

Machine learning techniques offer great opportunities to improve how safety is proactively managed in the rail industry.

For instance, some of the machine learning applications in the areas of engineering and operations, which future articles in this series will look at, can bring safety benefits on top of making the industry more efficient and resilient. A good example is the automatic identification of track faults such as broken or buckled rails. This has clear safety benefits, as these faults are key hazards related to train accidents.

In this article I want to focus on how the industry is developing capability to get more intelligence out of its safety data, which is core to what RSSB does.

The industry collects an enormous amount of data on safety-related events each year.

The industry’s Safety Management Intelligence System (SMIS) collects data on incidents and accidents on the mainline railway, including those that result in injury. SMIS, which has recently been upgraded to a new software platform, has been in place since the late 1990s, and each year more than 80,000 new records are added.

Complementing SMIS, the industry’s Close Call system collects data on unsafe actions or conditions that could have resulted in an incident or accident, but on those occasions did not. It has been in place since 2011 and now collects around 300,000 reports each year.

That is a great deal of information, which humans would struggle to analyse without some form of automation. However, conventional data processing systems would also find it difficult, particularly in the case of the Close Call system, whose data is extremely varied in nature and consists mostly of free-text descriptions.

For this reason, RSSB is exploring text mining and natural language processing (NLP) techniques. NLP offers a way for computers to analyse and derive meaning from human language in a fast, smart and useful way.

Accident triangle

NLP developers create algorithms that identify the different parts of language and how they are organised, so that the logic and meaning can be extracted.

If this can be done successfully on Close Call data, then we will greatly increase our understanding of the types of events that lie at the lower levels of the ‘accident pyramid’ – those events that are the precursors of more serious incidents and accidents that occur higher up the pyramid.

And if we can do that, then we can be more proactive about managing safety.

RSSB has launched an NLP-based research project in conjunction with the University of Huddersfield. It focuses on developing an interactive approach, based on machine learning, that will greatly reduce the need for manual review of data.

The concept of NLP might be easy to grasp, but in practice it is a challenging problem to solve. Because of the specific challenges in gathering insights from the Close Call system, a purpose-built NLP approach is needed, based on a graph database, a railway ontology, and semantic chunking.

Graph database

The Close Call system contains data that is likely to be highly connected, with millions of relationships. Our text-mining research is therefore using a graph database approach to identify and record the information in the system, building an interactive structure formed of nodes and relationships to describe the information embedded in the Close Call text.
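As a rough illustration of the node-and-relationship idea, the sketch below builds a tiny in-memory graph in Python. It is not the graph database used in the research: the class name TinyGraph, the report identifiers, the site names, and the relationship labels (MENTIONS, LOCATED_AT) are all invented for this example.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory graph: nodes are strings, edges are (relation, target) pairs."""

    def __init__(self):
        self.edges = defaultdict(set)

    def relate(self, source, relation, target):
        # Store the relationship in both directions so either end can be queried.
        self.edges[source].add((relation, target))
        self.edges[target].add((f"inverse:{relation}", source))

    def neighbours(self, node, relation):
        # Return the nodes reachable from `node` via `relation`, in sorted order.
        return sorted(t for r, t in self.edges[node] if r == relation)


graph = TinyGraph()
# Hypothetical reports, invented for illustration only.
graph.relate("report:1", "MENTIONS", "hazard:vegetation")
graph.relate("report:1", "LOCATED_AT", "site:A")
graph.relate("report:2", "MENTIONS", "hazard:vegetation")
graph.relate("report:2", "LOCATED_AT", "site:B")

# Which reports mention vegetation?
reports_about_vegetation = graph.neighbours("hazard:vegetation", "inverse:MENTIONS")
print(reports_about_vegetation)
```

Storing each relationship in both directions is a design choice made here for brevity; it lets a query start from a hazard and find its reports, or start from a report and find its hazards, without scanning every record.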

Railway ontology

Gathering insights from Close Call means coping with incorrect spellings, grammar of varying quality, and slang terms, as well as the many words that have double meanings or different uses, such as pass, nail, or board. For this reason, we have to develop an appropriate railway ontology, to ensure that the system ‘understands’ the specific safety issue raised in any given report.

A real-life example that helps illustrate this concerns the notion of ‘vegetation’. Unchecked vegetation has the potential to obscure signals for train drivers or to create a tripping hazard for those working at the trackside. The Close Call system contains a number of reports related to these types of risk. To extract that information automatically, you need a means of identifying when and how individual words relate to the entity of ‘vegetation’. That is where the ontology comes in: it describes the set of terms that have a relationship to vegetation, as well as what each relationship is. In building the vegetation ontology, more than 50 different spellings of the plant buddleia have so far been identified, which illustrates the challenge of coping with this aspect of text mining.
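To make the spelling problem concrete, here is a minimal sketch of how a misspelt word might be linked back to an ontology entity. It uses fuzzy string matching from Python's standard difflib module as a stand-in; this is an assumption, not the method used in the research. The term list, the 0.75 cutoff, and the function name entity_for are likewise hypothetical.

```python
import difflib

# Hypothetical mini-ontology: canonical term -> entity it relates to.
VEGETATION_TERMS = {
    "buddleia": "vegetation",
    "bramble": "vegetation",
    "overgrowth": "vegetation",
    "foliage": "vegetation",
}


def entity_for(word, cutoff=0.75):
    """Return (canonical_term, entity) for a possibly misspelt word, or None."""
    matches = difflib.get_close_matches(
        word.lower(), list(VEGETATION_TERMS), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    term = matches[0]
    return term, VEGETATION_TERMS[term]


# Two invented misspellings and one unrelated word.
for raw in ["budleia", "Buddlea", "signal"]:
    print(raw, "->", entity_for(raw))
```

A real ontology would also record what kind of relationship each term has to the entity (for example, a species of plant versus a clearance activity), which this flat dictionary does not attempt.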

Semantic chunking

To derive insight from Close Call data, we need to go beyond the ontology: we want the system to identify and link meaning between records. This is where ‘semantic chunking’ and word pattern analysis come in. For instance, consider activities related to possessions and line blockages. These are necessary for the maintenance and continued operation of the railway, but they are not without risk. Combining a possession-related ontology with a semantic chunking layer allows different sentence and phrase structures to be understood as equivalent.

In this way we aim to build a system that sees, for example, the three sentences “Operative did not have required eye-wear for task”, “No goggles on person when signing in to site” and “Staff missing protective glasses necessary for work” as essentially equivalent. They all say that a member of the workforce did not have the protective eyewear required for their intended activities.
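The equivalence of those three sentences can be sketched very crudely as follows. This is not semantic chunking itself: simple keyword patterns stand in for the ontology and chunking layers, purely to show what ‘equivalent’ means in practice. The concept names and patterns are assumptions invented for this example.

```python
import re

# Hypothetical concept lexicon; a real ontology would be far larger.
CONCEPT_PATTERNS = {
    "EYE_PROTECTION": r"\b(eye-?wear|goggles|protective glasses|safety glasses)\b",
    "ABSENT": r"\b(did not have|no|missing|without)\b",
}


def concepts(sentence):
    """Reduce a sentence to the set of concepts whose pattern it matches."""
    text = sentence.lower()
    return frozenset(
        name for name, pattern in CONCEPT_PATTERNS.items() if re.search(pattern, text)
    )


sentences = [
    "Operative did not have required eye-wear for task",
    "No goggles on person when signing in to site",
    "Staff missing protective glasses necessary for work",
]

# Sentences count as equivalent here if they reduce to the same concept set.
signatures = {concepts(s) for s in sentences}
print(len(signatures) == 1)

# A sentence about eyewear being worn reduces to a different concept set.
print(concepts("Operative wearing goggles on site") == concepts(sentences[0]))
```

Keyword patterns like these ignore word order and sentence structure, so they would be easy to fool; that limitation is the reason a chunking layer that understands phrase structure is needed.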

Using machine-learning-based NLP techniques in this way, we will be able to group like events together at whatever level users require for their risk management activities. This will help identify trends and hazards before they become incidents.

In the longer term, the aim is to use machine learning techniques across the range of safety-related sources to build up a dynamic picture of how risk is changing, giving more information and power to safety managers.

The use of ‘bowties’ in risk management is now quite commonplace. They are static representations of how unwanted events could occur, and what could happen if they do. They get their name from the shape that is formed when all the potential paths toward the unwanted event, and the potential consequences following it, are laid out graphically.

The bowtie also captures the barriers (control measures and mitigation measures) that are in place to reduce the likelihood of the event happening and reduce the magnitude of the harm that could arise if it did.

Machine learning techniques open up the possibility of ‘dynamic’ bowties, in which changes in the nature of threats, or in the effectiveness of control measures, that could lead to the unwanted event are automatically monitored, identified and flagged to those managing the risk. RSSB is carrying out another research project, also with the University of Huddersfield, to explore the feasibility of this approach.
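One way to picture a dynamic bowtie is as a data structure whose barriers carry a running history of related reports. The sketch below flags a barrier when its recent average of failure reports exceeds its earlier average by a set ratio. The class names, the comparison rule, the ratio of 1.5, and all the counts are assumptions made for illustration; the research project may take a quite different approach.

```python
from dataclasses import dataclass, field


@dataclass
class Barrier:
    name: str
    # Reports per period in which this barrier was found missing or degraded.
    failure_counts: list = field(default_factory=list)

    def is_degrading(self, recent_periods=3, ratio=1.5):
        """Flag if the recent average exceeds the earlier average by `ratio`."""
        if len(self.failure_counts) <= recent_periods:
            return False  # not enough history to compare
        earlier = self.failure_counts[:-recent_periods]
        recent = self.failure_counts[-recent_periods:]
        earlier_avg = sum(earlier) / len(earlier)
        recent_avg = sum(recent) / len(recent)
        return recent_avg > ratio * earlier_avg


@dataclass
class Bowtie:
    unwanted_event: str
    threats: list
    consequences: list
    barriers: list

    def flagged_barriers(self):
        # Names of barriers whose failure reports are trending upwards.
        return [b.name for b in self.barriers if b.is_degrading()]


# Invented example data, loosely based on the vegetation case described earlier.
bowtie = Bowtie(
    unwanted_event="Driver cannot see signal",
    threats=["Unchecked vegetation growth"],
    consequences=["Signal passed at danger"],
    barriers=[
        Barrier("Vegetation clearance", [2, 3, 2, 2, 6, 7, 8]),
        Barrier("Signal sighting checks", [1, 1, 2, 1, 1, 2, 1]),
    ],
)
print(bowtie.flagged_barriers())
```

In this sketch only the barrier counts change over time; a fuller version would also track changes in the threats themselves.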

In the future, mature dynamic bowties will use not just the safety information referred to in this article, but also the growing set of data coming from engineering monitoring.

And this is only the start of the journey when it comes to taking a big data approach, powered by machine learning techniques, to deliver safety benefits.


In the next article, Sharon Odetunde, Head of R&D Partnerships, and Paul Gray, Professional Lead Engineering R&D, will reflect on the growing role of machine learning in inspection and maintenance tasks.
