Intelligence series: Machine Learning and Rail

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead. Machine learning can often achieve in days or hours what might take humans years to program. It may also find patterns within data that a human analyst could not discern.

Latest update: November 2019

Many industries continue to develop machine learning applications. The technology and communications industry has the highest adoption rate (32%) for Artificial Intelligence (AI) with a focus on machine learning. The Department for Transport (DfT) have launched several ML-powered projects in 2019. These include the Traffic Signal Advisor Service (TSAS) to optimise traffic control and reduce congestion, and prediction of post-accident road network recovery time. 

What is machine learning? 

Machine learning is a subfield of AI that allows systems to automatically learn and improve from experience without human input. Machine learning algorithms are categorised as either supervised or unsupervised. Supervised learning algorithms are used on datasets where the desired output is known. This allows the machine to adapt the way it transforms input information and determine a method to achieve the correct output. Unsupervised learning algorithms, on the other hand, are used against data that do not have historical labels, meaning the system finds patterns within the dataset itself.
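
The difference between the two categories can be illustrated with a minimal Python sketch. The delay values, the labels and the function names are hypothetical and chosen only for illustration. A nearest-neighbour rule stands in for supervised learning, and a basic two-cluster k-means loop stands in for unsupervised learning; real applications would normally use an established ML library.

```python
from statistics import mean

# Supervised: every training example carries a known label (hypothetical data).
labelled = [(1.0, "short delay"), (2.0, "short delay"),
            (9.0, "long delay"), (10.0, "long delay")]

def predict_nearest(value, examples):
    """Return the label of the labelled example closest to `value`."""
    return min(examples, key=lambda example: abs(example[0] - value))[1]

# Unsupervised: no labels are given; the algorithm groups the values itself.
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop (k = 2)."""
    low, high = min(values), max(values)  # initial cluster centres
    near_low, near_high = list(values), []
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: only one cluster exists
            break
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

print(predict_nearest(8.5, labelled))
print(two_means([1.0, 2.0, 9.0, 10.0]))
```

In the supervised case the desired output is supplied with the data, whereas in the unsupervised case the two groups emerge from the values alone and still need a person to interpret what they mean.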

What industries use machine learning?

The finance industry uses machine learning for a variety of applications, including risk management, trading, loan and insurance underwriting, chatbots, portfolio management, trade settlement, and money-laundering prevention. It can also predict creditworthiness from loan applications and finance reports, and analyse market trends and relevant news items to determine an applicant’s real-time financial status.

Machine learning is well suited to finance because of the quantity of data available for training ML applications. For example, the New York Stock Exchange captures approximately 1 terabyte of data every day. JPMorgan launched an ML tool in 2018 called Contract Intelligence (COIN). The software uses unsupervised learning and image recognition to identify patterns in credit contract agreements. The firm previously spent 36,000 man-hours per annum on this task, but with this tool it is accomplished in a fraction of the time, and with higher accuracy than human employees.

In the healthcare industry, ML applications include disease identification and diagnosis. For instance, IBM Watson Genomics is a partnership between IBM Watson Health and Quest Diagnostics, which analyses mutated genes and blood samples using ML to provide precision medicine to cancer patients.

Government agencies use machine learning to generate insights from the numerous sources of data available to them. For example, analysing utilities sensor data can identify ways to increase efficiency and save money. Machine learning can also help detect fraud and minimise identity theft. One government authority, the National Health Service Business Services Authority (NHSBSA), uses an ML-powered chatbot to help respond to the approximately 11,000 calls it receives daily. The chatbot responds to simple queries and reroutes complicated ones to staff. This saves the NHSBSA US$650,000 per year.

In the technology sector, search engines apply machine learning to identify similarities between words in a search query, recognise synonyms and detect new signals. In social media, Facebook uses machine learning models to rank and personalise content to maximise a user’s content engagement, thus enhancing the user experience.

How will machine learning impact the rail industry?

Machine learning techniques can assist in addressing many operational and maintenance issues. Train delay times can be predicted throughout the network using machine learning, thus improving route-planning and train operations.

Machine learning techniques can power predictive analytics that anticipate potential failures, consequently reducing maintenance costs. Railigent® by Siemens is a tool for comprehensive asset management and customer proximity through condition-based monitoring and data analysis.

In addition, image recognition systems make use of machine learning and can be applied to autonomous systems and facial recognition. This could reduce operational costs and protect revenue, for example by detecting fare evaders.

Using machine learning for recommendation systems could increase revenue and enhance customer experience by streamlining the purchasing process. Tailoring web content to customers’ history could allow them to quickly find a desired route.

What is the current state of R&D?

Machine learning is being used by GSK (GlaxoSmithKline) to make new medicines, based on insights derived from large databases that hold genetic data. The databases can reveal the locations of genetic faults which cause medical problems. This could enable a better understanding of the diseases they cause and how to treat them.

Heriot-Watt University are collaborating with RSSB to use machine learning to develop robotised mobile inspection platforms which could pick up litter from hard-to-reach places, such as under seats in carriages. A second project will focus on using machine learning algorithms to develop drones for the inspection of bridges, especially the underside, which has limited access.

In March 2019, the Microsystems Technology Office of the Defense Advanced Research Projects Agency (DARPA) sought research proposals for a potential $10M program that aims to develop machine learning tools. The SAS Institute Inc. also decided to invest $1 billion in artificial intelligence over the next three years to develop its analytics platform, educate data scientists and target industry-specific use cases.

What uncertainties remain?

Machine learning makes inferences by identifying inherent patterns within data, rather than using specific knowledge about the field in which it is operating. Therefore, there is no explanatory model to show how its outputs came to be. This leads to a debate on correlation versus causality, and it can be problematic in safety-critical contexts, where theory-free computer-generated models cannot replace expert knowledge and reasoning. Furthermore, data can contain errors, or implicit racial, gender, or ideological biases. Many AI systems will continue to be trained using inaccurate information, making this an ongoing problem.

Machine learning algorithms require large datasets and high processing capability, which can be a barrier to research in this field. Without appropriately sized datasets, there may be insufficient accurate data, which could affect the algorithm’s ability to “learn”. Training an algorithm too closely on a specific dataset may result in ‘overfitting’. This is when irrelevant information or randomness in the data is detected and learned as concepts, impairing the model’s ability to generalise and work with new datasets. Algorithms may also fail to arrive at the most appropriate solution by getting trapped in a local optimum. Alternatively, they may arrive at globally optimal solutions which are unacceptable at local levels, such as a transport system optimisation that removes transport links from some areas to improve city-level efficiency.
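
Overfitting can be shown with a small Python sketch. All data points here are invented for illustration: the assumed underlying relationship is y = 2x, and the training examples include noise of plus or minus one. One model memorises the training examples, noise included, while the other fits a simple straight line.

```python
from statistics import mean

# Hypothetical training data: true relationship y = 2x, plus noise of +1 or -1.
train = [(0.0, 1.0), (1.0, 1.0), (2.0, 5.0), (3.0, 5.0), (4.0, 9.0), (5.0, 9.0)]
# Hypothetical unseen data following the true relationship y = 2x without noise.
unseen = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]

def memorising_model(x):
    """Overfitted model: repeat the output of the closest training example."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in points)
    variance = sum((x - x_bar) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(train)

def linear_model(x):
    """Simpler model: a straight line that ignores the noise in single points."""
    return slope * x + intercept

def mean_squared_error(model, points):
    """Average squared difference between predictions and observed values."""
    return mean((model(x) - y) ** 2 for x, y in points)

for name, model in [("memorising", memorising_model), ("linear", linear_model)]:
    print(name,
          "training error:", round(mean_squared_error(model, train), 3),
          "unseen error:", round(mean_squared_error(model, unseen), 3))
```

The memorising model reproduces its training data perfectly but has the larger error on the unseen data, because it has learned the noise as if it were part of the pattern.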

What should the rail industry do?

In other industries, machine learning is being applied to any problem where data can be obtained and analysed. There are a variety of applications for ML in the rail industry, and rail needs to identify cases where ML-powered tools would be an asset. Examples include deploying predictive maintenance to reduce maintenance costs, and optimising route-planning.

Before collecting the data, we should know why we want it. For example, using idle times as raw data, ML could be used to develop algorithms which could help to decrease fuel consumption. Raw data needs to be pre-processed to form a set of curated data (data that can be preserved for future use). This process eliminates outliers and incomplete data, thus leaving data accurate enough to form a realistic representation of the real world. As time goes on, more data is added to the model, and it becomes increasingly well-informed.
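
The pre-processing step described above can be sketched in Python as follows. The idle-time readings are hypothetical, and the outlier rule (discard values more than five median absolute deviations from the median) is one common choice assumed here for illustration, not a prescribed method.

```python
from statistics import median

def curate(values, max_deviations=5.0):
    """Drop incomplete readings (None), then drop outliers.

    A value counts as an outlier when it lies more than `max_deviations`
    median absolute deviations away from the median (an assumed rule).
    """
    complete = [v for v in values if v is not None]
    if not complete:
        return []
    centre = median(complete)
    spread = median(abs(v - centre) for v in complete)
    if spread == 0:  # no variation to measure outliers against
        return complete
    return [v for v in complete if abs(v - centre) <= max_deviations * spread]

# Hypothetical idle times in minutes; None marks a missing reading.
raw_idle_minutes = [12.0, 15.0, None, 14.0, 13.0, 240.0, 16.0]
print(curate(raw_idle_minutes))
```

In practice the threshold would need to be agreed with domain experts, since an unusual reading may be a genuine event rather than an error.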

Data required for powering ML tools may be acquired from both legacy and new systems. This requires semantic interoperability (the ability of different applications to understand the data they exchange). To achieve this, an archetype (a re-usable, formal definition of domain-level information) can be used to provide the shared meaning of the data. The data must then be normalised to ensure it is formatted correctly. In rail, where many legacy systems exist, care must be taken to ensure this process is carried out reliably.
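
As a rough illustration of normalising records from a legacy and a new system into one shared definition, consider the Python sketch below. The SpeedReading class plays the role of a very small archetype. Every field name, unit and date format is a hypothetical assumption and does not describe any real rail system.

```python
from dataclasses import dataclass
from datetime import datetime

MPH_TO_KMH = 1.609344  # kilometres per hour in one mile per hour

@dataclass(frozen=True)
class SpeedReading:
    """Shared definition that every source system is mapped onto."""
    vehicle_id: str
    recorded_at: datetime
    speed_kmh: float

def from_legacy(record):
    """Map a hypothetical legacy record (mph, day/month/year text) to the shared form."""
    return SpeedReading(
        vehicle_id=record["unit_id"].strip().upper(),
        recorded_at=datetime.strptime(record["ts"], "%d/%m/%Y %H:%M"),
        speed_kmh=float(record["spd_mph"]) * MPH_TO_KMH,
    )

def from_new(record):
    """Map a hypothetical new-system record (km/h, ISO 8601 text) to the shared form."""
    return SpeedReading(
        vehicle_id=record["vehicle"].strip().upper(),
        recorded_at=datetime.fromisoformat(record["timestamp"]),
        speed_kmh=float(record["speed_kmh"]),
    )

readings = [
    from_legacy({"unit_id": " a1 ", "ts": "05/11/2019 14:30", "spd_mph": "60"}),
    from_new({"vehicle": "A1", "timestamp": "2019-11-05T14:35:00", "speed_kmh": 95.0}),
]
for reading in readings:
    print(reading)
```

Once both sources are expressed in the same structure and units, they can be combined into a single dataset for training.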

To correctly set up and interpret ML tools for use in rail, the industry should invest in the training and recruitment of data scientists, who can interpret the findings and, most importantly, understand their limitations. This could help to eliminate unwanted outcomes and reduce bias.

The lack of an explanatory model is still an issue with machine learning and this needs to be treated with caution. However, this should not prevent the rail sector from using the potential applications of ML to improve rail operations. 

For more information on how machine learning may affect the rail industry, please explore our Machine Learning blog series.
