Can Machine Learning Improve Railway Operational Performance?

“Rail operations is an area where performance increasingly depends on our ability to derive insight from complex data sets and take optimal decisions in real time. Machine Learning has the potential to streamline operational performance by making decision-making more effective and improving overall efficiency. For rail, this seems like an opportunity worth further exploration, as much as it also calls for a deep culture change”.

The effective management of information and data is vital for the railway, a tightly coupled system of systems where changes to any one part can have significant implications elsewhere. 

Rail operations is an area where performance increasingly depends on our ability to derive insight from complex data sets and take optimal decisions in real time. The good news is that the industry generates and collects a huge amount of information every day, from multiple sources and in a variety of formats; the bad news is that this data is too large and too complex to be processed manually.

Figure 1: A useful schematic showing some of the many different influencing factors related to dwell time (Source: D. Li, W. Daamen, and R. M. P. Goverde, “Estimation of train dwell time at short stops based on track occupation event data: A study at a Dutch railway station,” J. Adv. Transp., vol. 50, no. 5, pp. 877–896, Aug. 2016)

Machine Learning, as a new approach to rail operational performance, is therefore an interesting concept, challenging traditional ways and processes that were reliable in the past but that have now started to show their limits. We believe that, in order for rail to compete in a fast-moving transport industry, we need to focus on a handful of key challenges (train performance is one of them) and fully explore and embrace what new, emerging data analysis techniques can do in this space.

Learning from other sectors

Industry competition is increasingly based on proficiency in big data analytics. Machine Learning insights are fuelling technology advances: from driverless cars to algorithmic trading in finance; from disease identification and diagnosis to digital marketing.

Closer to the railway operational context, the aviation industry has successfully conducted Machine Learning trials for accurate taxi time prediction, which is fundamental to achieving efficient runway scheduling, maximising throughput and reducing taxi times. NASA and American Airlines have demonstrated the benefits of a data-driven analytical method called Linear Optimized Sequencing (LINOS), applied to their Spot and Runway Departure Advisor (SARDA) decision support tool to predict taxi times and provide real-time estimates. 

Similarly, researchers from the Berkeley Lab have shown that Machine Learning can help make automotive transport more sustainable, from both a traffic management and an energy consumption perspective. They are using deep reinforcement learning to train autonomous vehicles to drive in ways that simultaneously improve traffic flow and reduce energy consumption. Another project uses deep learning algorithms to analyse satellite images, combined with traffic information from cell phones and data already being collected by environmental sensors, to improve air quality predictions.

What is rail doing with advanced data analytics?

When we talk about operational performance for rail, being able to reduce and predict train delays is definitely one key indicator.

In 2016 Fujitsu, the Japanese ICT giant, in collaboration with California-based SRI International, added a train delay time prediction function, using AI machine learning technology, to Jorudan's ‘Norikae Annai’, an app that provides public transportation route-planning and fare information to customers in the Kanto region.

Fujitsu’s engine learns from previous delays, combines them with past operational data and then uses Machine Learning to make delay predictions. These can then be displayed to the customer through websites and mobile apps, and have potential benefits for any service management and recovery process that depends on predicting the impact of delays.

Figure 2: Summary of Fujitsu's trial system (Source:

Similarly, an Indian travel start-up, RailYatri, has created an Estimated Arrival Time prediction algorithm using Machine Learning and statistical modelling techniques to predict the arrival time of trains. The system, trained on historical data, can provide customers with realistic estimated times for the arrival of their trains. According to Kapil Raizada, Cofounder of RailYatri, the method to predict the arrival time of trains in India had not changed over decades and was typically based on a distance by speed ratio for trains with some buffer time. RailYatri’s Machine Learning algorithm takes into consideration other parameters (“ground realities”) such as increasing traffic, rush, seasonality, etc., and adapts as it learns from subsequent inputs, making the predictions better with time. It uses clustering techniques to organise historical train runs into thousands of patterns where time series data attributes are similar. Based on the characteristics of any given running train, the system matches them to millions of patterns to make an optimised prediction in real time.
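To make the contrast concrete, here is a minimal sketch of the two approaches described above. It is not RailYatri's algorithm: the function names, the 10-minute buffer and the four historical delay profiles are all hypothetical, and a simple nearest-neighbour match stands in for the clustering step the article mentions.

```python
from math import sqrt

def baseline_eta_minutes(distance_km, avg_speed_kmh, buffer_min=10.0):
    """Traditional estimate: distance / speed, plus a fixed buffer (assumed value)."""
    return distance_km / avg_speed_kmh * 60.0 + buffer_min

def predict_final_delay(observed, historical_runs, k=2):
    """Match the delays observed so far against historical runs and
    average the final delay of the k most similar runs."""
    n = len(observed)

    def distance(run):
        # Euclidean distance over the stations the train has already passed
        return sqrt(sum((a - b) ** 2 for a, b in zip(observed, run[:n])))

    nearest = sorted(historical_runs, key=distance)[:k]
    return sum(run[-1] for run in nearest) / len(nearest)

# Hypothetical delay profiles: minutes late at four successive stations
historical_runs = [
    [0, 1, 1, 2],
    [0, 2, 3, 3],
    [5, 8, 12, 15],
    [4, 9, 11, 16],
]

eta = baseline_eta_minutes(120, 60)                          # 130.0 minutes
final_delay = predict_final_delay([5, 9], historical_runs)   # 15.5 minutes
```

The baseline ignores how the train is actually running, whereas the pattern match uses the first two observed delays to find the historical runs that began the same way. A production system would need far more data and features (traffic, seasonality and so on) than this toy example.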

In the UK, the five projects that have been funded as a result of RSSB’s ‘Data Sandbox: Improving Network Performance’ research call are using Machine Learning techniques and other data analytics tools to consider similar challenges. The feasibility studies, which are due to be completed in March 2019, have started to show good potential in their ability to predict delays, identify delay propagation patterns, forecast arrival times of services, and model station dwell times.

The projects used existing Network Rail and operators’ datasets to build their initial models, which are generally stochastic and can be run multiple times to track the impact that various changes would have on the network. For most projects, the rich data set of results can then be explored by users through powerful interactive visualisations. The tools that are being developed by the Data Sandbox projects could be used to test possible interventions against a wide range of delay scenarios, in order to provide robust contingency plans, and improve customer experience.
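As an illustration of what a stochastic model that is run multiple times might look like, the sketch below simulates one train picking up random delays along its route and compares two levels of timetable recovery margin. It is a toy example under stated assumptions, not a description of any Data Sandbox project: the exponential delay distribution, the parameter values and the function names are all hypothetical.

```python
import random

def simulate_final_delay(rng, n_stops, mean_shock_min, recovery_min):
    """One simulated run: at each stop the train picks up a random delay
    and recovers a fixed margin; delay can never go below zero."""
    delay = 0.0
    for _ in range(n_stops):
        shock = rng.expovariate(1.0 / mean_shock_min)  # random primary delay
        delay = max(0.0, delay + shock - recovery_min)
    return delay

def mean_final_delay(n_runs, n_stops, mean_shock_min, recovery_min, seed=0):
    """Repeat the simulation many times and average the final delay."""
    rng = random.Random(seed)  # fixed seed so scenarios see the same shocks
    total = sum(
        simulate_final_delay(rng, n_stops, mean_shock_min, recovery_min)
        for _ in range(n_runs)
    )
    return total / n_runs

# Compare a baseline timetable with an intervention that adds recovery margin
baseline = mean_final_delay(2000, n_stops=10, mean_shock_min=1.0, recovery_min=0.5)
with_margin = mean_final_delay(2000, n_stops=10, mean_shock_min=1.0, recovery_min=1.0)
print(f"baseline: {baseline:.2f} min, with extra margin: {with_margin:.2f} min")
```

Because both scenarios use the same seed, they face identical random disturbances, so any difference in the averages comes from the intervention alone. This is the basic idea behind testing contingency plans against a wide range of delay scenarios.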

Future prospects for Machine Learning use in rail operations

Timetabling and train planning seem to offer the most obvious application potential for Machine Learning. As we have seen in previous articles in this blog series, Machine Learning is particularly suited to tasks that involve complex data sets with multidimensional relationships that go beyond human analysts’ ability.

In ORR’s interim report of the independent inquiry into the timetable disruptions in May 2018, various explanatory factors surfaced, including the schedule for developing the timetable. The time scales set out in this industry process for publishing principal change dates and making any significant amendments to services are lengthy, and the complexity of the various factors that need to be considered can be vast. ORR promises to investigate, in a second phase, technologies that could support the efficiency of the timetabling process by both Network Rail and the train operators.

Delay attribution seems to be another promising domain of application. We have seen delay minutes increase by 16% over the last 7 years, with a 6% increase in the last year alone, resulting in approximately 14.8 million delay minutes. Delay minutes attribution is always a sensitive subject, and the process is controlled by the industry’s Delay Attribution Rules and Principles. During 2017/18, unexplained or un-investigated delays amounted to 1.1 million delay minutes, i.e. 7.4% of total delays. This means that roughly one in every 14 delay minutes remains unexplained or has a cause that is not sufficiently captured.
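The arithmetic behind these figures can be checked in a few lines. The variable names are illustrative; the inputs are the rounded figures quoted above, which is why the ratio comes out slightly below 14.

```python
# Rounded figures quoted in the text for 2017/18
total_delay_minutes = 14.8e6
unexplained_delay_minutes = 1.1e6

share = unexplained_delay_minutes / total_delay_minutes
one_in = total_delay_minutes / unexplained_delay_minutes

print(f"unexplained share: {share * 100:.1f}%")       # about 7.4%
print(f"roughly one delay minute in {one_in:.1f}")    # between 13 and 14
```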

Would Machine Learning provide an unbiased approach to help support delay attribution more effectively and in less time? And is the rail industry ‘mature’ enough to accept what new approaches and techniques might tell us?


Machine Learning has the potential to streamline operational performance by making decision-making more effective and improving overall efficiency. Train delays are often caused by an ensemble of interrelated factors, which makes it harder not only to assess the effect they may have on the network and ultimately on the customer, but also to choose between different mitigating solutions. By introducing Machine Learning, experts can spend less time extracting insight from complex data and instead focus more on understanding what the insights mean and how they can be used for the benefit of rail customers. For rail, this seems an opportunity worth exploring further, as much as it also calls for a deep culture change.

The benefits will translate to improved customer experience, as Claire Shooter, Senior Research Analyst, will show in the next blog.

Haven’t found what you’re looking for?
Get in touch with our expert for more information
Giulia Lorenzini and Justin Willett