How Machine Learning prevents broken power lines for ProRail

In this blog, our educator in Machine Learning Engineering gives an insight into the project he has been working on at ProRail. He will tell you about the challenge, the solution and the result.


Challenge

ProRail’s main objective is to keep the Dutch railway infrastructure in optimal condition. For a variety of reasons, an overhead power line breaks about once a month, which leads to:

  • Delays: the entire track, including all parallel tracks, is closed for 12 to 36 hours;

  • Compromised safety of passengers and train crew due to loose high-voltage cables (1500 to 3000 V);

  • Costs of up to 100 000 euros per breakage.

Knowing exactly where overhead line breakages will occur in the 7000 kilometres of train track ProRail manages is nearly impossible. Currently, ProRail employs a “measurement train” which, once a year, checks the track for all kinds of problems, including wear and tear of the power lines. However, once a year is not enough. Mistakes made when installing the line, malfunctioning trains or extreme weather can lead to excessive damage, causing lines to deteriorate in a matter of weeks. It has been our challenge to figure out, based on data, where this might happen.

Solution

Our goal is to identify a potential problem before it causes trouble. This is a form of predictive maintenance and typically involves machine learning in one way or another. What is our approach to this challenge?

Data is key

The most critical asset in a Machine Learning project is high-quality data. Our data comes from a dedicated measurement setup, which is installed on two passenger trains (one intercity and one sprinter). This setup consists of acceleration sensors on the trains' pantograph - the part of the train that connects to the power line - and a dome camera filming the pantograph from the train's roof. Both data streams (acceleration and video) are processed on an onboard computer system running Docker, which sends relevant subsets of that data to the Azure cloud via a 4G or 5G connection.

Finding signal in the noise

This setup generates an enormous amount of data. The acceleration sensors measure at a frequency of 17 000 Hz in each direction (longitudinal, lateral, and vertical) and are fitted on both ends of the pantograph so that rotation can be measured too. A quick calculation (17 000 Hz x 2 sensors x 3 directions x 3600 seconds per hour) tells us that we measure 367 million data points per hour per train. To process data of such magnitude, we use the distributed compute framework Spark. We also downsample the signal by a factor of 10, as a frequency of 17 kHz was not required for our models. Still, this leaves us with roughly 37 million data points per hour.

It’s cool that we measure all this data, but we are only interested in a small piece of it: where the signal deviates from its “normal” pattern, since that might indicate that something is happening in the overhead power line. Therefore, we apply anomaly detection by means of time series analysis on the acceleration signal to find where significant deviations from the norm occur in the noisy signal data.
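To make the idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Spark pipeline used in the project: the block-averaging downsampler, the rolling z-score detector, the window of 50 samples and the threshold of 4 standard deviations are all hypothetical choices, and the signal is synthetic.

```python
import math
from statistics import mean, stdev

SAMPLE_RATE_HZ = 17_000   # from the text
SENSORS = 2               # one on each end of the pantograph
AXES = 3                  # longitudinal, lateral, vertical
DOWNSAMPLE_FACTOR = 10    # from the text


def points_per_hour(rate_hz, sensors=SENSORS, axes=AXES):
    """Number of data points measured per hour per train."""
    return rate_hz * sensors * axes * 3600


def downsample(signal, factor):
    """Average consecutive blocks of `factor` samples.

    A trailing partial block is dropped. Block averaging is an assumed
    method; the text does not say how the signal was downsampled.
    """
    n_blocks = len(signal) // factor
    return [mean(signal[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


def find_triggers(signal, window=50, threshold=4.0):
    """Indices that deviate strongly from the preceding `window` samples.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the window before it (rolling z-score).
    """
    triggers = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(signal[i] - mu) > threshold * sigma:
            triggers.append(i)
    return triggers


raw_rate = points_per_hour(SAMPLE_RATE_HZ)
reduced_rate = points_per_hour(SAMPLE_RATE_HZ // DOWNSAMPLE_FACTOR)

# Synthetic single-axis signal: a regular vibration with one short shock.
raw_signal = [math.sin(i / 5) for i in range(2000)]
for i in range(1200, 1210):
    raw_signal[i] += 10.0

reduced_signal = downsample(raw_signal, DOWNSAMPLE_FACTOR)
triggers = find_triggers(reduced_signal)
print(raw_rate, reduced_rate, triggers)
```

The shock covers raw samples 1200 to 1209, which is exactly block 120 of the downsampled signal, so that is where the detector should fire. A rolling window is used rather than a global mean so that the notion of "normal" can follow the local vibration level.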

Finding anomalies (we call them triggers) in the raw signal is step one, but this does not yet tell us where we should focus our attention, as they occur across the entire country. We need to investigate which locations carry the highest risk. To do that, we apply density-based spatial clustering to the latitude and longitude coordinates of these triggers to determine where the highest concentration of anomalies appears. We correct this for the number of times a train passes to calculate a hit rate (the ratio of anomalies per passage). Based on this information, we inform railway inspectors, who travel to the site to make a detailed observation and judge whether immediate action should be taken.
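The clustering and hit-rate steps can be sketched as follows. This is a small, self-contained DBSCAN written in plain Python for illustration; in practice a library implementation such as scikit-learn's DBSCAN would be the more usual choice. The coordinates, the 50 m radius, the minimum of 3 triggers per cluster and the passage counts are all made-up values, not project data.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points, eps_m, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # Includes the point itself, so min_samples counts it too.
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster      # border point: join, do not expand
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep growing
        cluster += 1
    return labels


def hit_rates(labels, passages):
    """Triggers per train passage for every cluster (noise is ignored)."""
    counts = {}
    for label in labels:
        if label != -1:
            counts[label] = counts.get(label, 0) + 1
    return {c: n / passages[c] for c, n in counts.items()}


# Hypothetical trigger coordinates (lat, lon).
trigger_points = [
    (52.08940, 5.11000), (52.08945, 5.11005),
    (52.08950, 5.11010), (52.08942, 5.11002),   # dense spot A
    (51.92440, 4.47770), (51.92445, 4.47775),
    (51.92450, 4.47780),                        # dense spot B
    (52.37020, 4.89520),                        # isolated trigger
]
labels = dbscan(trigger_points, eps_m=50, min_samples=3)

# Hypothetical number of train passages at each clustered location.
passages = {0: 40, 1: 10}
rates = hit_rates(labels, passages)
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
print(labels, ranked)
```

Spot A has more triggers in absolute terms, but spot B ranks higher once the number of passages is taken into account. That is the point of the correction: a busy stretch of track should not look risky just because trains pass it more often.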

The last step is to apply condition monitoring, where we track changes in triggers over time to catch changing situations. With time series analysis we track and detect drift to recognise when the acceleration signal starts to increase over time.
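One simple way to picture drift detection is to fit a straight line through a location's hit rate over successive periods and flag the location when the slope exceeds a limit. The sketch below does exactly that with an ordinary least-squares slope. It is an assumed, simplified stand-in for the time series analysis mentioned above; the weekly values and the `min_slope` limit are invented for illustration.

```python
def trend_slope(values):
    """Ordinary least-squares slope of `values` against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def is_drifting(values, min_slope):
    """True when the fitted upward trend is steeper than `min_slope` per period."""
    return trend_slope(values) > min_slope


# Hypothetical weekly hit rates for two locations.
stable_location = [0.10, 0.12, 0.09, 0.11, 0.10, 0.11]
rising_location = [0.10, 0.12, 0.15, 0.19, 0.24, 0.30]

MIN_SLOPE = 0.01  # assumed limit: hit rate growing by more than 0.01 per week
print(is_drifting(stable_location, MIN_SLOPE))
print(is_drifting(rising_location, MIN_SLOPE))
```

A straight-line fit is deliberately crude. It ignores seasonality and sudden jumps, but it shows the core idea: compare a location with its own history rather than with a fixed national threshold.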

Reducing false positives with computer vision

Although the anomaly detection model yielded interesting results, it typically triggers on locations with many vibrations. However, this does not necessarily mean that something is wrong with the power line, as certain components or locations simply generate more vibrations than others. We are specifically interested in locations where there are anomalies in the acceleration signal that cannot be explained by the layout of the power line. Therefore, we apply object detection to the camera footage to identify which objects we encounter in the power lines.

In case you were wondering: “Why don’t you identify problems in the power lines directly from the video images?” This is currently impossible due to the limited resolution of the camera system and the absence of training data on damaged power lines. However, what we can do is find potential explanations for the anomalies that we recognise. Certain components in the power lines are responsible for certain patterns in the acceleration signals. Think of clamps, connectors and suspension components.

We are able to detect various important components with the object detection model that we developed. When we are aware of where these components are in the power lines, we can exclude harmless triggers and we do not need to send an inspector to the location.
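The filtering step can be pictured as matching each trigger against nearby detections. The sketch below assumes that both triggers and detected components have been mapped to a position along the track in metres, and that a trigger within a few metres of a known vibration-causing component counts as explained. The data classes, the label names and the 5 m tolerance are hypothetical, not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    position_m: float  # distance along the track (assumed representation)


@dataclass(frozen=True)
class Detection:
    position_m: float  # where the object detector saw the component
    label: str         # class predicted by the object detection model


# Assumed set of component classes known to cause harmless vibrations.
BENIGN_LABELS = {"clamp", "connector", "suspension"}


def unexplained_triggers(triggers, detections, tolerance_m=5.0):
    """Keep only the triggers with no benign component within `tolerance_m`."""
    benign_positions = [d.position_m for d in detections
                        if d.label in BENIGN_LABELS]
    return [
        t for t in triggers
        if not any(abs(t.position_m - p) <= tolerance_m
                   for p in benign_positions)
    ]


triggers = [Trigger(120.0), Trigger(480.0), Trigger(910.0)]
detections = [
    Detection(118.5, "clamp"),
    Detection(481.0, "mast"),        # detected, but not a benign class here
    Detection(912.0, "connector"),
]

remaining = unexplained_triggers(triggers, detections)
print(remaining)
```

Only the trigger at 480 m remains: the other two sit next to a clamp and a connector and can be explained by the layout of the line. Triggers like the remaining one are the ones worth an inspector's visit.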

Results

After a year of testing with the measurement setup and developing the algorithms to identify potential breakage points, ProRail’s DataLab and Asset Management finalised the proof-of-concept phase. During validation, a number of potentially critical locations were identified and repaired right away to prevent any further degradation of the power line. The accuracy, precision and recall of the system can be tuned further to the preferences of the stakeholders working with it. It is now up to ProRail to decide how to continue. We are excited to see how this potential solution could contribute to a safer and more reliable infrastructure for ProRail.

Acknowledgment

Innovation project by ProRail’s DataLab. Xccelerated consultants co-developed the software and machine learning models. Team: Paul van der Voort (Head of DataLab), Marcel Gerrits (Product Owner), Monique Koopsen (Scrum Master) and the development team: Maik Havinga (Xccelerated / ProRail), Tiamur Khan (Xccelerated / NS), Joost van ‘t Schip (ProRail), Oscar Enzing (ProRail) and Wesley Boelrijk (Xccelerated).

Wesley Boelrijk

Feel free to contact me or connect with me on LinkedIn if you have any questions about our projects or our training program.

Xccelerated is an initiative within Xebia Group, accelerating careers of young professionals with 1 to 3 years of relevant work experience in the fields of data & AI. Our 13-month advanced training programs integrate hands-on learning and skill development with working at one of our partner companies, like Heineken, KLM, ING, Vattenfall, Randstad Group or FedEx.