
By Chris MacNeel

Using Good Data Practices for Improving Fault Isolation Procedures in Aviation Maintenance


Building a Better Fault Data Ecosystem

Fault isolation is a critical process in aviation maintenance, and it is difficult to do well for a few reasons. First, the manual for the fault data ecosystem is typically designed once, at the beginning of a program, and uses engineering precedent to hypothesize how things will fail. In reality, however, systems are often unpredictable and fail differently.

Faults can register in unusual combinations. While the majority of fault codes are explicit, 1-for-1 matches, like a notice that a tire needs changing (if you see this fault, do task x), others are unclear and may require interpretation and troubleshooting to determine the most likely cause. That determination often depends on whether other particular events preceded or co-occurred with the fault on the aircraft. This is also the part where things get hairy.
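The split between explicit and ambiguous faults can be made concrete with a small sketch. The fault codes, task descriptions, and co-occurrence rules below are all invented for illustration; real fault isolation manuals encode far richer logic.

```python
# Hypothetical sketch: some fault codes map directly to a task,
# while ambiguous codes depend on which other faults co-occurred.
# All codes and tasks here are made up for illustration.

# Explicit 1-for-1 mappings ("if you see this fault, do task x")
EXPLICIT_FAULTS = {
    "TIRE-001": "Replace main landing gear tire",
    "LAMP-014": "Replace navigation light bulb",
}

# Ambiguous faults: the likely cause depends on co-occurring faults
AMBIGUOUS_RULES = {
    "HYD-203": [
        ({"PUMP-110"}, "Inspect hydraulic pump"),
        ({"SNSR-045"}, "Check pressure sensor wiring"),
    ],
}

def suggest_task(fault_code, co_occurring):
    """Return a suggested task, or None if manual troubleshooting is needed."""
    if fault_code in EXPLICIT_FAULTS:
        return EXPLICIT_FAULTS[fault_code]
    for required, task in AMBIGUOUS_RULES.get(fault_code, []):
        if required <= set(co_occurring):
            return task
    return None  # fall back to the troubleshooting manual

print(suggest_task("TIRE-001", []))             # explicit match
print(suggest_task("HYD-203", ["SNSR-045"]))    # resolved by co-occurrence
print(suggest_task("HYD-203", []))              # ambiguous: needs a maintainer
```

The last case is the one that "gets hairy": no rule fires, and the maintainer is left to interpret the fault with whatever training and documentation they have.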

After a complex fault is recognized, a maintainer will usually go through a defined troubleshooting process, a series of tests, to figure out how best to isolate the issue. These troubleshooting processes carry significant ambiguity due to the nature of the aircraft design. Many system design documents explaining how things should fail are created at the beginning of a program and never integrated with real fleet data. In some cases the maintainer is highly educated and experienced; in others, the maintainer hasn't been given all the necessary training, so the selection of faults to troubleshoot may be inefficient and misinformed.

Given this, how do we implement data practices to mitigate the challenges faced in the fault isolation ecosystem?


Aircraft Faults & Fault Isolation: Their Impact on Aviation Maintenance

Defining Aircraft Faults & Fault Isolation

Faults are specific messages, recorded by an onboard computer, that occur when something has failed or is suspected of failure, and can be a caution, warning, or breakage. Fault isolation is the process of diagnosing and troubleshooting faults. This process requires manuals and experiential knowledge to identify which procedure resolves the fault and provides a solution to the problem.

If we can create a cleaner ecosystem of data around the fault isolation process, maintainers can become more efficient, resulting in tasks being completed earlier with more uptime for the aircraft.

Better, Cleaner, Data Leads to Better Performance

Good data practices must be prioritized before analytics. Once you have a decent existing model, tweaking hyperparameters, reworking, retraining, and optimizing it yields less performance ROI than creating a better, cleaner dataset with less noise.

Studies show that better performance comes from better data rather than from further model improvements once a decently performing model is selected. If you want good analysis, you need good data.

3 Steps to Set Yourself Up for Analytical Success

What are the steps we can take to build a better fault analysis data ecosystem?

1. Merging & collocating disparate datasets

It is common for the fault data, the work order system, and the troubleshooting manuals to live in completely different locations with limited or no data synchronization between them. This is a problem because these systems typically require different access rules and requests, making it difficult for analysts and data scientists to find the datasets they need and gain access to them. It also creates a problem of single access: once a user gains access to both systems, only that user benefits, not everyone else who would like to use the data. Organizations should prioritize co-locating these disparate datasets. How, you may ask?

Ideally, there would be one system that houses fault data, work orders, and troubleshooting manuals all under one roof. In reality, though, the maintenance system is usually different from the system that translates and presents the fault data. Most of the time the system that translates the fault data also provides the troubleshooting manual, but that's not a given either.

Creating a single database that extracts data from all the disparate repositories and uses ETL pipelines to update the data efficiently and effectively alleviates the issue of data scientists becoming pseudo system architects, spending their time finding and gaining access to source data and using a third-party tool to merge it into one source. With the proper infrastructure in place, data can be loaded, cleaned, and analyzed all in one place.
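A minimal sketch of that consolidation step, using in-memory stand-ins for the fault system and the work order system and SQLite as the central store. The table names, column names, and records are all invented; a real ETL would pull from the actual source systems on a schedule.

```python
# Hypothetical ETL sketch: extract records from two separate sources
# and load them into one database so analysts query a single place.
import sqlite3

# Extract: records pulled from the two disparate systems (made-up data)
faults = [("F-100", "2024-01-05", "HYD-203"),
          ("F-101", "2024-01-06", "TIRE-001")]
work_orders = [("WO-900", "2024-01-06", "Replaced worn tire")]

# Load: one central database, one schema
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE faults (fault_id TEXT, date TEXT, code TEXT)")
db.execute("CREATE TABLE work_orders (wo_id TEXT, date TEXT, action TEXT)")
db.executemany("INSERT INTO faults VALUES (?, ?, ?)", faults)
db.executemany("INSERT INTO work_orders VALUES (?, ?, ?)", work_orders)

# Analysts can now join across sources in one query
# instead of requesting access to two separate systems
rows = db.execute(
    "SELECT f.code, w.action FROM faults f "
    "JOIN work_orders w ON f.date = w.date"
).fetchall()
print(rows)  # [('TIRE-001', 'Replaced worn tire')]
```

The join on date here is deliberately naive; building a trustworthy link between faults and work orders is exactly the traceability problem the next section addresses.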

2. Improve Traceability

In addition to intelligently centralizing the storage, logging, and management of the fault data ecosystem, data owners should be associating the data intelligently.

Most systems lack full automation, which can cause variability in fault prioritization, order, and scope because of human interpretation and bias. Because the fault systems and work order systems are separate, associations stating that a work order was created due to a specific fault are rarely recorded. For analysis, this requires the data scientist to have a deep understanding of the inner workings of the platform, the data systems, and the engineering behind the faults. These skill sets are not prevalent in the job market, so creating something that associates work orders with faults can remove much of the expectation of inherited subject matter expertise.

Boosting Traceability by Utilizing Subject Matter Expertise

Creating associations between work orders and faults is a complex and ambiguous process. Not everything will result in a one-for-one association, but creating something can drastically improve analysis. By having software developers, data scientists, and platform subject matter experts working together, a sophisticated and efficient rules engine can be constructed to associate the work order and fault datasets. Once this dataset is constructed, analysis becomes much easier and can lead to the creation of key features.
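One way such a rules engine can work is by scoring each candidate fault/work-order pair instead of forcing a hard match. The rules below (a time window and a keyword check) and all the codes and thresholds are illustrative stand-ins for logic the subject matter experts would actually supply.

```python
# Hypothetical rules-engine sketch: score how likely a work order
# was raised in response to a given fault. Rules and weights are
# made-up placeholders for SME-supplied logic.
from datetime import date

# SME-provided keywords linking fault codes to work order language
FAULT_KEYWORDS = {
    "TIRE-001": ["tire"],
    "HYD-203": ["hydraulic", "pump"],
}

def association_score(fault_code, fault_date, wo_text, wo_date):
    """Score a fault/work-order pair between 0.0 and 1.0."""
    score = 0.0
    if abs((wo_date - fault_date).days) <= 2:
        score += 0.5  # SME rule: work order opened within two days
    keywords = FAULT_KEYWORDS.get(fault_code, [])
    if any(k in wo_text.lower() for k in keywords):
        score += 0.5  # SME rule: description mentions the affected system
    return score

s = association_score("TIRE-001", date(2024, 1, 5),
                      "Replaced worn tire", date(2024, 1, 6))
print(s)  # 1.0 -> strong association
```

Keeping the output as a score rather than a yes/no lets analysts tune a confidence threshold later, which suits the "not everything will be one-for-one" reality described above.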


3. Create a Data Feedback Process & Automated Features

Because fault isolation is so complex, a work order may not fix the problem the first time, resulting in the fault recurring (sometimes referred to as a "repeat recur"). A fault can recur over several flights, with different work orders being created, until a final solution is identified.

With data in one central location and association scores developed, a feedback process can be created that finds all of these "repeat recurs" and identifies the final work order that actually resolved the issue. Similar processes could automatically create features that give analysts and data scientists more impactful data to work with.
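The core of that feedback loop can be sketched in a few lines: group fault occurrences by code, flag codes that recurred across flights, and treat the work order attached to the last occurrence as the one that resolved the issue. The fault codes, flight numbers, and work order IDs are invented, and real logic would be more careful about time windows and association confidence.

```python
# Hypothetical feedback-loop sketch: detect "repeat recurs" and
# identify the final work order that actually resolved each one.
from collections import defaultdict

# (fault_code, flight_number, associated_work_order) - made-up history
history = [
    ("HYD-203", 101, "WO-1"),
    ("HYD-203", 102, "WO-2"),
    ("HYD-203", 104, "WO-3"),   # last occurrence: WO-3 resolved it
    ("TIRE-001", 103, "WO-4"),  # occurred once: not a repeat recur
]

by_code = defaultdict(list)
for code, flight, wo in history:
    by_code[code].append((flight, wo))

repeat_recurs = {
    code: max(events)[1]   # work order on the latest flight
    for code, events in by_code.items()
    if len(events) > 1     # recurred across multiple flights
}
print(repeat_recurs)  # {'HYD-203': 'WO-3'}
```

Run periodically over the central database, an automated process like this could label each historical fault with its resolving work order, a feature analysts would otherwise have to reconstruct by hand.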

Armed with intelligent features and extra clarity, your analysis will provide a significant return on investment and continue to pay dividends as the fleet expands and the data volume increases.


3 Data-Driven Steps for Success in the Fault Isolation Ecosystem

After you have merged the datasets into one repository, associated records with SME logic, and created data feedback loops and automated features, your analysis time will be reduced, your results will be more accurate, and large inefficiencies should be easy to spot, leading to increased uptime for your fleet.