Flight Scoring Continued

by Andy Robinshaw on 08/10/2020

Can piloting be directly compared to a continuous-play sport like basketball? No, not really.

However, if you’d like to see me debate the idea, have a read of my previous article.

There are obvious differences. When piloting an aircraft, you will not find yourself up against another team whose prime purpose is to stop you achieving your goal. And, even though flight crew are performing their jobs for the whole flight, the parts of the flight that are by far the most useful for assessing aircraft handling usually account for less than 5 minutes. This is a bit like judging a basketballer only on their performance from tip-off to the 2-minute mark, and then from the 45th to the 48th minute. Out of a flight that may last 10 hours, focussing on such a limited portion might seem over-simplified, but the short answer is that this is the portion of the flight where the crew are most actively controlling the aircraft and the tolerances for error are narrowest.

It’s important to note that the point of flight scoring as a concept is not to replace ‘normal’ Flight Data Monitoring (FDM), but to enhance it and offer more personalized safety information. The key result will be a value that describes where a particular flight, or a particular crew, sits within the spread of data for all flights on the same aircraft type. This value could be a combination of ‘scores’ for different Key Point Values (KPVs), which could then be broken down into a score for each phase of flight. The concept will also greatly enhance the ability to detect precursor trends before they reach the point of triggering events.
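As a rough illustration of where a flight ‘sits’ within the fleet’s spread, the sketch below scores each KPV as a percentile rank against all flights on the same type, averages those ranks within each phase of flight, and then averages the phase scores into one overall value. This is a minimal sketch under stated assumptions, not L3Harris’ method: the KPV names, the phase mapping, the equal weighting and the sample values are all hypothetical, and it assumes a higher KPV value is always the less desirable one.

```python
from bisect import bisect_left, bisect_right
from statistics import mean


def percentile_rank(value, fleet_values):
    """Percentage of fleet values below `value`, counting ties as half."""
    ordered = sorted(fleet_values)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def flight_score(flight_kpvs, fleet_kpvs, phase_of):
    """Score one flight against the fleet.

    flight_kpvs: {kpv_name: value} for the flight being scored
    fleet_kpvs:  {kpv_name: [values from all flights on the same type]}
    phase_of:    {kpv_name: phase_name}
    Returns ({phase_name: mean percentile}, overall mean of phase scores).
    """
    ranks_by_phase = {}
    for name, value in flight_kpvs.items():
        rank = percentile_rank(value, fleet_kpvs[name])
        ranks_by_phase.setdefault(phase_of[name], []).append(rank)
    phase_scores = {phase: mean(r) for phase, r in ranks_by_phase.items()}
    return phase_scores, mean(phase_scores.values())


# Hypothetical fleet data and KPV names, for illustration only.
fleet = {
    "touchdown_g": [1.20, 1.25, 1.30, 1.35, 1.40, 1.45, 1.50, 1.55],
    "rotation_pitch_rate_deg_s": [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4],
    "approach_speed_above_vapp_kt": [0, 1, 2, 3, 4, 5, 6, 7],
}
phase_of = {
    "touchdown_g": "landing",
    "rotation_pitch_rate_deg_s": "take-off",
    "approach_speed_above_vapp_kt": "approach",
}
flight = {"touchdown_g": 1.50, "rotation_pitch_rate_deg_s": 2.5,
          "approach_speed_above_vapp_kt": 3}

phases, overall = flight_score(flight, fleet, phase_of)
print(phases, round(overall, 1))
```

A percentile rank is used here because it makes no assumption about the shape of each KPV’s distribution; a z-score would be a reasonable alternative for roughly normal data.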

To see the advantages this concept would bring, we first need to review the current principles used in FDM. FDM is designed to trigger events based on exceedances, such as a landing that exceeded a certain normal acceleration threshold, or an approach that was flown too fast relative to the target approach speed (Vapp). This method works brilliantly for detecting exceedances on a particular flight, and indeed multiple events triggering on the same flight can indicate a poorly flown approach. Where event monitoring falters is statistical analysis.
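A minimal sketch of that event-based logic, assuming hypothetical field names, the 1.80g hard-landing threshold used later in this article and an illustrative 10 kt margin above Vapp. Each check produces only a yes or no per flight, which is what makes the output hard to use statistically.

```python
# Hypothetical thresholds for illustration; each FDM program sets its own.
HARD_LANDING_G = 1.80         # normal acceleration at touchdown (g)
MAX_SPEED_ABOVE_VAPP_KT = 10  # allowed margin above the target approach speed


def detect_events(flights):
    """Return (flight_id, event_name) pairs for every threshold exceedance."""
    events = []
    for f in flights:
        if f["touchdown_g"] > HARD_LANDING_G:
            events.append((f["id"], "hard landing"))
        if f["max_approach_speed_kt"] - f["vapp_kt"] > MAX_SPEED_ABOVE_VAPP_KT:
            events.append((f["id"], "approach speed high"))
    return events


flights = [
    {"id": "F1", "touchdown_g": 1.82, "max_approach_speed_kt": 150, "vapp_kt": 135},
    {"id": "F2", "touchdown_g": 1.79, "max_approach_speed_kt": 140, "vapp_kt": 135},
]
print(detect_events(flights))
```

Flight F2 triggers nothing, even though its 1.79g touchdown is only 0.01g short of the threshold.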

Simply put, ‘how is our operation changing?’ is a question that cannot be properly answered with event-based statistics. Your event rates may stay the same while your operation shifts one way or the other. Alternatively, you might see substantial changes in event rates with little actual operational drift. The results can be even more misleading if you are trying to compare your operation’s event rates with another operation’s, unless you know that everyone you are comparing with is using the same events and thresholds. Membership of IATA’s Flight Data eXchange program mitigates this, as all operators use the same events and thresholds, but this type of monitoring is different from the type needed to detect how your own operation might be shifting over time.
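To illustrate why matching thresholds matter when comparing operators, the sketch below calculates an event rate per 1,000 flights for the same set of touchdown loads under two different thresholds. The ten values and the 1.70g alternative threshold are made up for illustration.

```python
def event_rate_per_1000(values, threshold):
    """Number of values exceeding `threshold`, per 1,000 flights."""
    exceedances = sum(1 for v in values if v > threshold)
    return 1000.0 * exceedances / len(values)


# Made-up touchdown loads (g) for ten flights.
landings = [1.25, 1.40, 1.55, 1.62, 1.71, 1.74, 1.78, 1.81, 1.35, 1.50]

print(event_rate_per_1000(landings, 1.80))  # operator using 1.80g
print(event_rate_per_1000(landings, 1.70))  # operator using 1.70g
```

The same ten landings give 100 events per 1,000 flights at 1.80g but 400 at 1.70g, so the difference in rates says nothing about how the two operations actually land.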

These histograms show the core challenge when trying to compare event rates. If they represented two different operators using the same 1.80g threshold, each would see one hard landing trigger, and both would have an event rate of ~18. But are these two operations landing the same? No. The blue operator is landing significantly ‘harder’ than the brown operator. Setting sensible level 1 and level 2 thresholds, and tracking exceedances of these, could highlight an example as extreme as this, but adding event levels doesn’t deal with the core problem: the ‘digitalness’ of the signal. A landing 0.01g below a threshold is recorded as a “0”, while one just over the threshold is suddenly recorded as a “1”. This binary treatment hinders the ability to proactively detect trends and operational shift.
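The ‘digitalness’ problem can be shown with two small, made-up sets of touchdown loads (these are not the data behind the histograms). Both produce exactly one exceedance of the 1.80g threshold, yet their average touchdown loads are clearly different, and a 1.79g landing is scored identically to a 1.15g one.

```python
from statistics import mean

THRESHOLD_G = 1.80  # the hard-landing threshold used in the text


def is_event(touchdown_g):
    """Binary event logic: 1 if the threshold is exceeded, otherwise 0."""
    return 1 if touchdown_g > THRESHOLD_G else 0


# Made-up touchdown loads (g); not the data behind the histograms.
blue = [1.45, 1.55, 1.60, 1.65, 1.70, 1.75, 1.79, 1.85]
brown = [1.15, 1.20, 1.25, 1.30, 1.35, 1.40, 1.45, 1.85]

for name, landings in (("blue", blue), ("brown", brown)):
    events = sum(is_event(g) for g in landings)
    print(f"{name}: {events} event(s), mean touchdown {mean(landings):.3f}g")

# A 1.79g landing scores the same as a 1.15g one; 1.81g flips to an event.
print(is_event(1.79), is_event(1.15), is_event(1.81))
```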

To stay with the sports analogy, imagine my team plays 20 games per season in a league with no ties. Last season we went 10-10, and we did the same this season. Tracking results like this is similar to tracking level 3 event rates: you can’t tell whether we’ve got better, got worse, or stayed exactly the same. Each game contains extra dimensionality that can’t be captured by simply counting it as a “1” (win) or a “0” (loss). Keeping track of information like win margin would be the equivalent of using level 1 and 2 events, because we would then have a count of wins that were close to being losses, but this is still limited data. What makes a win count as close to being a loss, or vice versa? This is why good threshold setting takes time, and will likely spark a few debates. If we wanted to analyse our performance properly, we would need to take many more, smaller measurements, and the same is true of a flight data monitoring system.
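Adding event levels is the FDM equivalent of recording win margins in bands. The sketch below assigns a level from three hypothetical thresholds (only the 1.80g figure appears in this article); landings of 1.71g and 1.79g both come out as level 2, so the detail within each band is still lost.

```python
from bisect import bisect_left

# Hypothetical level 1, 2 and 3 thresholds (g). Only 1.80g appears in the text.
LEVEL_THRESHOLDS_G = (1.60, 1.70, 1.80)


def event_level(touchdown_g):
    """Return 0 (no event) or the highest level whose threshold is exceeded.

    bisect_left counts the thresholds strictly below the value, so a landing
    exactly on a threshold does not trigger that level.
    """
    return bisect_left(LEVEL_THRESHOLDS_G, touchdown_g)


for g in (1.55, 1.65, 1.71, 1.79, 1.81):
    print(g, "-> level", event_level(g))
```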

L3Harris’ FDM platform doesn’t just trigger events; it takes hundreds of measurements per flight. Some of these measurements are simple ‘yes’ or ‘no’ results, such as whether or not the fuel crossfeed valve was open at lift-off, while others are more complex, such as the height at which the approach met the stability criteria. Measures that are not just a “1” or “0” lend themselves to trending, and trending some of them can highlight a potential problem before it arises.
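As a hypothetical illustration of the two kinds of measurement described above, the functions below compute a yes/no KPV (crossfeed valve open at lift-off) and a continuous KPV (the height from which the approach remained stable down to the last sample). The field names and the stability criteria (airspeed between Vapp -5 kt and Vapp +10 kt, sink rate at or below 1,000 fpm) are assumptions for the example; operators define their own criteria, and real systems work from recorded parameters rather than tidy dictionaries.

```python
def is_stable(sample, vapp_kt):
    """Hypothetical stability criteria; operators define their own."""
    speed_ok = vapp_kt - 5 <= sample["airspeed_kt"] <= vapp_kt + 10
    sink_ok = sample["sink_rate_fpm"] <= 1000
    return speed_ok and sink_ok


def stabilisation_height(samples, vapp_kt):
    """Continuous KPV: height (ft) from which the approach stayed stable.

    `samples` are time-ordered, ending at the last sample before touchdown.
    Walks backwards from the end and stops at the first unstable sample.
    Returns None if the final sample itself is unstable.
    """
    height = None
    for sample in reversed(samples):
        if not is_stable(sample, vapp_kt):
            break
        height = sample["height_ft"]
    return height


def crossfeed_open_at_liftoff(samples):
    """Yes/no KPV: was the crossfeed valve open at the first airborne sample?"""
    liftoff = next(s for s in samples if not s["on_ground"])
    return liftoff["crossfeed_open"]


approach = [
    {"height_ft": 1500, "airspeed_kt": 160, "sink_rate_fpm": 1200},
    {"height_ft": 1000, "airspeed_kt": 148, "sink_rate_fpm": 900},
    {"height_ft": 800, "airspeed_kt": 142, "sink_rate_fpm": 800},
    {"height_ft": 500, "airspeed_kt": 140, "sink_rate_fpm": 750},
    {"height_ft": 100, "airspeed_kt": 137, "sink_rate_fpm": 700},
]
takeoff = [
    {"on_ground": True, "crossfeed_open": False},
    {"on_ground": False, "crossfeed_open": False},
]
print(stabilisation_height(approach, vapp_kt=135))
print(crossfeed_open_at_liftoff(takeoff))
```

Unlike the crossfeed check, the stabilisation height varies from flight to flight and can be trended across a fleet.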

This type of trending already takes place in the maintenance function of an FDM program. Tracking the maximum gas temperatures that an individual engine reaches at given density altitudes and fuel flows, and extrapolating forwards, is not too difficult. Predicting whether your operation as a whole, or an individual member of flight crew, is likely to have a particular incident in the future is more difficult, and requires an advancement in KPV-based statistics.
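A minimal sketch of the kind of engine trend extrapolation described above: fit a straight line to maximum gas temperature against engine cycles and estimate when it would reach a limit. It assumes the temperatures have already been corrected to reference density altitude and fuel flow; the values, the limit and the linear model are illustrative assumptions, not any particular engine’s data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_at_limit(cycles, max_gas_temp_c, limit_c):
    """Extrapolate the linear trend to the cycle count at which it hits `limit_c`.

    Returns None if the trend is flat or falling (it never reaches the limit).
    """
    slope, intercept = fit_line(cycles, max_gas_temp_c)
    if slope <= 0:
        return None
    return (limit_c - intercept) / slope


# Illustrative values, assumed already corrected for density altitude and fuel flow.
cycles = [0, 100, 200, 300, 400]
max_gas_temp_c = [850, 852, 854, 856, 858]
print(cycles_at_limit(cycles, max_gas_temp_c, limit_c=870))
```

With a steady rise of 2°C per 100 cycles from 850°C, the trend reaches the assumed 870°C limit at roughly 1,000 cycles.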

So, how do we get to the point where we can accurately use data trends to predict a certain incident?

Well, we start with transfer functions, and I look forward to sharing more on this shortly.
