
The Proficiency Ceiling: Why Assessment Fails the Expert


The subtle, yet profound, ways standardized tests can obscure true mastery.

The examiner’s pen hovered for an agonizing 17 seconds, a plastic tip vibrating against a sheet of carbonless copy paper. Jean-Pierre Dubois watched the ink bleed slightly, a tiny blue blotch forming on the line designated for ‘Extended Level 5.’ Outside the window of the testing center, a light drizzle blurred the edges of the taxiway, mirroring the internal fog that settled over Jean-Pierre’s pride. He was a man who had spent 37 years navigating the complex linguistic dance of international airspace, a graduate of Harvard who spoke three languages with the fluid grace of a concert pianist, yet here he was, being labeled ‘Extended’ rather than ‘Expert.’ It wasn’t that he had failed a question. It was that the test itself had run out of room to measure him.

[Graphic: a gauge reading ‘5’, labeled ‘Extended Level’]

I’ve always felt a strange, prickly discomfort with the way we categorize human capability. It’s like trying to measure the depth of the Atlantic with a 47-foot piece of string; eventually, you just run out of string and assume everything below that point is simply ‘deep.’ We do this in aviation, in medicine, and in software engineering. We build frameworks to catch the incompetent, but in doing so, we inadvertently create a ceiling that smothers the exceptional. I spent 237 minutes last night falling into a Wikipedia rabbit hole about the history of psychometrics, starting with the Binet-Simon scale and ending up at ‘Goodhart’s Law’: the idea that when a measure becomes a target, it ceases to be a good measure.

Jean-Pierre represents the ultimate victim of this law. His proficiency isn’t just a collection of vocabulary words or a lack of an ‘interfering’ accent. It is an intuitive grasp of nuance. When he speaks to a controller in a high-stress environment, he isn’t just transmitting data; he is managing the emotional state of the frequency. But the rubric in front of the examiner doesn’t have a box for ‘emotional management’ or ‘sophisticated use of irony to defuse tension.’ It has boxes for ‘structure,’ ‘fluency,’ and ‘comprehension.’ The examiner, a man who likely hasn’t read a book in his second language in 17 years, looks at the rubric and looks at Jean-Pierre. He hears a level of sophistication he cannot categorize. Because he cannot categorize it, he defaults to the highest level he feels ‘safe’ verifying.

[Graphic: the examiner’s gauge, ‘Level 5, Measured Proficiency,’ vs. Jean-Pierre’s reality, ‘Expert, Actual Mastery’]

Luna S.-J., an assembly line optimizer I worked with during a brief, chaotic stint in manufacturing logistics, used to call this the ‘Resolution Error.’ She had this sharp, 7-point checklist for identifying bottlenecks in human systems. ‘If your gauge only goes to 100,’ she told me while we watched a robotic arm fail to calibrate to a 107-degree variance, ‘then 101 and 1000 look exactly the same to the machine.’ Luna was obsessed with precision. She would spend $777 on a specialized sensor just to measure a fraction of a millimeter because she knew that the ‘close enough’ mentality was where the real systemic rot began.
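Luna’s ‘Resolution Error’ is really just measurement saturation, and it takes three lines to sketch. A minimal illustration (the function name and the 0–100 scale are mine, not hers): any value beyond the gauge’s ceiling collapses to the ceiling, so the instrument literally cannot tell 101 from 1000.

```python
def gauge_reading(true_value, max_scale=100.0):
    # A saturating gauge: anything beyond the scale's ceiling
    # is reported as the ceiling itself.
    return min(true_value, max_scale)

print(gauge_reading(87))    # within range: reads 87
print(gauge_reading(101))   # saturated: reads 100.0
print(gauge_reading(1000))  # saturated: reads 100.0, indistinguishable from 101
```

Everything interesting about the upper tail of the distribution is destroyed at the moment of measurement, which is exactly what a Level-5 ceiling does to an expert speaker.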

In the world of aviation language, this rot manifests as the Level 5/Level 6 divide. Level 6 is supposed to be the ‘Expert’ level: a lifetime certification that says you have mastered the language. Yet, many highly proficient, native-equivalent speakers find themselves stuck at Level 5 because the assessment design is inherently reductive. The examiners are trained to look for mistakes. If you don’t make mistakes, they don’t know what to do with you. It’s a paradox: the more perfect your English, the less ‘data’ you provide for an examiner trained to detect errors.

I once made the mistake (and I’m admitting this here because vulnerability is supposedly good for the soul) of thinking that Level 6 was just about ‘sounding like a native.’ I was wrong. I spent hours reading ICAO Doc 9835, which is about as exciting as watching paint dry on a 47-degree day, only to realize that Level 6 is about the ability to handle the ‘unusual and unexpected.’ But how does a 20-minute interview in a quiet room simulate the unexpected? It can’t. So the test reverts to its lowest common denominator. It checks whether you know the word for ‘landing gear’ or whether you can produce the past participle of ‘to fly.’

The measurement ceiling isn’t just a bureaucratic hurdle; it’s a failure to recognize the artistry of professional communication.

When we look at the requirements for Level 6 Aviation, we are looking at a standard that demands a specific kind of demonstration. It’s not just about what you say, but the effortless nature of how you say it. For Jean-Pierre, the frustration wasn’t about the need to re-test in six years. It was the realization that he was being judged by a person who didn’t possess the tools to see him. It’s like a colorblind person being asked to grade a sunset. They can tell you it’s bright, but they can’t tell you it’s crimson.

Luna S.-J. would argue that the entire system needs a feedback loop that accounts for the ‘upper tail’ of the distribution. In her assembly lines, she didn’t just look for defects; she looked for ‘excessive quality’ that could be leveraged. If a part was stronger than the specification required, she didn’t just throw it in the ‘pass’ bin; she marked it for use in high-stress components. We don’t do that with people. We just say, ‘You passed the minimum,’ and we move on.
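Luna’s sorting rule is simple to write down. A sketch under my own assumptions (the 25% excess-quality margin and the bin names are invented for illustration; I never saw her actual thresholds): instead of a binary pass/fail, parts that clear the spec by a wide margin get a bin of their own.

```python
def grade_part(strength, spec_min, excess_margin=1.25):
    # Three bins instead of two: failures, ordinary passes, and
    # parts strong enough to reserve for high-stress components.
    if strength < spec_min:
        return "reject"
    if strength >= spec_min * excess_margin:
        return "reserve for high-stress components"
    return "pass"

print(grade_part(95, spec_min=100))   # reject
print(grade_part(110, spec_min=100))  # pass
print(grade_part(130, spec_min=100))  # reserve for high-stress components
```

The third bin is the whole point: it gives the upper tail somewhere to go, which is precisely what a Level-5 ceiling refuses to do for people.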

I’ve been thinking a lot about the word ‘fluency.’ It comes from the Latin ‘fluere,’ to flow. But testing isn’t a flow; it’s a series of dams. Each question is a barrier meant to hold back the unqualified. But for someone like Jean-Pierre, the water is so deep that the dams are completely submerged. The examiner sees a flat surface and assumes the water is only a few inches deep. This is the ‘Expertise Blindness’ of the assessor. They are so focused on the floor that they forget to look for the sky.

There is a specific kind of exhaustion that comes with being a high-performer in a low-resolution world. It’s the feeling of having to simplify your vocabulary so you don’t confuse the person who is supposed to be evaluating your vocabulary. I’ve caught myself doing it in technical meetings, using ‘good’ instead of ‘efficacious’ because I don’t want to spend 17 minutes explaining what I mean if the other person hasn’t had their coffee yet. We shrink ourselves to fit the boxes provided.

But Jean-Pierre didn’t shrink.

During the ‘emergency simulation’ portion of his oral exam, he described a bird strike not just as a mechanical failure, but as a ‘stochastic event that challenged the structural integrity of the cowling while simultaneously demanding an immediate recalibration of the crew’s cognitive load.’


The examiner blinked. He wrote down ‘Vocabulary: Good.’

This is where we are. We have 147 different ways to describe a failure, but only three ways to describe excellence. We focus so heavily on the 7% of people who might fail that we ignore the 17% who are operating at a level that could redefine the standard. If we continue to use assessment frameworks that only discriminate at the bottom, we will eventually lose the ability to recognize what the top even looks like.

I realize I’m being a bit cynical here. Maybe it’s the lack of sleep from that Wikipedia dive. Did you know that the first standardized tests in China were used to select government officials as early as the Sui Dynasty? They were obsessed with calligraphy and poetry. At least back then, they were looking for something transcendent, even if it was arbitrary. Now, we look for ‘conformity to descriptors.’

We need to stop treating English proficiency as a binary ‘can or cannot.’ It is a spectrum that extends far beyond the ‘Expert’ label. When we tell a pilot like Jean-Pierre that he is a ‘5,’ we aren’t just giving him a grade; we are telling him that his 37 years of nuance don’t matter to the spreadsheet. We are telling him that the resolution of our system is too low to capture his frequency.

Luna S.-J. once told me that the only way to truly optimize a system is to listen to the outliers. ‘The people at the edges,’ she said, ‘are the ones who tell you where the system is going to break next.’ The system of language assessment in aviation, and everywhere else, is breaking at the top. It is failing to incentivize the very excellence it claims to require.

Jean-Pierre eventually took his papers, thanked the examiner with a polite, measured 7-word sentence, and walked out into the rain. He didn’t argue. He didn’t point out the absurdity of the score. He just realized that the ‘5’ on the paper was a reflection of the test’s limitations, not his own. He climbed into his car, the engine humming at a steady 947 RPM, and drove away, a trilingual master of the sky, labeled ‘Extended’ by a man who couldn’t see the crimson in the sunset.

Why do we settle for ‘good enough’ when ‘extraordinary’ is sitting right in front of us?

Perhaps it’s because ‘extraordinary’ is too hard to put into a cell on an Excel sheet. We prefer the safety of the ceiling because we are afraid of how high the sky actually goes.