AI Translation Tools in Frontline Service
How an AI translation tool is being absorbed into live gate work, and what determines whether it gets used when it matters.
Role
Research lead, Customer Strategy & Innovation
Designed and led mixed-methods field evaluation
Partners
Airport Operations
Gates team
Employee Experience
Scope
Live operational adoption and scale-up of AI at the gate
First live-operations AI deployment at the gate
67M
US residents speak a language other than English at home.
91%
of customer service leaders feel pressure to implement AI.
88% → 25%
88% of contact centers use AI, but only 25% have fully integrated it into daily work.
Sources: U.S. Census Bureau (ACS); Zendesk CX Trends; industry CX research, 2025.
Summary
Gate agents routinely support passengers whose English proficiency varies, often in moments where timing, documentation, and boarding direction are operationally consequential. That support has historically depended on informal workarounds (multilingual coworkers, personal translation apps, gestures, repeated explanation, escalation), producing inconsistent experience and avoidable burden on agents during peak operations. I led the field evaluation of an AI-based translation tool piloted at the gate to understand whether it could support faster, clearer, safer interactions between agents and non-English-speaking passengers. The pilot was the first deployment of AI by airport customer-facing employees at United. The research showed that AI translation is most valuable when it reduces short, high-frequency clarification work without forcing agents to abandon operational flow, and that adoption depends less on enthusiasm for AI than on access speed, mode fit, and clear boundaries for when the tool is safe enough to use. The work reframed the question from whether AI translation was accurate enough to whether it could fit into the gate workflow cleanly enough to be used in the moments that matter.
Problem
Frontline service work has typically assumed a shared operating language: the agent speaks the airline's language of operations, the passenger receives information in that language, and translation — when it's needed at all — is handled informally on the side. That assumption is becoming less stable. International gate operations now concentrate several forms of pressure into a small window: boarding timing, documentation checks, wayfinding, seat and group questions, last-minute rebooking concerns, and passenger anxiety — much of it directed at agents who serve passengers across dozens of language backgrounds in a single shift.
Language support in this environment has historically been inconsistent. Agents rely on whoever may speak the passenger's language, on personal translation apps, on gestures, on paper documents, on repeated explanation. These methods can work in isolated cases, but they're not reliable as an operational system. They vary by flight, shift, agent tenure, language, crowd density, and the availability of informal help.
This raised a set of strategic questions:
Where does AI translation actually fit in the rhythm of live gate operations?
What conditions need to hold for agents to reach for it under boarding-time pressure rather than defaulting to informal workarounds?
How do its two modes — preset phrases and free conversation — earn different kinds of trust?
What makes a translation safe enough to use when the agent remains accountable for the customer interaction?
The broader issue wasn't whether AI translation is capable. It was whether frontline service work is being reorganized around a new kind of operational support — one that has to fit into the seconds-level rhythm of the gate before its capability matters at all.
Design
The research was conducted as a two-part field evaluation in a live airport gate environment, beginning with a soft-launch briefing and post-demo discussion, followed by live gate evaluation during the pilot. The study focused on gate agents working international and domestic flights, with attention to high-diversity routes, peak boarding windows, documentation-related conversations, and moments where agents would otherwise have relied on informal translation support.
The study combined five inputs: post-demo interviews with gate agents (n=10) immediately following the soft-launch briefing; contextual observations during live gate operations (n=8 sessions) capturing 47 translation-relevant interactions; post-use micro-feedback captures (n=52) collected when operationally feasible; semi-structured interviews with agents and leads (n=15); and an all-agent pulse survey (n=32 responses).
The design choice that mattered most was structuring the study to capture both anticipated and actual adoption. The post-demo phase established what agents thought the tool would do for them — which workflows looked promising, which modes seemed usable, where they expected breakdowns. The live observation phase tested those expectations against real conditions. Adoption questions in frontline operations are usually answered too early, before the gap between demo behavior and live behavior becomes visible. Pairing the two phases meant the research could surface that gap directly — and explain it.
The analysis focused on four recurring dimensions: where AI translation fit cleanly into gate work and where it didn't, how agents weighed setup friction against operational pace, how the two modes supported different kinds of trust, and where the boundary sat between safe to use and needs escalation. The objective was operational pattern finding — identifying the conditions under which AI translation becomes a practical part of frontline service rather than another system agents have to manage.
The vision animating the work was simple: an AI tool at the gate should disappear into workflow. It should accelerate the bounded moments that already define service — and leave the judgment-heavy ones where they belong, with the agent.
Insight
Three patterns emerged, each informing where AI translation belongs in frontline gate work.
AI translation earned its place in short, bounded gate moments — where it functioned as operational clarification, not customer interpretation.
Adoption was governed by access speed and momentum preservation, not by translation quality alone: under boarding pressure, agents' tolerance for friction collapses in ways that quality scores do not reveal.
Trust was built through mode-fit and operational boundaries, not accuracy alone: agents reach for AI when they know where it stops being safe.
1. AI Translation Fit Best as Operational Clarification, Not Customer Interpretation
The clearest fit was short, high-frequency gate work: confirming a gate location, explaining boarding-group timing, directing a passenger to a document check, clarifying where to stand, repeating a procedural instruction in the passenger's preferred language. These interactions were already bounded. Agents knew what needed to be communicated, and the passenger's need could usually be resolved in one or two turns.
In these cases, AI translation reduced the interpretive work agents normally carry. Rather than improvising with gestures, asking another employee for help, or typing into a personal app, agents could communicate a clear operational message and check whether the passenger understood.
The weaker fit appeared in more complex or sensitive interactions. Documentation conversations, policy explanations, irregular operations, and emotionally charged customer issues required more than translation — they required judgment, interpretation of airline policy, escalation awareness, and sensitivity to what the passenger might infer from the message. In those moments, agents were more cautious, and rightly so.
The implication wasn't that AI translation should replace multilingual service support. It was that the tool's strongest value is as a clarification layer for recurring, operationally bounded moments where speed and consistency matter most.
2. Adoption Depended on Preserving Gate Momentum
Agents didn't evaluate the tool in isolation. They evaluated it against the pace of gate work. During slower periods, agents were willing to experiment — open the tool, select a language, try the conversational mode, assess the result. During active boarding or peak crowding, the threshold changed. A small amount of friction — opening the app, finding the right phrase, confirming the language, waiting for output, adjusting volume — could be enough to push agents back toward gestures, quick English repetition, or informal help.
The pattern was consistent: the more operational pressure increased, the less tolerance agents had for setup work. The tool had to be available at the moment of need, not merely available somewhere in the ecosystem.
This matters because gate operations aren't evenly paced. A tool can appear usable in training and still fail in the moments where it could create the most value. In high-pressure conditions, agents don't want another system to manage. They want the interaction to move forward.
Scale readiness depends heavily on workflow integration. Persistent access, faster launch points, default language shortcuts for common routes, easier mode selection, clearer audio and text delivery — these matter as much as translation quality itself. Usability isn't a property of the interface alone. It's a property of the interface under boarding pressure.
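One way to picture the default language shortcuts for common routes is a small lookup that keeps a route's likely languages one tap away. The sketch below is illustrative only: the `ROUTE_DEFAULTS` table, the placeholder airport codes, and the `language_shortcuts` function are assumptions made for this example, not a description of the piloted tool or of data from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from (origin, destination) to the languages offered
# as one-tap shortcuts. Airport and language codes are placeholders only.
ROUTE_DEFAULTS: Dict[Tuple[str, str], List[str]] = {
    ("AAA", "BBB"): ["pt", "es"],
    ("AAA", "CCC"): ["ja"],
}


def language_shortcuts(
    origin: str, destination: str, recent: List[str], limit: int = 3
) -> List[str]:
    """Return up to `limit` language codes: route defaults first,
    then the agent's recently used languages, without duplicates."""
    ordered: List[str] = []
    for code in ROUTE_DEFAULTS.get((origin, destination), []) + recent:
        if code not in ordered:
            ordered.append(code)
    return ordered[:limit]
```

The design choice this illustrates is that the language is chosen before the moment of need, so an agent under boarding pressure confirms a suggestion rather than searching a list.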
3. Trust Was Shaped by Mode-Fit and Operational Boundaries, Not Accuracy Alone
Agents treated the two modes as meaningfully different tools. Quick Help — preset phrases — was understood as faster, safer, and better suited to high-pressure moments because the content was already bounded. Preset phrases reduced the risk that the agent would say something imprecise, and they fit recurring questions agents already knew how to answer. Conversational mode carried a different value — more flexible, better suited to passenger-specific questions, but requiring agents to place more trust in speech recognition, translation accuracy, tone, and the tool's ability to handle multi-turn exchanges.
But beyond mode-fit, agents weren't only asking whether the translation was right. They were asking whether it was safe to use in a specific operational moment. A translation could be mostly accurate and still feel risky if it sounded too formal, omitted context, mishandled airline terminology, or left room for misunderstanding around documentation, boarding eligibility, or next steps.
This matters because gate agents remain accountable for the passenger interaction even when AI produces the translation. If a customer misunderstands boarding timing, document requirements, or where to go next, the operational consequences still return to the agent and the airline. Accuracy scores are necessary but insufficient. The more important question is whether the tool helps agents produce messages that are clear, contextually appropriate, and safe within the constraints of gate work — and whether agents know exactly where the tool stops being the right answer.
Trust, in other words, isn't a property of the translation. It's a property of the agent's relationship with the tool: a clear sense of which mode fits which moment, where the tool's safety boundary sits, and what to do when it's crossed. Training therefore becomes a central part of the product experience — not a one-time demo, but ongoing guidance on where the tool is appropriate, how to verify passenger understanding, and what to do when the translation feels wrong.
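The mode-fit and boundary logic described in this section can be written down as a simple decision rule of the kind training guidance might encode. This is a minimal sketch under stated assumptions: the topic labels, the `Mode` enum, and the `suggest_mode` function are hypothetical names loosely based on the interaction types named in the text, and a real boundary would rest on agent judgment rather than a fixed lookup.

```python
from enum import Enum


class Mode(Enum):
    QUICK_HELP = "quick_help"          # preset phrases
    CONVERSATIONAL = "conversational"  # free conversation
    ESCALATE = "escalate"              # outside the tool's safety boundary


# Hypothetical topic labels, not a taxonomy from the study.
BOUNDED_TOPICS = {
    "gate_location",
    "boarding_group_timing",
    "where_to_stand",
    "document_check_direction",
}
SENSITIVE_TOPICS = {
    "documentation_issue",
    "policy_explanation",
    "irregular_operations",
    "emotionally_charged",
}


def suggest_mode(topic: str) -> Mode:
    """Suggest a mode for a gate interaction topic."""
    if topic in SENSITIVE_TOPICS:
        # Judgment-heavy moments stay with the agent or go to escalation.
        return Mode.ESCALATE
    if topic in BOUNDED_TOPICS:
        # Recurring, bounded questions fit preset phrases.
        return Mode.QUICK_HELP
    # Assumption: anything else is a passenger-specific question.
    return Mode.CONVERSATIONAL
```

For example, `suggest_mode("gate_location")` returns `Mode.QUICK_HELP`, while `suggest_mode("irregular_operations")` returns `Mode.ESCALATE`.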
Impact
The work led to four interlocking outcomes:
A first deployment of AI by frontline employees at United. This was the first time an airport customer-facing team adopted AI for live customer service. It is the employee-experience counterpart to the AI travel-planning research; together, the two establish the first behavioral foundation for thinking about AI adoption on both the customer and the frontline sides of the airline.
Live operational adoption at the gate. The tool is now used by gate agents in live operations, applied in the bounded, high-frequency moments the research identified as the strongest fit — clarification work that previously fell back on gestures, informal help, or unsanctioned personal apps.
A framework for evaluating frontline AI integrations. The dimensions surfaced by the research — operational fit, momentum preservation, mode-specific trust, and boundary-defined safety — are now being used to evaluate further AI integrations across airport customer-service workflows. They clarify what scale readiness actually requires (faster access, real-gate phrase libraries, clear use and escalation boundaries, audio and text delivery designed for noise) and what training has to address before AI tools can move from pilot to standard practice.
A reference point at senior levels. Presented to senior leadership across Airport Operations and the Gates team, with broader Customer Experience engagement, the research has shifted how the organization frames frontline AI adoption — not as a capability question, but as a workflow-integration question.
The broader takeaway, beyond airports: as AI tools enter live service environments, adoption is governed by the same constraints that shape every other operational tool — timing, trust, clarity, and fit within physical workflow. Agents will use AI when it helps them preserve forward motion. They'll avoid it when it adds interpretation work, slows the interaction, or leaves them uncertain about whether the output is safe to stand behind. The strategic opportunity isn't to make agents use AI. It's to design tools that resolve common needs more consistently — while keeping the judgment and fallback options frontline work depends on.