
Shipping to Learn, Not to Impress: Deploying GenAI Beta Features Responsibly in a High-Trust Environment

5 min read


In 2024–2025, I worked with business units in a bank to roll out beta GenAI features into live, high-trust workflows. Stakeholders were raring to build, but we were wary of releasing beta features until we understood what we were actually building and for whom.

Here I summarise the flow of project events: pushing back on the feature list, going upstream before going forward, and making a collective call to ship something imperfect into a live environment where first impressions of GenAI tend to stick.

The behavioural side of the same experience (why users hesitated, what trust actually demands, and how to scaffold for it) is covered separately in What It Takes to Trust a Probabilistic AI in Deep Enterprise. That piece goes deeper into the behaviour; this one stays closer to project execution realities.


Questioning the ‘ask’

The initial feature list looked reasonable on paper. However, before committing, we wanted to understand the "why" behind it. Conversations with product owners revealed that the features were scoped primarily around technical constraints rather than user needs:

  • Training overhead that shaped what was practical to build within limited timelines.
  • An ongoing datalake migration that left the available data less than clean.

Having spent three years speaking to their users, we had enough doubt to push back, and enough context to do it constructively, so we pulled the team back to discuss the feature list.


Slowing down before speeding up

This isn't an unusual situation in enterprise product development. Backlogs get shaped by what's technically feasible, what's easy to explain to stakeholders, and what avoids touching systems already under stress. User needs, when they appear at all, tend to arrive as a layer applied after the architecture has already been decided.

What we proposed instead was to go upstream before going forward. Once we had identified staff through a set of recruitment criteria, we ran a combination of:

  • Discovery interviews to understand what users were actually trying to do.
  • Co-creation sessions to involve them in shaping what a GenAI-assisted experience might look like in practice.
  • Paper testing before anything was built, to surface friction before it became code.

The goal wasn't to discard the business team's thinking or its feature list. It was to stress-test both with the people the features were meant to serve. In a live, regulated environment, getting GenAI perceptions wrong isn't just a poor user experience. It erodes confidence in a capability that's difficult to rebuild once it's lost.

The collective decision to ship imperfect

Before a single feature could proceed to solutioning, there was a conversation that often gets skipped in the rush to ship. Getting there meant bringing design, engineering, and business stakeholders into the same room, not to present a plan, but to collectively weigh what it meant to release something unfinished into an environment where users had real stakes in the outcome. That orchestration was deliberate. Decisions of this kind tend to go sideways when they're made in silos and handed down.

A poorly received beta in a high-trust environment carries a particular adoption risk. Users form early opinions, and those opinions tend to stick. A bad first impression of an AI feature can quietly set the ceiling for what full adoption looks like later. What made the conversation productive was being specific about what we were and weren't confident in, what the guardrails would be, and what we were hoping to learn. Responsibility for that was shared across the table, not delegated to one function to carry alone.


A hybrid model and what followed

What emerged was a hybrid ship-to-test model, built around two use cases sufficiently scoped to go to beta: image and text recognition, and enterprise information analysis and summarisation.

Getting there required guardrails from both a systems and design perspective:

  • On the systems side, features were built around data constraints that avoided the migrating datalake, keeping the risk surface manageable.
  • On the design side, error states, edge-case handling, beta visual indicators, and expectation-setting language were treated as carefully as the features themselves. A rough sketch of both sides follows this list.
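
To make the pairing concrete, here's a minimal sketch of how guardrails like these might be encoded so that the systems and design decisions live side by side. It's illustrative only: every type, name, and string below is a hypothetical stand-in, not taken from the actual build.

```typescript
// Hypothetical sketch only; no names here come from the real project.
// Each beta feature declares which data sources it may read (systems
// guardrail) and how it presents itself to users (design guardrails).

type DataSource = "document-store" | "knowledge-base" | "datalake";

interface BetaFeatureGuardrails {
  feature: string;
  allowedSources: DataSource[]; // systems: keep the risk surface small
  betaBadge: boolean;           // design: visible beta indicator
  expectationCopy: string;      // design: expectation-setting language
  fallbackMessage: string;      // design: honest error state
}

const guardrails: BetaFeatureGuardrails[] = [
  {
    feature: "image-and-text-recognition",
    allowedSources: ["document-store"], // deliberately avoids the migrating datalake
    betaBadge: true,
    expectationCopy: "Beta: results may be incomplete. Verify before use.",
    fallbackMessage: "We couldn't read this document. Try a clearer scan.",
  },
  {
    feature: "information-summarisation",
    allowedSources: ["knowledge-base"],
    betaBadge: true,
    expectationCopy: "Beta: summaries are AI-generated and may miss nuance.",
    fallbackMessage: "A summary isn't available right now.",
  },
];

// A request against a disallowed source fails fast instead of
// quietly reading from unstable data.
function assertSourceAllowed(feature: string, source: DataSource): void {
  const entry = guardrails.find((g) => g.feature === feature);
  if (!entry || !entry.allowedSources.includes(source)) {
    throw new Error(`${feature} may not read from ${source}`);
  }
}
```

The value of a single structure like this is that a reviewer can see the technical constraint and the user-facing framing in one place, which mirrors how the decisions were made: together, rather than in silos.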

These weren't positioned as finished products, but as beta capabilities in active development, with users invited into the process rather than handed a finished decision. In a banking environment, that framing carries weight. Users tend to be more tolerant of imperfection when they understand the context. They're far less tolerant of feeling like they were the last to know.

Monitoring for continuous study

Post-launch monitoring ran through passive analytics and in-product popup surveys at point of use. It wasn't a sophisticated setup. But it was run consistently and always at the point where the experience was freshest. What that feedback revealed fed directly into book-of-work conversations about how to scale AI responsibly across the portfolio.
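
As a rough illustration of what "at the point of use" can mean mechanically, here's a sketch pairing passive event tracking with an occasional post-use survey prompt. The identifiers, event names, and sampling rate are all assumptions for illustration, not details of the actual setup.

```typescript
// Hypothetical sketch; identifiers and thresholds are illustrative.

interface FeedbackEvent {
  feature: string;
  action: "result-shown" | "result-copied" | "result-dismissed";
  timestamp: number;
}

const events: FeedbackEvent[] = [];

// Passive analytics: record what users do with beta output
// without interrupting them.
function track(feature: string, action: FeedbackEvent["action"]): void {
  events.push({ feature, action, timestamp: Date.now() });
}

// Active feedback: prompt right after use, while the experience is
// freshest, but only for a sample of uses to avoid survey fatigue.
function shouldShowSurvey(feature: string, sampleRate = 0.2): boolean {
  const used = events.some(
    (e) => e.feature === feature && e.action === "result-shown"
  );
  return used && Math.random() < sampleRate;
}

// Usage: after a summary is displayed, log the event and maybe prompt.
track("information-summarisation", "result-shown");
if (shouldShowSurvey("information-summarisation")) {
  // render the in-product popup survey here
}
```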

The value of a beta isn't just in what you ship. It's in what you learn at a stage when learning is still relatively cheap.


What this kind of prudence actually costs

The honest answer is that it costs time upfront, and it requires stakeholders willing to accept that a more considered process tends to produce a better outcome than a faster one. In a regulated environment that trades on trust, that case is usually makeable. What naturally helped us here was having enough history with the users to speak with confidence about what the feature list was and wasn't accounting for.

The broader point, from what I observed throughout this process, is that de-risking AI deployment in enterprise isn't primarily a technical problem. The technical constraints don't disappear. They just tend to be the part of the problem you can actually get your hands around.


The harder work is the organisational alignment required to ship something that actually respects the environment it's landing in: being honest about what it is, genuinely involving users, and treating what happens after launch as part of the work rather than a footnote to it.

Beta doesn't mean unfinished and unsupervised. In high-trust environments, it needs to be the opposite, with the effort balanced, of course, against a collectively agreed risk appetite.


© 2025–2026 Kevyn Leong
