Two years ago, the question inside most regulated institutions was: where can we deploy a chatbot? Now it is: how do we govern the LLM estate that already exists, sprawled across business units, often without the model-risk function knowing about most of it?
The shift in framing
Generative AI is not a new kind of software. It is a new kind of model — one with non-deterministic outputs, opaque internals, training-data lineage that is rarely controlled by the deploying institution, and a drift profile that legacy validation harnesses were not designed for. The institutions that recognised this early treat every LLM deployment as inventory-eligible, governance-eligible and validation-eligible from day one.
What changes operationally
- Model inventory expands. Every prompt template, retrieval pipeline and agent workflow becomes a model record with an owner, a purpose statement and a tier.
- Lineage becomes input-and-output, not just code. What was in the context window? What did the model produce? What did the human downstream actually do with it? All auditable, all retrievable.
- Evaluation is continuous, not a one-off at release. Accuracy, hallucination rate, bias proxies and refusal behaviour are all monitored on live traffic against documented thresholds.
- Challenger review is structural. A second model, a second team or a documented human-in-the-loop sample — defined upfront and operating in production.
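
The continuous-evaluation point above can be sketched as a check of live-traffic metrics against documented thresholds. This is a minimal illustration under stated assumptions, not a prescribed implementation: the metric names, the threshold values, and the `Threshold` and `breaches` names are all hypothetical, and how each metric is measured (for example, how a hallucination is labelled) is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """A documented limit for one monitored metric (hypothetical schema)."""
    metric: str
    minimum: Optional[float] = None  # breach if the observed value falls below
    maximum: Optional[float] = None  # breach if the observed value rises above


# Illustrative values only; real limits would come from the validation document.
THRESHOLDS = [
    Threshold("accuracy", minimum=0.90),
    Threshold("hallucination_rate", maximum=0.02),
    Threshold("refusal_rate", maximum=0.10),
]


def breaches(observed: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return the metrics that are missing or outside their documented limits."""
    out = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            out.append(t.metric)  # an unmonitored metric is treated as a breach
        elif t.minimum is not None and value < t.minimum:
            out.append(t.metric)
        elif t.maximum is not None and value > t.maximum:
            out.append(t.metric)
    return out


# Metrics computed over one window of sampled live traffic (made-up numbers).
window = {"accuracy": 0.93, "hallucination_rate": 0.035, "refusal_rate": 0.04}
result = breaches(window, THRESHOLDS)
```

Treating a missing metric as a breach is a design choice in this sketch: it makes a silent monitoring gap as visible as a bad reading.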
What good looks like
An LLM-powered workflow in a regulated bank or insurer is indistinguishable, from a governance-artefact perspective, from a classical risk model: documented purpose, documented limitations, validation evidence, monitoring, ownership and lifecycle. The fact that the model writes English rather than producing a number is incidental.
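
One way to picture that equivalence is a single inventory record type used for both LLM workflows and classical models, with a completeness check over the governance artefacts listed above. The sketch below is illustrative only: `ModelRecord`, `missing_artefacts`, the field names and the example values are hypothetical and do not reflect any particular inventory system.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelRecord:
    """One inventory entry; the same fields apply to an LLM workflow or a classical model."""
    name: str
    owner: str = ""
    purpose: str = ""
    limitations: str = ""
    validation_evidence: str = ""  # e.g. a reference to a validation report
    monitoring_plan: str = ""
    lifecycle_stage: str = ""      # e.g. "development", "production", "retired"


def missing_artefacts(record: ModelRecord) -> list[str]:
    """List the governance fields that are still empty on a record."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical entry for a prompt-template workflow with only some artefacts filled in.
record = ModelRecord(
    name="claims-summary-prompt-v3",
    owner="claims-ops",
    purpose="Summarise claim files for handlers",
)
gaps = missing_artefacts(record)
```

Here `gaps` lists the artefacts still outstanding before the record would look like that of a classical risk model.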
The institutions that internalised this in 2024 are now shipping production use cases at meaningful volume. The institutions that deferred are spending 2026 retroactively inventorying their estate before the next supervisory conversation.